
In cryptography, a message authentication code based on universal hashing, or UMAC, is a type of message authentication code (MAC) calculated by choosing a hash function from a class of hash functions according to some secret (random) process and applying it to the message. The resulting digest or fingerprint is then encrypted to hide the identity of the hash function used. As with any MAC, it may be used to simultaneously verify both the ''data integrity'' and the ''authenticity'' of a message.

A specific type of UMAC, also commonly referred to as just UMAC, is specified in RFC 4418. It has provable cryptographic strength and is usually much less computationally intensive than other MACs. UMAC's design is optimized for 32-bit architectures with SIMD support, with a performance of 1 CPU cycle per byte (cpb) with SIMD and 2 cpb without SIMD. A closely related variant of UMAC that is optimized for 64-bit architectures is given by VMAC, which was submitted to the IETF as a draft but never gathered enough attention to become a standardized RFC.


Background


Universal hashing

Suppose the hash function is chosen from a class of hash functions ''H'', which maps messages into ''D'', the set of possible message digests. This class is called universal if, for any distinct pair of messages, there are at most |''H''|/|''D''| functions that map them to the same member of ''D''. This means that if an attacker wants to replace one message with another and, from the attacker's point of view, the hash function was chosen completely at random, the probability that the UMAC will not detect the modification is at most 1/|''D''|.

But this definition is not strong enough: if the possible messages are 0 and 1, ''D'' = {0, 1} and ''H'' consists of the identity operation and ''not'', then ''H'' is universal. Yet even if the digest is encrypted by modular addition, the attacker can change the message and the digest at the same time and the receiver would not know the difference.
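The two-function counterexample above is small enough to check exhaustively. The following C sketch is only an illustration: the names `toy_hash`, `toy_tag` and `toy_forgery_always_succeeds` are hypothetical, and "encryption by modular addition" is assumed to mean adding a one-bit secret pad modulo 2.

```c
#include <assert.h>
#include <stdint.h>

/* The two-member family H = {identity, not} on one-bit messages.
 * which == 0 selects the identity, which == 1 selects "not". */
static uint8_t toy_hash(uint8_t which, uint8_t m) {
    assert(which <= 1 && m <= 1);
    return which ? (uint8_t)(m ^ 1u) : m;
}

/* Tag = digest encrypted by addition modulo 2 with a one-bit secret pad. */
static uint8_t toy_tag(uint8_t which, uint8_t pad, uint8_t m) {
    assert(pad <= 1);
    return (uint8_t)((toy_hash(which, m) + pad) % 2u);
}

/* Returns 1 if flipping both the message bit and the tag bit produces a
 * valid (message, tag) pair for every hash choice, pad and message. */
static int toy_forgery_always_succeeds(void) {
    for (uint8_t which = 0; which <= 1; which++) {
        for (uint8_t pad = 0; pad <= 1; pad++) {
            for (uint8_t m = 0; m <= 1; m++) {
                uint8_t tag = toy_tag(which, pad, m);
                uint8_t forged_m = (uint8_t)(m ^ 1u);
                uint8_t forged_tag = (uint8_t)(tag ^ 1u);
                if (toy_tag(which, pad, forged_m) != forged_tag) {
                    return 0;
                }
            }
        }
    }
    return 1;
}
```

Neither function ever maps 0 and 1 to the same digest, so the family meets the universality bound, yet the forgery check succeeds for every key: the attacker needs no knowledge of the secret at all.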


Strongly universal hashing

A class of hash functions ''H'' that is good to use will make it difficult for an attacker to guess the correct digest ''d'' of a fake message ''f'' after intercepting one message ''a'' with digest ''c''. In other words,
:\Pr_{h \in H}[h(f)=d \mid h(a)=c]
needs to be very small, preferably 1/|''D''|.

It is easy to construct such a class of hash functions when ''D'' is a field. For example, if |''D''| is prime, all the operations are taken modulo |''D''|. The message ''a'' is then encoded as an ''n''-dimensional vector over ''D'', (''a''_1, ''a''_2, ..., ''a''_''n''). ''H'' then has |''D''|^(''n''+1) members, each corresponding to an (''n'' + 1)-dimensional vector over ''D'', (''h''_0, ''h''_1, ..., ''h''_''n''). If we let
: h(a)=h_0+\sum_{i=1}^n h_i a_i
we can use the rules of probabilities and combinatorics to prove that
:\Pr_{h \in H}[h(f)=d \mid h(a)=c] = \frac{1}{|D|}

If we properly encrypt all the digests (e.g. with a one-time pad), an attacker cannot learn anything from them and the same hash function can be used for all communication between the two parties. This may not be true for ECB encryption because it may be quite likely that two messages produce the same hash value. Then some kind of initialization vector should be used, which is often called the nonce. It has become common practice to set ''h''_0 = ''f''(nonce), where ''f'' is also secret.

Notice that having massive amounts of computer power does not help the attacker at all. If the recipient limits the number of forgeries it accepts (by sleeping whenever it detects one), |''D''| can be 2^32 or smaller.
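The linear family over a prime field can be sketched in a few lines of C. This is a minimal illustration, not a production MAC: the function names `su_hash` and `su_count`, and the restriction to primes below 2^32 (so that every product fits in 64 bits), are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* h(a) = h[0] + sum_{i=1..n} h[i] * a[i-1]  (mod p).
 * Assumes p is prime and p < 2^32, so every product fits in 64 bits.
 * h holds n + 1 key words, a holds n message words. */
static uint64_t su_hash(const uint64_t *h, const uint64_t *a, size_t n,
                        uint64_t p) {
    assert(p > 1 && p < (UINT64_C(1) << 32));
    uint64_t acc = h[0] % p;
    for (size_t i = 0; i < n; i++) {
        acc = (acc + (h[i + 1] % p) * (a[i] % p)) % p;
    }
    return acc;
}

/* For one-word messages over a small prime p, count the keys (h0, h1)
 * consistent with the observation h(a) == c, and among those the keys
 * for which the forgery (f, d) would also verify. */
static void su_count(uint64_t p, uint64_t a, uint64_t c, uint64_t f,
                     uint64_t d, unsigned *consistent, unsigned *forging) {
    *consistent = 0;
    *forging = 0;
    for (uint64_t h0 = 0; h0 < p; h0++) {
        for (uint64_t h1 = 0; h1 < p; h1++) {
            uint64_t key[2] = { h0, h1 };
            if (su_hash(key, &a, 1, p) != c) {
                continue;
            }
            (*consistent)++;
            if (su_hash(key, &f, 1, p) == d) {
                (*forging)++;
            }
        }
    }
}
```

For example, with ''p'' = 5, ''a'' = 1, ''c'' = 2, ''f'' = 3 and ''d'' = 4, five keys are consistent with the intercepted pair and exactly one of them accepts the forgery, matching the 1/|''D''| bound.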


Example

The following C function, of which only the signature is reproduced here, generates a 24-bit UMAC. It assumes that secret is a multiple of 24 bits long, that msg is not longer than secret, and that result already contains the 24 secret bits, e.g. ''f''(nonce). The nonce does not need to be contained in msg.

    #include <stddef.h>
    #include <stdint.h>

    /* DUBIOUS: This does not seem to have anything to do with the (likely long) RFC
     * definition. This is probably an example for the general UMAC concept.
     * Who the heck from 2007 (Nroets) chooses 3 bytes in an example?
     *
     * We gotta move this along with a better definition of str. uni. hash into
     * uni. hash.
     */
    #define uchar uint8_t
    void UHash24 (uchar *msg, uchar *secret, size_t len, uchar *result);

    /* Byte-swap a 32-bit value. */
    #define swap32(x) ((((x) & 0xff) << 24) | (((x) & 0xff00) << 8) | (((x) & 0xff0000) >> 8) | (((x) & 0xff000000) >> 24))

    /* This is the same thing, but grouped up (generating better assembly and stuff).
     * It is still bad and nobody has explained why it's strongly universal.
     */
    void UHash24Ex (uchar *msg, uchar *secret, size_t len, uchar *result);
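As a hedged illustration of what a function with this interface could look like, the sketch below treats each 3-byte block as an element of GF(2^24) and XORs the products of message blocks and secret blocks into the preloaded result. It is not the function described above: the names `gf24_mul`, `load24` and `uhash24_sketch`, the little-endian block layout, and the reduction polynomial x^24 + x^4 + x^3 + x + 1 (assumed here to be irreducible) are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two elements of GF(2^24), reducing by the assumed polynomial
 * x^24 + x^4 + x^3 + x + 1 (bit pattern 0x100001B). */
static uint32_t gf24_mul(uint32_t a, uint32_t b) {
    assert(a < 0x1000000u && b < 0x1000000u);
    uint32_t r = 0;
    for (int bit = 0; bit < 24; bit++) {
        if (b & 1u) {
            r ^= a;              /* add a * x^bit when that bit of b is set */
        }
        b >>= 1;
        a <<= 1;                 /* a = a * x */
        if (a & 0x1000000u) {
            a ^= 0x100001Bu;     /* reduce back below degree 24 */
        }
    }
    return r;
}

/* Load up to three bytes as a little-endian 24-bit value; a short final
 * block is zero-padded. */
static uint32_t load24(const uint8_t *p, size_t avail) {
    uint32_t v = 0;
    for (size_t i = 0; i < 3 && i < avail; i++) {
        v |= (uint32_t)p[i] << (8 * i);
    }
    return v;
}

/* result (3 bytes, preloaded with e.g. f(nonce)) is XORed with the sum of
 * msg_block * secret_block over all blocks. secret must hold at least
 * len bytes rounded up to a multiple of 3. */
static void uhash24_sketch(const uint8_t *msg, const uint8_t *secret,
                           size_t len, uint8_t *result) {
    uint32_t acc = load24(result, 3);
    for (size_t off = 0; off < len; off += 3) {
        acc ^= gf24_mul(load24(secret + off, 3), load24(msg + off, len - off));
    }
    result[0] = (uint8_t)acc;
    result[1] = (uint8_t)(acc >> 8);
    result[2] = (uint8_t)(acc >> 16);
}
```

Because short final blocks are zero-padded, this sketch is only meaningful when all messages have the same length; a message and its zero-extended version would otherwise hash identically.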


NH and the RFC UMAC


NH

Functions in the above unnamed strongly universal hash-function family use ''n'' multiplications to compute a hash value. The NH family halves the number of multiplications, which roughly translates to a two-fold speed-up in practice (section 5.3). For speed, UMAC uses the NH hash-function family. NH is specifically designed to use SIMD instructions, and hence UMAC is the first MAC function optimized for SIMD.

The following hash family is 2^(-''w'')-universal (Equation 1 and also section 4.2, "Definition of NH"):
: \operatorname{NH}_K(M) = \left( \sum_{i=0}^{(n/2)-1} ((m_{2i} + k_{2i}) \bmod 2^w) \cdot ((m_{2i+1} + k_{2i+1}) \bmod 2^w) \right) \bmod 2^{2w}
where
* The message ''M'' is encoded as an ''n''-dimensional vector of ''w''-bit words (''m''_0, ''m''_1, ''m''_2, ..., ''m''_(''n''-1)).
* The intermediate key ''K'' is encoded as an (''n''+1)-dimensional vector of ''w''-bit words (''k''_0, ''k''_1, ''k''_2, ..., ''k''_''n''). A pseudorandom generator generates ''K'' from a shared secret key.

Practically, NH is done in unsigned integers. In this description ''w'' denotes the width of a product rather than of an input word: all multiplications are mod 2^''w'', all additions are mod 2^(''w''/2), and the inputs form a vector of half-words (''w''/2 = 32-bit integers). The algorithm then uses \lceil k/2 \rceil multiplications, where ''k'' is the number of half-words in the vector. Thus, the algorithm runs at a "rate" of one multiplication per word of input.
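The NH formula maps directly onto unsigned integer arithmetic in C, where wraparound on `uint32_t` and `uint64_t` provides the reductions modulo 2^32 and 2^64. The following is a minimal sketch for 32-bit input words; the name `nh32` is hypothetical, and key generation is left out (the key array is simply passed in).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* NH over 32-bit words: adjacent words are key-shifted mod 2^32,
 * multiplied into 64 bits, and the products are summed mod 2^64.
 * n is the number of message words and must be even; k must hold at
 * least n key words. */
static uint64_t nh32(const uint32_t *m, const uint32_t *k, size_t n) {
    assert(n % 2 == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i += 2) {
        uint32_t x = (uint32_t)(m[i] + k[i]);         /* mod 2^32 */
        uint32_t y = (uint32_t)(m[i + 1] + k[i + 1]); /* mod 2^32 */
        sum += (uint64_t)x * y;                       /* mod 2^64 */
    }
    return sum;
}
```

Each pair of input words costs one multiplication, which is where the halving relative to the linear family comes from.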


RFC 4418

RFC 4418 does a lot to wrap NH to make it a good UMAC. The overall UHASH ("Universal Hash Function") routine produces tags of variable length, which corresponds to the number of iterations (and the total lengths of keys) needed in all three layers of its hashing. Several calls to an AES-based key derivation function are used to provide keys for all three keyed hashes.
* Layer 1 (1024 byte chunks -> 8 byte hashes concatenated) uses NH because it is fast.
* Layer 2 hashes everything down to 16 bytes using a POLY function that performs prime modulus arithmetic, with the prime changing as the size of the input grows.
* Layer 3 hashes the 16-byte string to a fixed length of 4 bytes. This is what one iteration generates.

In RFC 4418, NH is rearranged to take the form:

    Y = 0
    for (i = 0; i < t; i += 8) do
        Y = Y +_64 ((M_{i+0} +_32 K_{i+0}) *_64 (M_{i+4} +_32 K_{i+4}))
        Y = Y +_64 ((M_{i+1} +_32 K_{i+1}) *_64 (M_{i+5} +_32 K_{i+5}))
        Y = Y +_64 ((M_{i+2} +_32 K_{i+2}) *_64 (M_{i+6} +_32 K_{i+6}))
        Y = Y +_64 ((M_{i+3} +_32 K_{i+3}) *_64 (M_{i+7} +_32 K_{i+7}))
    end for

This definition is designed to encourage programmers to use SIMD instructions for the accumulation: words four indices apart are unlikely to share a SIMD register, so the two operands of each multiplication sit in different registers and can be multiplied lane by lane in bulk. On a hypothetical machine, the accumulation (leaving out the key additions) could simply translate to:

    movq $0, regY               ; Y = 0
    movq $0, regI               ; i = 0
    loop:
    add reg1, regM, regI        ; reg1 = M + i
    add reg2, reg1, $4          ; reg2 = M + i + 4
    vldr.4x32 vec1, reg1        ; load 4x32bit vals from memory *reg1 to vec1
    vldr.4x32 vec2, reg2        ; load 4x32bit vals from memory *reg2 to vec2
    vmul.4x64 vec3, vec1, vec2  ; vec3 = vec1 * vec2
    uaddv.4x64 reg3, vec3       ; horizontally sum vec3 into reg3
    add regY, regY, reg3        ; regY = regY + reg3
    add regI, regI, $8
    cmp regI, regT
    jlt loop
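The stride-4 pairing used in the RFC form of NH can be written out in portable C as follows. This is a sketch of the index pattern only, under stated assumptions: the name `nh_stride4` is hypothetical, and the padding, byte-order handling and other steps that RFC 4418 wraps around NH are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* NH with the RFC-style pairing: within each group of eight 32-bit
 * words, word j is paired with word j + 4 (j = 0..3). t is the number
 * of message words and must be a multiple of 8; k must hold at least
 * t key words. */
static uint64_t nh_stride4(const uint32_t *m, const uint32_t *k, size_t t) {
    assert(t % 8 == 0);
    uint64_t y = 0;
    for (size_t i = 0; i < t; i += 8) {
        for (size_t j = 0; j < 4; j++) {
            uint32_t lo = (uint32_t)(m[i + j] + k[i + j]);         /* +_32 */
            uint32_t hi = (uint32_t)(m[i + j + 4] + k[i + j + 4]); /* +_32 */
            y += (uint64_t)lo * hi;                                /* *_64, +_64 */
        }
    }
    return y;
}
```

The inner loop over `j` touches four consecutive words on each side of the multiplication, which is the shape a compiler or a hand-written SIMD routine can map onto two vector loads and one lane-wise multiply.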


See also

* Poly1305 is another fast MAC based on strongly universal hashing.

