Zatocoding

A superimposed code such as Zatocoding is a kind of hash code that was popular in marginal punched-card systems.


Marginal punched-card systems

Many names, some of them trademarked, have been used for marginal punched-card systems: edge-notched cards, slotted cards, E-Z Sort, Zatocards, McBee, McBee Keysort, Flexisort, Velom, Rocket, etc. The center of each card held the relevant information: typically the name and author of a book, research paper, or journal article on a nearby shelf, and a list of subjects and keywords. Some sets of cards contained all the information required by the user on the card itself, handwritten, typewritten, or on microfilm (aperture card).

Every card in a stack had the same set of pre-punched holes. The user would find the particular cards relevant to a search by aligning the holes in the set of cards (using a card holder or card tray) and inserting one or more knitting-needle-like rods all the way through the stack, so that the desired cards (which had been notched or cut open) fell out, while the irrelevant cards (left un-notched) remained on the needle(s). A user could repeat this selection many times to form a complex Boolean query. A card that was relevant to two or more subjects would have the slot(s) for each of those subjects cut out, so that the card would drop out when either or both subjects were selected. The "superimposed code" coding systems, such as Zatocoding, saved space by entering several or all subjects in the same field; such a code stores much more information in less space, but at the cost of occasional "false" selections.

Once you have a collection of index cards, one per book, research paper, or journal article in a library, with a list of keywords (subjects) discussed in a particular book written on that book's card, the "obvious way" to code those subjects is to count up the total number of subjects R used in the entire collection, make a row of R holes near the top of every card, and, for each subject actually discussed in a particular book, cut a slot from the hole corresponding to that subject in that book's card (W. Ross Ashby, "W. Ross Ashby's Journal: Zato-coding", 22 Sep. 1960, pp. 6208-6222). Naturally, this also requires a separate list of every subject used in the collection that indicates which hole is punched for each subject. Unfortunately, there may be thousands of distinct subjects in the collection, and it is impractical to punch thousands of holes in every card. While it may not seem possible to use fewer than one hole per subject, superimposed code systems can solve this problem.
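The "obvious way" and the needle selection can be modeled in a few lines. The following is only a minimal sketch: the `Card` class, the function names, and the three sample subjects are hypothetical and are not taken from any historical system. A card is represented by its title plus the set of hole positions that have been notched open.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """Hypothetical model of an edge-notched card."""
    title: str
    notched: set = field(default_factory=set)  # hole positions cut open


def build_direct_code(subjects):
    """The separate subject list: one hole per subject."""
    return {subject: hole for hole, subject in enumerate(sorted(subjects))}


def needle(stack, hole):
    """Pass one needle through `hole`.

    Cards notched at that hole drop out; the rest stay on the needle.
    """
    dropped = [card for card in stack if hole in card.notched]
    retained = [card for card in stack if hole not in card.notched]
    return dropped, retained


code = build_direct_code({"cybernetics", "indexing", "statistics"})
stack = [
    Card("Book A", {code["cybernetics"], code["indexing"]}),
    Card("Book B", {code["indexing"]}),
    Card("Book C", {code["statistics"]}),
]

# AND query: needle the cards that dropped for the first subject a second time.
dropped, _ = needle(stack, code["indexing"])
both, _ = needle(dropped, code["cybernetics"])
print([card.title for card in both])  # ['Book A']
```

With this direct code the number of holes grows with the number of subjects, which is the limitation that superimposed codes address.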


Superimposed codes

The Zatocoding system of information retrieval was developed by Calvin Mooers in 1947. Mooers invented Zatocoding, a mechanical information retrieval system based on superimposed codes, at M.I.T., and formed the Zator Company in 1947 to commercialize its applications. The particular superimposed code used in that system is called Zatocoding, while the marginal-punched card information retrieval system as a whole is called "Zator" (Herbert Marvin Ohlman, "Subject-Word Letter Frequencies with Applications to Superimposed Coding", Proceedings of the International Conference on Scientific Information, 1959).

Setting up a superimposed code for a particular library goes something like this:
* Going through every card in the index, the librarian creates a list of all R subjects used in this particular library and notes the maximum number of subjects r actually written on a single card. (For example, say we have 8000 subjects, and the librarian decides to index only the top r = 4 subjects per book.)
* The librarian looks at the physical edge-notched card and notes the number of holes N in each card. (If N >= R, then we could use the "obvious way" mentioned above; the whole point of Zatocoding is that it works even when N is much less than R.)
* The librarian chooses some number n of slots per subject, typically n = N\left(1 - 2^{-1/r}\right).
* On the list of all R subjects, for each subject write down which holes will be slotted for that subject. Rather than slotting one hole per subject in "the obvious way", a superimposed code will slot n holes per subject. (There are several ways to pick these patterns, and these distinguish the various superimposed codes; we discuss them below.)
* When a new book comes in, make a new card for it:
** Get a blank card with the standard N holes in it and write down the name of the book, etc. in the middle.
** Write down the subjects covered by the book on the card.
** For each of the top r subjects, look up that subject in the big list, see which n slots to cut for that subject, and cut them.
** When the card is finished, it may have up to r*n slots cut into it, but more likely at least some of the subject slot patterns overlap, resulting in only v < r*n slots.

Later, when we need to find books on some particular subject, we look up that subject in our list of all R subjects, find the corresponding slot pattern of n slots, and put n needles through the whole stack in that pattern. All of the cards that have been cut with that pattern will fall out. It is possible that a few other, undesired cards may also fall out: cards that have several subjects whose hole patterns overlap in such a way as to mimic the desired pattern. The probability F of some undesired card with v slots cut in it falling through when we select some pattern of n needles is approximately F = \left(\frac{v}{N}\right)^n.

Most systems have N large enough and r small enough that v < N/2 (i.e., the card is less than half-punched), so the probability of an undesired card falling through is F < \left(\frac{1}{2}\right)^n.

There are several different ways to choose which holes will be slotted for each subject. (Several variations of Zatocoding were developed. Bourne describes a variant "for newer retrieval systems that require high performance of the superimposed coding system", using an approach Mooers published in 1959.)
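A short numeric sketch may make these quantities concrete. It assumes the rule of thumb n = N(1 - 2^{-1/r}) for the number of slots per subject, and it assumes that each subject's pattern is an independent random choice of n holes, so that the expected fraction of uncut holes after r subjects is (1 - n/N)^r; setting that fraction to one half gives the rule of thumb. The card size N = 40 is a hypothetical value chosen for illustration, and the function names are not from the source.

```python
def slots_per_subject(N, r):
    """Rule of thumb n = N * (1 - 2**(-1/r)), rounded to a whole slot (at least 1)."""
    return max(1, round(N * (1 - 2 ** (-1 / r))))


def expected_cut_slots(N, n, r):
    """Expected v when r subjects each cut n of N holes independently at random."""
    return N * (1 - (1 - n / N) ** r)


def false_drop_probability(N, v, n):
    """Approximate chance F = (v / N) ** n that an unrelated card falls out."""
    return (v / N) ** n


N, r = 40, 4  # hypothetical: 40 holes per card, top 4 subjects indexed per book
n = slots_per_subject(N, r)
v = expected_cut_slots(N, n, r)
F = false_drop_probability(N, v, n)

print(f"n = {n} slots per subject")
print(f"expected v = {v:.2f} of N = {N} holes cut")
print(f"F = {F:.4f}, bound (1/2)^n = {0.5 ** n:.4f}")
```

Under these assumptions the card ends up slightly less than half-punched, so the estimated false-drop probability stays below the (1/2)^n bound.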


Zatocoding

Setting up a Zatocode for a particular list of R subjects goes something like this:
* For the first subject, pick n of the N slots randomly.
* For the second subject, pick n of the N slots randomly, but make sure this pattern is not identical to that of the first subject.
* ...
* For the R'th subject, pick n of the N slots randomly, but make sure it is not identical to that of any previous subject.
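The random assignment can be sketched as follows. This is an illustrative model only: `make_zatocode`, `notch_card`, and `drops` are hypothetical names, the subjects and the values N = 40, n = 6 are invented for the example, and a software random number generator stands in for whatever procedure a librarian would actually have used.

```python
import math
import random


def make_zatocode(subjects, N, n, seed=None):
    """Build a code book mapping each subject to a distinct random set of n holes."""
    subjects = list(subjects)
    if math.comb(N, n) < len(subjects):
        raise ValueError("not enough distinct n-of-N patterns for every subject")
    rng = random.Random(seed)
    used = set()
    code_book = {}
    for subject in subjects:
        # Redraw until the pattern differs from every earlier subject's pattern.
        while True:
            pattern = frozenset(rng.sample(range(N), n))
            if pattern not in used:
                break
        used.add(pattern)
        code_book[subject] = pattern
    return code_book


def notch_card(code_book, card_subjects):
    """Superimpose the patterns of all subjects on one card (union of cut slots)."""
    slots = set()
    for subject in card_subjects:
        slots |= code_book[subject]
    return slots


def drops(card_slots, code_book, subject):
    """A card falls off the needles if every hole in the subject's pattern is cut."""
    return code_book[subject] <= card_slots


code_book = make_zatocode(["algebra", "botany", "chemistry", "drama"], N=40, n=6, seed=1)
card = notch_card(code_book, ["algebra", "botany"])
print(drops(card, code_book, "algebra"))  # True: the card was cut for this subject
print(len(card) <= 2 * 6)                 # True: at most r*n slots are cut
```

A card always drops for the subjects it was cut for; it may also drop for an unrelated subject whose pattern happens to be covered by the union of slots, which is the "false" selection described earlier.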


Other superimposed codes

A Zatocode requires a code book that lists every subject and a randomly generated notch code associated with each one. Other "direct" superimposed codes have a fixed hash function for transforming the letters in (one spelling of) a subject into a notch code. Such codes require a much shorter code book that describes the translation of letters in a word to the corresponding notch code, and can in principle easily add new subjects without changing the code book. A Bloom filter can be considered a kind of superimposed code (James Blustein and Amal El-Maazawi, "Bloom Filters - A Tutorial, Analysis, and Survey", p. 11).
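The idea of a fixed function from spelling to notch code can be illustrated with a modern hash. The sketch below is an assumption-laden stand-in, not a reconstruction of any historical letter-based scheme: it uses SHA-256 from Python's `hashlib` purely for convenience, and `notch_code` is a hypothetical name. Like the insertion step of a Bloom filter, it derives several positions from one key without consulting a per-subject code book.

```python
import hashlib


def notch_code(subject, N, n):
    """Derive n distinct hole positions (0..N-1) from the spelling of a subject."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    key = subject.strip().lower()
    holes = []
    counter = 0
    while len(holes) < n:
        # Hash the spelling together with a counter to get a sequence of positions.
        digest = hashlib.sha256(f"{counter}:{key}".encode("utf-8")).digest()
        hole = int.from_bytes(digest[:4], "big") % N
        if hole not in holes:  # skip repeats so exactly n slots are cut
            holes.append(hole)
        counter += 1
    return frozenset(holes)


# The same spelling always yields the same pattern, so no subject list is needed.
print(notch_code("indexing", 40, 6) == notch_code("Indexing", 40, 6))  # True
```

Because the pattern is computed rather than looked up, a new subject can be coded immediately; the trade-off is that two different subjects may occasionally map to the same pattern, which the random code book construction explicitly rules out.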




External links

* Calvin N. Mooers, "Application of random codes to the gathering of statistical information", Thesis (M.S.), Massachusetts Institute of Technology, Dept. of Mathematics, 1948.
* Calvin N. Mooers, "Zatocoding applied to mechanical organization of knowledge", Journal of the American Society for Information Science and Technology, 2007.