Dark data
   HOME

TheInfoList



OR:

Dark data is
data In the pursuit of knowledge, data (; ) is a collection of discrete Value_(semiotics), values that convey information, describing quantity, qualitative property, quality, fact, statistics, other basic units of meaning, or simply sequences of sy ...
which is acquired through various
computer network operations Computer network operations (CNO) is a broad term that has both military and civilian application. Conventional wisdom is that information is power, and more and more of the information necessary to make decisions is digitized and conveyed over an e ...
but not used in any manner to derive insights or for decision making. The ability of an organisation to collect data can exceed the
throughput Network throughput (or just throughput, when in context) refers to the rate of message delivery over a communication channel, such as Ethernet or packet radio, in a communication network. The data that these messages contain may be delivered ove ...
at which it can analyse the data. In some cases the organisation may not even be aware that the data is being collected. IBM estimate that roughly 90 percent of data generated by sensors and analog-to-digital conversions never get used. In an industrial context, dark data can include information gathered by sensors and
telematic Telematics is an interdisciplinary field encompassing telecommunications, vehicular technologies (road transport, road safety, etc.), electrical engineering (sensors, instrumentation, wireless communications, etc.), and computer science (multimedia ...
s. Organizations retain dark data for a multitude of reasons, and it is estimated that most companies are only analyzing 1% of their data. Often it is stored for regulatory compliance and record keeping. Some organizations believe that dark data could be useful to them in the future, once they have acquired better analytic and
business intelligence Business intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of business intelligence technologies include reporting, online analytical p ...
technology to process the information. Because storage is inexpensive, storing data is easy. However, storing and securing the data usually entails greater expenses (or even risk) than the potential return profit. Professor David Hand of
Imperial College London Imperial College London (legally Imperial College of Science, Technology and Medicine) is a public research university in London, United Kingdom. Its history began with Prince Albert, consort of Queen Victoria, who developed his vision for a cu ...
also uses the term to refer to data that are missing: ''Dark data are data you don't have.''


Analysis

The term "dark data" very often refers to data that is not amenable to computer processing. For example, a company might have a great deal of data that exists only as scanned page-images. Even the bare text in such documents is not available without something like
Optical character recognition Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a sc ...
, which can vary greatly in accuracy. Even with OCR, the significance of each part of the data is unavailable. An obvious examples is whether a capitalized word is a name or not, and if so, whether it represents a person, place, organization, or even a work of art. Bibliographic and other references, data within tables (that may be labeled quite adequately for humans, but not for processing), and countless assertions represented with the full complexity and ambiguity of human language. A lot of unused data is very valuable, and would be used if it ''could'' be; but is blocked because it is in formats that are difficult to process, categorise, identify, and analyse. Often the reason that business does not use their dark data is because of the amount of resources it would take and the difficulty of having that data analysed. In other words, the data is "dark" not because it is not used, but because it cannot (feasibly or affordably) be used, given its poor representation. There are many data representations that can make data much more accessible for automation. However, a great deal of information lacks any such identification of information items or relationships; and much more loses it during "downhill" conversion such as saving to page-oriented representations, printing, scanning, or faxing. The journey back "uphill" can be costly. According to
Computer Weekly ''Computer Weekly'' is a digital magazine and website for IT professionals in the United Kingdom. It was formerly published as a weekly print magazine by Reed Business Information for over 45 years. Topics covered within the magazine include outs ...
, 60% of organisations believe that their own
business intelligence Business intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of business intelligence technologies include reporting, online analytical p ...
reporting capability is "inadequate" and 65% say that they have "somewhat disorganised content management approaches".


Relevance

Useful data may become dark data after it becomes irrelevant, as it is not processed fast enough. This is called "perishable insights" in "live flowing data". For example, if the geolocation of a customer is known to a business, the business can make offer based on the location, however if this data is not processed immediately, it may be irrelevant in the future. According to IBM, about 60 percent of data loses its value immediately.


Storage

According to the
New York Times ''The New York Times'' (''the Times'', ''NYT'', or the Gray Lady) is a daily newspaper based in New York City with a worldwide readership reported in 2020 to comprise a declining 840,000 paid print subscribers, and a growing 6 million paid ...
, 90% of energy used by data centres is wasted. If data was not stored, energy costs could be saved. Furthermore, there are costs associated with the underutilisation of information and thus missed opportunities. According to Datamation, "the storage environments of EMEA organizations consist of 54 percent dark data, 32 percent redundant, obsolete and trivial data and 14 percent business-critical data. By 2020, this can add up to $891 billion in storage and management costs that can otherwise be avoided." The continuous storage of dark data can put an organisation at risk, especially if this data is sensitive. In the case of a breach, this can result in serious repercussions. These can be financial, legal and can seriously hurt an organisation's reputation. For example, a breach of private records of customers could result in the stealing of sensitive information, which could result in
identity theft Identity theft occurs when someone uses another person's personal identifying information, like their name, identifying number, or credit card number, without their permission, to commit fraud or other crimes. The term ''identity theft'' was c ...
. Another example could be the breach of the company's own sensitive information, for example relating to research and development. These risks can be mitigated by assessing and auditing whether this data is useful to the organisation, employing strong encryption and security and finally, if it is determined to be discarded, then it should be discarded in a way that it becomes unretrievable.


Future

It is generally considered that as more advanced computing systems for analysis of data are built, the higher the value of dark data will be. It has been noted that "data and analytics will be the foundation of the modern industrial revolution". Of course, this includes data that is currently considered "dark data" since there are not enough resources to process it. All this data that is being collected can be used in the future to bring maximum productivity and an ability for organisations to meet consumers' demand. Technology advancements are helping to leverage this dark data affordably, thanks to young and innovative companies such as Datumize, Veritas or Lucidworks. Furthermore, many organisations do not realise the value of dark data right now, for example in healthcare and education organisations deal with large amounts of data that could create a significant "potential to service students and patients in the manner in which the consumer and financial services pursue their target population".


References

{{Statistics Data analysis Data collection Databases Computer data storage