Change detection and notification

Change detection and notification (CDN) is the automatic detection of changes made to World Wide Web pages and the notification of interested users by email or other means. Whereas search engines are designed to find web pages, CDN systems are designed to monitor changes to web pages. Before change detection and notification, users had to check for web page changes manually, either by revisiting web sites or by periodically searching again. Efficient and effective change detection and notification is hampered by the fact that most servers do not accurately track content changes through Last-Modified or ETag web-server headers. A comprehensive analysis regarding CDN systems can be found here.
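
The role of the Last-Modified and ETag headers can be illustrated with a conditional HTTP request. The following is a minimal sketch, not the method of any particular CDN service: the function names `conditional_headers` and `interpret_status` are hypothetical, and the code only builds the request headers and interprets the response status, leaving the actual network call to whichever HTTP client is in use.

```python
from typing import Dict, Optional


def conditional_headers(etag: Optional[str] = None,
                        last_modified: Optional[str] = None) -> Dict[str, str]:
    """Build request headers from validators saved during the previous check.

    Both arguments are the raw header values the server sent last time
    (ETag and Last-Modified); either may be missing.
    """
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def interpret_status(status_code: int) -> Optional[bool]:
    """Interpret the response to a conditional request.

    Returns False for 304 (Not Modified), True for 200 (content was sent
    again, so it may have changed), and None for anything else.
    """
    if status_code == 304:
        return False
    if status_code == 200:
        return True
    return None
```

A 200 response is only a hint that the page changed: a server that ignores the validators answers 200 on every request, which is why a monitor usually still has to compare the content itself.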


History

In 1996, NetMind developed the first change detection and notification tool, known as Mind-it, which ran for six years. This spawned new services such as ChangeDetection (1999), ChangeDetect (2002), Google Alerts (2003), and Versionista (2007), which was used by the John McCain 2008 presidential campaign in the race for the 2008 United States presidential election.

Historically, change polling has been done either by a server which sent email notifications or by a desktop program which audibly alerted the user to a change. Change alerting is also possible directly to mobile devices and through push notifications, webhooks and HTTP callbacks for application integration. Monitoring options vary by service or product and range from monitoring a single web page at a time to entire web sites. What is actually monitored also varies by service or product, with the possibilities of monitoring text, links, documents, scripts, images or screen shots.

With the notable exception of Google's patent filings related to Google Alerts, intellectual property activity by change detection and notification vendors is minimal. No one vendor has successfully leveraged exclusive rights to change detection and notification technology through patents or other legal means. This has resulted in significant functional overlap between products and services.


Architectural approaches

Change detection and notification services can be categorized by the software architecture they use. Three principal approaches can be distinguished:


Server based

A server polls content, tracks changes and logs data, and sends alerts in the form of email notifications, webhooks or RSS. Typically, the user manages the configuration through an associated website. Some services also have a mobile device application which connects to a cloud server and provides alerts to the mobile device.
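
The poll, track and alert cycle described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the design of any real service: `fingerprint` and `poll_once` are hypothetical names, the `fetch` and `notify` callables stand in for a real HTTP client and a real alert channel (email, webhook or RSS), and a SHA-256 hash of the whole page is only one possible way to track changes.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def fingerprint(content: str) -> str:
    """Return a SHA-256 digest that identifies one version of a page."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def poll_once(urls: Iterable[str],
              fetch: Callable[[str], str],
              seen: Dict[str, str],
              notify: Callable[[str], None]) -> List[str]:
    """Check every URL once and alert on pages whose digest changed.

    `seen` maps each URL to the digest recorded on the previous poll and
    is updated in place. The first poll of a URL only records a baseline.
    """
    changed: List[str] = []
    for url in urls:
        digest = fingerprint(fetch(url))
        previous = seen.get(url)
        seen[url] = digest
        if previous is not None and previous != digest:
            changed.append(url)
            notify(url)
    return changed
```

A server would call `poll_once` on a schedule and persist `seen` between runs; passing the fetcher and notifier as arguments keeps the loop independent of any particular transport.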


Self-hosted

A relatively newer approach, which lies between server based and client based, is self-hosting: the software that would normally run on a separate server runs locally on the user's own hardware. This generally means that the software provides a miniature web server with a browser interface instead of a classic graphical user interface provided by an application.


Client based

A local client application with a graphical user interface polls content, tracks changes and logs data. Client applications can be browser extensions, mobile apps or programs.


Considerations

Some web pages change regularly because the presented page includes adverts or feeds. This can trigger false positives in change detection, since users are often interested only in changes to the main content. Some approaches to mitigate this issue exist:

* Create a metric of difference between two versions of a page (calculated, for example, from the change in total size, changes in the HTML file, or changes in the DOM tree) and ignore changes below some threshold. The threshold may be set by the user, or estimated automatically by comparing some early versions of the page.
* Content extraction. For popular sites, or sites running popular software, content may be actively separated from chaff by selecting a sub-tree of the DOM, for example using XPath. Another typical method is the use of regular expressions to extract only the text the user is interested in.
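
Both mitigation approaches can be sketched with the Python standard library. The example below is a simplified illustration, with several assumptions: all function names are hypothetical, the difference metric is one minus the `difflib.SequenceMatcher` similarity ratio (one of many possible metrics), the default threshold of 0.05 is arbitrary, and the sub-tree selection uses the limited XPath subset of `xml.etree.ElementTree`, which requires the page to be well-formed XML.

```python
import difflib
import re
import xml.etree.ElementTree as ET
from typing import Optional


def change_ratio(old: str, new: str) -> float:
    """Difference metric in [0, 1]: 0.0 means identical versions."""
    return 1.0 - difflib.SequenceMatcher(None, old, new).ratio()


def is_significant(old: str, new: str, threshold: float = 0.05) -> bool:
    """Ignore changes whose metric does not exceed the threshold."""
    return change_ratio(old, new) > threshold


def extract_subtree_text(xhtml: str, path: str) -> Optional[str]:
    """Return the whitespace-normalized text of the first node matching path."""
    node = ET.fromstring(xhtml).find(path)
    if node is None:
        return None
    return " ".join("".join(node.itertext()).split())


def extract_with_regex(text: str, pattern: str) -> Optional[str]:
    """Return the first capture group of pattern, or None if it does not match."""
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None
```

Comparing only the extracted main content, for example `extract_subtree_text(page, ".//div[@id='main']")` for a page whose main content sits in such an element, means that a rotating advert elsewhere in the page no longer registers as a change. Real-world HTML is often not well-formed, so a production system would likely need a tolerant HTML parser instead.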


References

* Shobhna, Bansal; Chadhaury, Manoj (June 2013). "A Survey on Web Page Change Detection System Using Different Approaches". International Journal of Computer Science and Mobile Computing. IJCSMC. 2 (6): 294–299. ISSN 2320-088X. http://www.ijcsmc.com/docs/papers/June2013/V2I6201391.pdf. Retrieved 8 September 2016.
* changedetection.io: Self-hosted website change detection and notification