Bot Prevention

Bot prevention refers to the methods used by web services to prevent access by automated processes.


Types of bots

Studies suggest that over half of the traffic on the internet is bot activity, of which over half is further classified as 'bad bots'. Bots are used for various purposes online. Some bots are used passively for web scraping purposes, for example, to gather information from airlines about flight prices and destinations. Other bots, such as sneaker bots, help the bot operator acquire high-demand luxury goods; sometimes these are resold on the secondary market at higher prices, in what is commonly known as 'scalping'.


Detection techniques and avoidance

Various fingerprinting and behavioural techniques are used to identify whether the client is a human user or a bot. In turn, bots use a range of techniques to avoid detection and appear like a human to the server.

Browser fingerprinting techniques are the most common component in anti-bot protection systems. Data is usually collected through client-side JavaScript and then transmitted to the anti-bot service for analysis. The data collected includes results from JavaScript APIs (checking whether a given API is implemented and returns the results expected from a normal browser), from rendering complex WebGL scenes, and from the Canvas API.

TLS fingerprinting techniques categorise the client by analysing the cipher suites it advertises as supported during the TLS handshake. These fingerprints can be used to create whitelists or blacklists of known browser stacks. In 2017, Salesforce open-sourced its TLS fingerprinting library (JA3). Between August and September 2018, Akamai noticed a large increase in TLS tampering across its network, aimed at evading detection.

Behaviour-based techniques are also used, although less commonly than fingerprinting techniques, and rely on the idea that bots behave differently from human visitors. A common behavioural approach is to analyse a client's mouse movements and determine whether they are typical of a human. More traditional techniques such as CAPTCHAs are also often employed; however, they are generally considered ineffective while also being obtrusive to human visitors.

The use of JavaScript can stop some bots that rely on basic requests (such as via cURL), as these will not load the detection script and hence will fail to progress. A common method to bypass many techniques is to use a headless browser to simulate a real web browser and execute the client-side JavaScript detection scripts. There are a variety of headless browsers in use; some are custom (such as PhantomJS), but it is also possible to operate typical browsers such as Google Chrome in headless mode using a driver.
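
As a rough illustration of the analysis step, the sketch below assumes the client-side script has already reported a few signals to the service, which then flags values a regular browser would not normally produce. The payload field names (`webdriver`, `user_agent`, `has_call_phantom`, `missing_apis`) and the function name are hypothetical and not taken from any real anti-bot product. The underlying signals are ones commonly associated with automated browsers: the standard `navigator.webdriver` flag, the `HeadlessChrome` and `PhantomJS` user-agent tokens, and PhantomJS's `window.callPhantom` hook.

```python
from typing import Any, Dict, List


def automation_indicators(signals: Dict[str, Any]) -> List[str]:
    """Return human-readable reasons why a client looks automated.

    `signals` is a hypothetical payload reported by a client-side script;
    the field names are assumptions made for this sketch.
    """
    reasons: List[str] = []

    # navigator.webdriver is set to true by browsers under WebDriver control.
    if signals.get("webdriver") is True:
        reasons.append("navigator.webdriver is true")

    # Tokens that automated browsers have historically put in the user agent.
    user_agent = str(signals.get("user_agent", ""))
    for marker in ("HeadlessChrome", "PhantomJS"):
        if marker in user_agent:
            reasons.append(f"user agent contains {marker}")

    # PhantomJS exposes a callPhantom hook on the window object.
    if signals.get("has_call_phantom"):
        reasons.append("window.callPhantom is defined")

    # APIs a regular browser implements but this client did not report.
    for api in signals.get("missing_apis", []):
        reasons.append(f"expected API missing: {api}")

    return reasons


if __name__ == "__main__":
    suspicious = {
        "webdriver": True,
        "user_agent": "Mozilla/5.0 HeadlessChrome/120.0",
        "missing_apis": ["WebGLRenderingContext"],
    }
    for reason in automation_indicators(suspicious):
        print(reason)
```

Every one of these values is supplied by the client and can be spoofed, so a check like this would plausibly be only one input among several rather than a verdict on its own.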
Selenium is a common web automation framework that makes it easier to control the headless browser. Anti-bot detection systems attempt to identify the implementation of methods specific to these headless browsers, or the lack of proper implementation of APIs that would be implemented in regular web browsers.

The source code of these JavaScript files is typically obfuscated to make it harder to reverse engineer how the detection works. Common techniques include:

* Minification
* String arrays
* Control flow flattening
* Dead code injection
* Debugger statements, to prevent use of debuggers like DevTools

Anti-bot protection services are offered by various internet companies, such as Cloudflare and Akamai.
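
To make the TLS fingerprinting idea from this section more concrete, here is a minimal sketch of a JA3-style fingerprint. It follows the publicly documented JA3 format: five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) are written as decimal values, joined with `-` inside each field and `,` between fields, and the resulting string is hashed with MD5. The function names are illustrative, and parsing the handshake itself is assumed to have happened elsewhere.

```python
import hashlib
from typing import Iterable, List


def is_grease(value: int) -> bool:
    """True for the reserved GREASE values 0x0a0a, 0x1a1a, ..., 0xfafa (RFC 8701)."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version: int, ciphers: Iterable[int], extensions: Iterable[int],
               curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """Build the comma/dash-separated string that JA3 hashes."""
    fields: List[str] = [str(version)]
    for field in (ciphers, extensions, curves, point_formats):
        # GREASE values are randomised by some clients, so they are skipped.
        fields.append("-".join(str(v) for v in field if not is_grease(v)))
    return ",".join(fields)


def ja3_hash(version: int, ciphers: Iterable[int], extensions: Iterable[int],
             curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """MD5 hex digest of the JA3 string, used as the fingerprint."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Hypothetical ClientHello values, already parsed into integers.
    fingerprint = ja3_hash(771, [4865, 4866, 0x0A0A], [0, 10, 11], [29, 23], [0])
    known_bad = {fingerprint}  # placeholder blacklist for the example
    print(fingerprint, fingerprint in known_bad)
```

Because the order of cipher suites and extensions is preserved, two clients that support the same algorithms but list them differently produce different fingerprints, which is what makes it possible to tell browser stacks apart.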


Law

In the United States, the Better Online Ticket Sales Act (commonly known as the BOTS Act) was passed in 2016 to prevent some uses of bots in commerce. A year later, the United Kingdom passed similar regulations in the Digital Economy Act 2017. The effectiveness of these measures is disputed.

