Headless Browser
   HOME

TheInfoList



OR:

A headless browser is a
web browser A web browser is application software for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's screen. Browsers are used on ...
without a
graphical user interface The GUI ( "UI" by itself is still usually pronounced . or ), graphical user interface, is a form of user interface that allows users to interact with electronic devices through graphical icons and audio indicator such as primary notation, inste ...
. Headless browsers provide automated control of a web page in an environment similar to popular web browsers, but they are executed via a
command-line interface A command-line interpreter or command-line processor uses a command-line interface (CLI) to receive commands from a user in the form of lines of text. This provides a means of setting parameters for the environment, invoking executables and pro ...
or using network communication. They are particularly useful for
testing An examination (exam or evaluation) or test is an educational assessment intended to measure a test-taker's knowledge, skill, aptitude, physical fitness, or classification in many other topics (e.g., beliefs). A test may be administered verba ...
web pages as they are able to render and understand HTML the same way a browser would, including styling elements such as page layout, colour, font selection and execution of
JavaScript JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of Website, websites use JavaScript on the Client (computing), client side ...
and
Ajax Ajax may refer to: Greek mythology and tragedy * Ajax the Great, a Greek mythological hero, son of King Telamon and Periboea * Ajax the Lesser, a Greek mythological hero, son of Oileus, the king of Locris * ''Ajax'' (play), by the ancient Greek ...
which are usually not available when using other testing methods. Since version 59 of
Google Chrome Google Chrome is a cross-platform web browser developed by Google. It was first released in 2008 for Microsoft Windows, built with free software components from Apple WebKit and Mozilla Firefox. Versions were later released for Linux, macOS ...
and version 56 of
Firefox Mozilla Firefox, or simply Firefox, is a free and open-source web browser developed by the Mozilla Foundation and its subsidiary, the Mozilla Corporation. It uses the Gecko rendering engine to display web pages, which implements current and ...
, there is native support for remote control of the browser. This made earlier efforts obsolete, notably
PhantomJS PhantomJS is a discontinued headless browser used for automating web page interaction. PhantomJS provides a JavaScript API enabling automated navigation, screenshots, user behavior and assertions making it a common tool used to run browser-based ...
.


Use cases

The main use cases for headless browsers are: *
Test automation In software testing, test automation is the use of software separate from the software being tested to control the execution of tests and the comparison of actual outcomes with predicted outcomes. Test automation can automate some repetitive bu ...
in modern
web application A web application (or web app) is application software that is accessed using a web browser. Web applications are delivered on the World Wide Web to users with an active network connection. History In earlier computing models like client-serve ...
s (
web testing Web testing is software testing that focuses on web applications. Complete testing of a web-based system before going live can help address issues before the system is revealed to the public. Issues may include the security of the web application ...
) * Taking screenshots of web pages. * Running automated tests for JavaScript libraries. * Automating interaction of web pages.


Other uses

Headless browsers are also useful for
web scraping Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may directly access the World Wide Web using the Hypertext Transfer Protocol or a web browser. While web scraping ...
.
Google Google LLC () is an American multinational technology company focusing on search engine technology, online advertising, cloud computing, computer software, quantum computing, e-commerce, artificial intelligence, and consumer electronics. ...
stated in 2009 that using a headless browser could help their search engine index content from websites that use Ajax. Headless browsers have also been misused in various ways: * Perform
DDoS In computing, a denial-of-service attack (DoS attack) is a cyber-attack in which the perpetrator seeks to make a machine or network resource unavailable to its intended users by temporarily or indefinitely disrupting services of a host A ...
attacks on web sites. * Increase advertisement impressions. * Automate web sites in unintended ways e.g. for
credential stuffing Credential stuffing is a type of cyberattack in which the attacker collects stolen account credentials, typically consisting of lists of usernames and/or email addresses and the corresponding passwords (often from a data breach), and then uses t ...
. However, a study of browser traffic in 2018 found no preference by malicious actors for headless browsers. There is no indication that headless browsers are used more frequently than non-headless browsers for malicious purposes, like DDoS attacks, SQL injections or
cross-site scripting Cross-site scripting (XSS) is a type of security vulnerability that can be found in some web applications. XSS attacks enable attackers to inject client-side scripts into web pages viewed by other users. A cross-site scripting vulnerability may ...
attacks


Usage

As several major browsers natively support headless mode through
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software Interface (computing), interface, offering a service to other pieces of software. A document or standa ...
s, some software exists to perform browser automation through a unified interface. These include: * Selenium WebDriver - a
W3C The World Wide Web Consortium (W3C) is the main international standards organization for the World Wide Web. Founded in 1994 and led by Tim Berners-Lee, the consortium is made up of member organizations that maintain full-time staff working to ...
compliant implementation of WebDriver * Playwright - a
Node.js Node.js is an open-source server environment. Node.js is cross-platform and runs on Windows, Linux, Unix, and macOS. Node.js is a back-end JavaScript runtime environment. Node.js runs on the V8 JavaScript Engine and executes JavaScript code o ...
library to automate Chromium, Firefox and WebKit * Puppeteer - a
Node.js Node.js is an open-source server environment. Node.js is cross-platform and runs on Windows, Linux, Unix, and macOS. Node.js is a back-end JavaScript runtime environment. Node.js runs on the V8 JavaScript Engine and executes JavaScript code o ...
library to automate Chrome


Test Automation

Some test automation software and frameworks include headless browsers as part of their testing apparati. *
Capybara The capybaraAlso called capivara (in Brazil), capiguara (in Bolivia), chigüire, chigüiro, or fercho (in Colombia and Venezuela), carpincho (in Argentina, Paraguay and Uruguay) and ronsoco (in Peru). or greater capybara (''Hydrochoerus hydro ...
uses headless browsing, either via
WebKit WebKit is a browser engine developed by Apple and primarily used in its Safari web browser, as well as on the iOS and iPadOS version of any web browser. WebKit is also used by the BlackBerry Browser, PlayStation consoles beginning from the PS ...
or Headless Chrome to mimic user behavior in its testing protocols. *
Jasmine Jasmine ( taxonomic name: ''Jasminum''; , ) is a genus of shrubs and vines in the olive family (Oleaceae). It contains around 200 species native to tropical and warm temperate regions of Eurasia, Africa, and Oceania. Jasmines are widely cultiva ...
uses Selenium by default, but can use WebKit or Headless Chrome, to run browser tests.


Alternatives

Another approach is to use software that provides browser APIs. For example, Deno provides browser APIs as part of its design. For
Node.js Node.js is an open-source server environment. Node.js is cross-platform and runs on Windows, Linux, Unix, and macOS. Node.js is a back-end JavaScript runtime environment. Node.js runs on the V8 JavaScript Engine and executes JavaScript code o ...
, jsdom is the most complete provider. While most are able to support common browser features (HTML parsing,
cookies A cookie is a baked or cooked snack or dessert that is typically small, flat and sweet. It usually contains flour, sugar, egg, and some type of oil, fat, or butter. It may include other ingredients such as raisins, oats, chocolate chips, nuts ...
, XHR, some JavaScript, etc.), they do not render the
DOM Dom or DOM may refer to: People and fictional characters * Dom (given name), including fictional characters * Dom (surname) * Dom La Nena (born 1989), stage name of Brazilian-born cellist, singer and songwriter Dominique Pinto * Dom people, an et ...
and have limited support for DOM events. They usually perform faster than full browsers, but are unable to correctly interpret many popular websites. Another is
HtmlUnit HtmlUnit is a headless web browser written in Java. It allows high-level manipulation of websites from other Java code, including filling and submitting forms and clicking hyperlinks. It also provides access to the structure and the details wit ...
, a headless browser written in Java. HtmlUnit uses the Rhino engine to provide JavaScript and Ajax support as well as partial rendering capability.


List of headless browsers

These are various software that provide headless browser APIs. * Splash is a headless web browser written in
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
using the
WebKit WebKit is a browser engine developed by Apple and primarily used in its Safari web browser, as well as on the iOS and iPadOS version of any web browser. WebKit is also used by the BlackBerry Browser, PlayStation consoles beginning from the PS ...
layout engine via Qt. It has an HTTP API,
Lua Lua or LUA may refer to: Science and technology * Lua (programming language) * Latvia University of Agriculture * Last universal ancestor, in evolution Ethnicity and language * Lua people, of Laos * Lawa people, of Thailand sometimes referred t ...
scripting support and a built-in
IPython IPython (Interactive Python) is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, that offers introspection, rich media, shell syntax, tab completion, and histo ...
(Jupyter)-based IDE. Development started at ScrapingHub in 2013; it is partially funded by
DARPA The Defense Advanced Research Projects Agency (DARPA) is a research and development agency of the United States Department of Defense responsible for the development of emerging technologies for use by the military. Originally known as the Adv ...
. * Zombie.js is a simulated browser environment for
Node.js Node.js is an open-source server environment. Node.js is cross-platform and runs on Windows, Linux, Unix, and macOS. Node.js is a back-end JavaScript runtime environment. Node.js runs on the V8 JavaScript Engine and executes JavaScript code o ...
. * SimpleBrowser is a headless web browser written in C# supporting .NET Standard 2.0 *
DotNetBrowser DotNetBrowser is a proprietary .NET library that provides a Chromium-based engine which can be used to load and display web pages. It is developed and supported by TeamDev since 2015. Features Some main features are as follows: * Load and displa ...
is a proprietary .NET Chromium-based library that provides the off-screen rendering mode and can be used without embedding or displaying windows. Another noted earlier effort was envjs in 2008 from
John Resig John Resig is an American software engineer and entrepreneur, best known as the creator and lead developer of the jQuery JavaScript library. , he works as the chief software architect at Khan Academy. History Resig graduated with an undergraduate ...
, which was a simulated browser environment written in JavaScript for the Rhino engine.


See also

*
Headless computer A headless computer is a computer system or device that has been configured to operate without a monitor (the missing "head"), keyboard, and mouse. A headless system is typically controlled over a network connection, although some headless system ...


References

{{reflist, 30em Web browsers