HOME

TheInfoList



OR:

HtmlUnit is a
headless web browser A headless browser is a web browser without a graphical user interface. Headless browsers provide automated control of a web page in an environment similar to popular web browsers, but they are executed via a command-line interface or using netwo ...
written in
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
. It allows high-level manipulation of
website A website (also written as a web site) is a collection of web pages and related content that is identified by a common domain name and published on at least one web server. Examples of notable websites are Google Search, Google, Facebook, Amaz ...
s from other Java code, including filling and submitting
form Form is the shape, visual appearance, or configuration of an object. In a wider sense, the form is the way something happens. Form also refers to: *Form (document), a document (printed or electronic) with spaces in which to write or enter data ...
s and clicking
hyperlink In computing, a hyperlink, or simply a link, is a digital reference to data that the user can follow or be guided by clicking or tapping. A hyperlink points to a whole document or to a specific element within a document. Hypertext is text wit ...
s. It also provides access to the structure and the details within received web pages. HtmlUnit emulates parts of browser behaviour including the lower-level aspects of
TCP/IP The Internet protocol suite, commonly known as TCP/IP, is a framework for organizing the set of communication protocols used in the Internet and similar computer networks according to functional criteria. The foundational protocols in the suit ...
and
HTTP The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, ...
. A sequence such as getPage(url), getLinkWith("Click here"), click() allows a user to navigate through
hypertext Hypertext is E-text, text displayed on a computer display or other electronic devices with references (hyperlinks) to other text that the reader can immediately access. Hypertext documents are interconnected by hyperlinks, which are typi ...
and obtain web pages that include
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScri ...
,
JavaScript JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of Website, websites use JavaScript on the Client (computing), client side ...
,
Ajax Ajax may refer to: Greek mythology and tragedy * Ajax the Great, a Greek mythological hero, son of King Telamon and Periboea * Ajax the Lesser, a Greek mythological hero, son of Oileus, the king of Locris * ''Ajax'' (play), by the ancient Greek ...
and
cookies A cookie is a baked or cooked snack or dessert that is typically small, flat and sweet. It usually contains flour, sugar, egg, and some type of oil, fat, or butter. It may include other ingredients such as raisins, oats, chocolate chips, nuts ...
. This headless browser can deal with HTTPS security, basic HTTP authentication, automatic page redirection and other HTTP headers. It allows Java test code to examine returned pages either as text, an XML DOM, or as collections of forms, tables, and links. The goal is to simulate real browsers; namely Chrome,
Firefox Mozilla Firefox, or simply Firefox, is a free and open-source web browser developed by the Mozilla Foundation and its subsidiary, the Mozilla Corporation. It uses the Gecko rendering engine to display web pages, which implements current and ...
ESR 38,
Internet Explorer Internet Explorer (formerly Microsoft Internet Explorer and Windows Internet Explorer, commonly abbreviated IE or MSIE) is a series of graphical user interface, graphical web browsers developed by Microsoft which was used in the Microsoft Wind ...
8 and 11, and
Edge Edge or EDGE may refer to: Technology Computing * Edge computing, a network load-balancing system * Edge device, an entry point to a computer network * Adobe Edge, a graphical development application * Microsoft Edge, a web browser developed by ...
(experimental). The most common use of HtmlUnit is
test automation In software testing, test automation is the use of software separate from the software being tested to control the execution of tests and the comparison of actual outcomes with predicted outcomes. Test automation can automate some repetitive bu ...
of web pages, but sometimes it can be used for
web scraping Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may directly access the World Wide Web using the Hypertext Transfer Protocol or a web browser. While web scraping ...
, or downloading website content.


Benefits

* Provides high-level API, taking away lower-level details away from the user. * Compared to other WebDriver implementations, HtmlUnitDriver is the fastest to implement. * It can be configured to simulate specific Browser.


Drawbacks

* Elements layout and rendering can not be tested. * The JavaScript support is still not complete. And this is usually the main area of enhancements.


Used technologies

*
W3C The World Wide Web Consortium (W3C) is the main international standards organization for the World Wide Web. Founded in 1994 and led by Tim Berners-Lee, the consortium is made up of member organizations that maintain full-time staff working to ...
DOM Dom or DOM may refer to: People and fictional characters * Dom (given name), including fictional characters * Dom (surname) * Dom La Nena (born 1989), stage name of Brazilian-born cellist, singer and songwriter Dominique Pinto * Dom people, an et ...
*
HTTP The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, ...
connection, using
Apache The Apache () are a group of culturally related Native American tribes in the Southwestern United States, which include the Chiricahua, Jicarilla, Lipan, Mescalero, Mimbreño, Ndendahe (Bedonkohe or Mogollon and Nednhi or Carrizaleño an ...
br>HttpComponents
*
JavaScript JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of Website, websites use JavaScript on the Client (computing), client side ...
, using forked
Rhino A rhinoceros (; ; ), commonly abbreviated to rhino, is a member of any of the five extant species (or numerous extinct species) of odd-toed ungulates in the family Rhinocerotidae. (It can also refer to a member of any of the extinct species o ...
*
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScri ...
Parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...

NekoHTML
*
CSS Cascading Style Sheets (CSS) is a style sheet language used for describing the presentation of a document written in a markup language such as HTML or XML (including XML dialects such as SVG, MathML or XHTML). CSS is a cornerstone techno ...
: usin
CSS Parser
*
XPath XPath (XML Path Language) is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium (W3C) and can be used to compute values (e.g., strings, numbers, or Boolean v ...
support, using
Xalan Xalan is a popular open source software library from the Apache Software Foundation, that implements the XSLT 1.0 XML transformation language and the XPath 1.0 language. The Xalan XSLT processor is available for both the Java and C++ programming ...


Libraries using HtmlUnit

* Selenium WebDriver
Spring MVC Test Framework
*
Google Web Toolkit Google Web Toolkit (GWT ), or GWT Web Toolkit, is an open-source set of tools that allows web developers to create and maintain JavaScript front-end applications in Java. It is licensed under the Apache License 2.0. GWT emphasizes reusable ...
br>tests

WebTest

Wetator


See also

*
Headless system A headless computer is a computer system or device that has been configured to operate without a monitor (the missing "head"), keyboard, and mouse. A headless system is typically controlled over a network connection, although some headless system ...
*
PhantomJS PhantomJS is a discontinued headless browser used for automating web page interaction. PhantomJS provides a JavaScript API enabling automated navigation, screenshots, user behavior and assertions making it a common tool used to run browser-based ...
a headless WebKit with JavaScript API *
Web scraping Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may directly access the World Wide Web using the Hypertext Transfer Protocol or a web browser. While web scraping ...
*
Web testing Web testing is software testing that focuses on web applications. Complete testing of a web-based system before going live can help address issues before the system is revealed to the public. Issues may include the security of the web application ...
*
xUnit xUnit is the collective name for several unit testing frameworks that derive their structure and functionality from Smalltalk's SUnit. ''SUnit'', designed by Kent Beck in 1998, was written in a highly structured object-oriented style, which len ...
* River Trail * Selenium WebDriver


References


Bibliography

* pp. 339–


Further reading

* Langer, A
"Extending HtmlUnit for test automatisation of Web applications using AJAX"
Betreuer/in (nen): T. Grechenig, M. Bernhart; 183/1, 2009. * * * * *


External links


HtmlUnit
{{web browsers Unit testing frameworks Java platform Load testing tools Web development Free software programmed in Java (programming language) Web browsers Web scraping