HOME

TheInfoList



OR:

HTTrack is a free and
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
Web crawler A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (''web spi ...
and offline browser, developed by Xavier Roche and licensed under the GNU General Public License Version 3. HTTrack allows users to download
World Wide Web The World Wide Web (WWW), commonly known as the Web, is an information system enabling documents and other web resources to be accessed over the Internet. Documents and downloadable media are made available to the network through web se ...
sites from the
Internet The Internet (or internet) is the global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP) to communicate between networks and devices. It is a ''internetworking, network of networks'' that consists ...
to a local computer. By default, HTTrack arranges the downloaded site by the original site's relative link-structure. The downloaded (or " mirrored") website can be browsed by opening a page of the site in a browser. HTTrack can also update an existing mirrored site and resume interrupted downloads. HTTrack is configurable by options and by filters (include/exclude), and has an integrated help system. There is a basic command line version and two GUI versions (WinHTTrack and WebHTTrack); the former can be part of scripts and cron jobs. HTTrack uses a
Web crawler A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (''web spi ...
to download a website. Some parts of the website may not be downloaded by default due to the
robots exclusion protocol The robots exclusion standard, also known as the robots exclusion protocol or simply robots.txt, is a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the site they are allowed to visit. Th ...
unless disabled during the program. HTTrack can follow links that are generated with basic
JavaScript JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of Website, websites use JavaScript on the Client (computing), client side ...
and inside
Applet In computing, an applet is any small application that performs one specific task that runs within the scope of a dedicated widget engine or a larger program, often as a plug-in. The term is frequently used to refer to a Java applet, a program w ...
s or
Flash Flash, flashes, or FLASH may refer to: Arts, entertainment, and media Fictional aliases * Flash (DC Comics character), several DC Comics superheroes with super speed: ** Flash (Barry Allen) ** Flash (Jay Garrick) ** Wally West, the first Kid F ...
, but not complex links (generated using functions or expressions) or server-side
image map In HTML and XHTML, an image map is a list of coordinates relating to a specific image, created in order to hyperlink areas of the image to different destinations (as opposed to a normal image link, in which the entire area of the image links to a ...
s.


See also

*
Robots Exclusion Standard The robots exclusion standard, also known as the robots exclusion protocol or simply robots.txt, is a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the site they are allowed to visit. Th ...
* Website mirroring software


References


External links

* * Free web crawlers Web archiving Portable software Free software programmed in C Web scraping {{web-software-stub