Project Naptha is a browser extension

software Software is a set of computer programs and associated documentation and data. This is in contrast to hardware, from which the system is built and which actually performs the work. At the lowest programming level, executable code consists ...

for

Google Chrome Google Chrome is a cross-platform web browser developed by Google. It was first released in 2008 for Microsoft Windows, built with free software components from Apple WebKit and Mozilla Firefox. Versions were later released for Linux, macOS ...

that allows users to

highlight Highlight or Highlights may refer to: In arts and entertainment * Highlight (band), a South Korean boy group ** ''Highlight'' (album), the third album by the group, or its title track"Highlight", *Highlight, a song by South Korean-Japanese girl ...

copy Copy may refer to: *Copying or the product of copying (including the plural "copies"); the duplication of information or an artifact **Cut, copy and paste, a method of reproducing text or other data in computing **File copying **Photocopying, a pr ...

, edit and

translate Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. The English language draws a terminological distinction (which does not exist in every language) between ''transl ...

text from within images. It was created by developer Kevin Kwok, and released in April 2014 as a Chrome add-on. This software was first made available only on Google Chrome, downloadable from the Chrome Web Store. It was then made available on

Mozilla Firefox Mozilla Firefox, or simply Firefox, is a free and open-source web browser developed by the Mozilla Foundation and its subsidiary, the Mozilla Corporation. It uses the Gecko rendering engine to display web pages, which implements current and a ...

, downloadable from the Mozilla Firefox add-ons

repository Repository may refer to: Archives and online databases * Content repository, a database with an associated set of data management tools, allowing application-independent access to the content * Disciplinary repository (or subject repository), an ...

but was soon removed. The reason behind the removal remains unknown. The

web browser A web browser is application software for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's screen. Browsers are used on ...

extension uses advanced imaging technology. Similar technologies have also been employed to produce hardcopy art, and the

identification Identification or identify may refer to: *Identity document, any document used to verify a person's identity Arts, entertainment and media * ''Identify'' (album) by Got7, 2014 * "Identify" (song), by Natalie Imbruglia, 1999 * Identification ( ...

of these works. By adopting several

Optical Character Recognition Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scen ...

(OCR)

algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...

s, including libraries developed by

Microsoft Research Microsoft Research (MSR) is the research subsidiary of Microsoft. It was created in 1991 by Richard Rashid, Bill Gates and Nathan Myhrvold with the intent to advance state-of-the-art computing and solve difficult world problems through technologi ...

and

Google Google LLC () is an American multinational technology company focusing on search engine technology, online advertising, cloud computing, computer software, quantum computing, e-commerce, artificial intelligence, and consumer electronics. ...

, text is automatically identified in images. The OCR enables the build-up of a model of text regions, words and letters from all images. The OCR technology that Project Naptha adopts is a slightly differentiated technology in comparison to the technology used by software such as Google Drive and Microsoft OneNote to facilitate and analyse text within images. Project Naptha also makes use of a method called ''Stroke Width Transform (SWT)'', developed by Microsoft Research in 2008 as a form of text detection.

Origin of name

The name Naptha is derived from

Naphtha Naphtha ( or ) is a flammable liquid hydrocarbon mixture. Mixtures labelled ''naphtha'' have been produced from natural gas condensates, petroleum distillates, and the distillation of coal tar and peat. In different industries and regions ''n ...

, which is a general term that originated few thousand years ago and refers to flammable liquid hydrocarbon. The process of highlighting texts also inspired the naming of the project.

Difficulty in translation of words from images

The process of editing, copying or quoting text inside images was difficult before software such as Project Naptha arrived. Previously, the only way to search or copy a sentence from an image was to manually transcribe the text.

History

In May 2012, Kevin Kwok was reading about

seam carving Seam carving (or liquid rescaling) is an algorithm for content-aware image resizing, developed by Shai Avidan, of Mitsubishi Electric Research Laboratories (MERL), and Ariel Shamir, of the Interdisciplinary Center and MERL. It functions by es ...

, an

which was able to rescale images without distorting or damaging the quality of the image. Kwok noticed that they tend to converge and arrange themselves in a way that cut through the spaces in between letters. A particularly

verbose Verbosity or verboseness is speech or writing that uses more words than necessary. The opposite of verbosity is plain language. Some teachers, including the author of ''The Elements of Style'', warn against verbosity; similarly Mark Twain and E ...

comic inspired him to develop a

which can read images (with

canvas Canvas is an extremely durable plain-woven fabric used for making sails, tents, marquees, backpacks, shelters, as a support for oil painting and for other items for which sturdiness is required, as well as in such fashion objects as handbags ...

), figure the positions of the lines and letters, and draw selection overlays to assuage a

pervasive Pervasive may refer to: *Pervasive Computing, human computer interaction paradigm * Pervasive Informatics, study of how information affects human interactions *Pervasive Software, software company in the United States **Pervasive PSQL, software d ...

text-selection habit. Kwok's first attempt was simple. He projected the image onto the side and a vertical pixel

image histogram An image histogram is a type of histogram that acts as a graphical representation of the tonal distribution in a digital image. It plots the number of pixels for each tonal value. By looking at the histogram for a specific image a viewer will be ...

was formed. The significant valleys of the resulting histograms served as a signature for the ends of text lines. When horizontal lines are detected, each lines are automatically cropped, and the histogram process repeats itself until all horizontal lines in the image have been identified. In order to determine the letter position, a similar process was carried out, but vertically this time. However, carrying out the process vertically was unsuccessful as projections created were not readable. It was less effective, proving that the process was strictly applicable only for horizontal machine printed text. Faced with high technical difficulties, Kwok decided to abandon this project in 2012. It was only until Kevin Kwok went on to study at

Massachusetts Institute of Technology The Massachusetts Institute of Technology (MIT) is a private land-grant research university in Cambridge, Massachusetts. Established in 1861, MIT has played a key role in the development of modern technology and science, and is one of the ...

(MIT) and entered a

hackathon A hackathon (also known as a hack day, hackfest, datathon or codefest; a portmanteau of hacking and marathon) is an event where people engage in rapid and collaborative engineering over a relatively short period of time such as 24 or 48 hours. Th ...

, that he picked up this project again. This project eventually won him second place. To him, selecting texts in pictures was something that was manageable on a technical level. The relevant technology exists and was readily available for quite some time, yet for inexplicable reason, it hadn't been expanded for the application of translating texts from images. Once Kevin Kwok decided to start on his project again, the technology for

transcription Transcription refers to the process of converting sounds (voice, music etc.) into letters or musical notes, or producing a copy of something in another medium, including: Genetics * Transcription (biology), the copying of DNA into RNA, the fir ...

translation Translation is the communication of the Meaning (linguistic), meaning of a #Source and target languages, source-language text by means of an Dynamic and formal equivalence, equivalent #Source and target languages, target-language text. The ...

, text erasure, and modification flowed naturally afterwards.

Technical Features

Before the

(OCR) can be applied, it has to first identify whether blocks of text exists in an image. Once the blocks of texts are identified, the OCR enables for the build-up of a model of text regions, words and letters from any images. This function provides users with the option to

and even

modify Modification may refer to: * Modifications of school work for students with special educational needs * Modifications (genetics), changes in appearance arising from changes in the environment * Posttranslational modifications, changes to protein ...

text directly in every image, in real-time and in their

browser. The primary feature of Project Naptha is the text detection function. Running on an

called the “Stroke Width Transform, developed by Microsoft Research in 2008, it provides the capability of identifying regions of text in a

language-agnostic Language-agnostic programming or scripting (also called language-neutral, language-independent, or cross-language) is a software paradigm in which no particular language is promoted. In introductory instruction, the term refers to teaching princip ...

manner and detecting angled text and text in images. This is done by using the width of the lines that make up letters as a means to identify elements that could potentially be text rather than trying to spot predetermined separate features as a marker of text. In this case, the programme becomes highly

intuitive Intuition is the ability to acquire knowledge without recourse to conscious reasoning. Different fields use the word "intuition" in very different ways, including but not limited to: direct access to unconscious knowledge; unconscious cognition; ...

, similar to humans whereby we do not need to understand a language in order to recognize a written text. Project Naptha automatically applies

state-of-the-art The state of the art (sometimes cutting edge or leading edge) refers to the highest level of general development, as of a device, technique, or scientific field achieved at a particular time. However, in some contexts it can also refer to a level ...

computer vision

s on every image available when browsing the web, allowing users to highlight, copy and paste, edit and translate text which were formerly trapped within an image. A technique similar to Photoshop's "Content-Aware Fill" feature called " inpainting” is adopted. These types of algorithms are famously known as a part of

Adobe Photoshop Adobe Photoshop is a raster graphics editor developed and published by Adobe Inc. for Microsoft Windows, Windows and macOS. It was originally created in 1988 by Thomas Knoll, Thomas and John Knoll. Since then, the software has become the indu ...

’s “Content-Aware Fill” feature. It involves the using of an

that automatically fills in the space previously occupied by text with colors from the surrounding area, matching the font of the translated text in the style of the original image. This is done so by, first, detecting the text and retrieving the solid colours from the regions surrounding the text. Following, the colours will be spread around and inwards till the entire area is filled up. This technique allows user to reconstruct images as well as to edit and remove words from an image with the capturing and processing of the independent colours from regions around the edited text. In order to provide a seamless and intuitive experience for the user, the

extension Extension, extend or extended may refer to: Mathematics Logic or set theory * Axiom of extensionality * Extensible cardinal * Extension (model theory) * Extension (predicate logic), the set of tuples of values that satisfy the predicate * E ...

technique tracks cursor movements and continuously extrapolates a second ahead based on its position and velocity, predicting where highlights might be made over an image. The Project Naptha software then scans and runs a processor-intensive character recognition algorithms, processing potential text that users might want to pick out from an image, ahead of time.

Application

Project Naptha can be used on a few applications, enabling users to copy texts from any images displayed in the browser. This includes comics, photos,

screenshot screenshot (also known as screen capture or screen grab) is a digital image that shows the contents of a computer display. A screenshot is created by the operating system or software running on the device powering the display. Additionally, s ...

s, images with text overlays such as internet memes, animated

GIFS The Graphics Interchange Format (GIF; or , see pronunciation) is a bitmap image format that was developed by a team at the online services provider CompuServe led by American computer scientist Steve Wilhite and released on 15 June 1987. It ...

, scans, diagrams with labels, and translations.

Comics

In October 2013, the first

prototype A prototype is an early sample, model, or release of a product built to test a concept or process. It is a term used in a variety of contexts, including semantics, design, electronics, and Software prototyping, software programming. A prototyp ...

for the extension for comics was released. The need for an extension for comic was due to the use of comic fonts, which are more casual and informal. Characters are often placed closely together as if they are connected and if one tries to copy and paste text from a comic, the copied text will usually appear to be jumbled up and unclear.

Photos

The

used by Project Naptha for photos is the Stroke Width Transform, which was specially designed for detecting text in natural scenes and photographs. This is because photographs are generally tougher and more technically challenging to copy texts from as compared to most regular images.

Screenshots

For Screenshots, Project Naptha transforms static screenshots into something more similar to an interactive snapshot of the computer as it was when the screen was captured. The cursor changes when hovering over different parts, and blocks of text become selectable.

Editing Text on Images

Project Naptha allows one to erase and edit texts on an image by using the translation technology. This translation technology essentially makes use of “ Inpainting”. During the changing of a text, it uses the same trick that

uses. The Translate menu includes the capability to translate in-image text to many other different languages such as English, Spanish, Russian, French, Chinese Simplified, Chinese Traditional, Japanese, or German.

Technical Limitations

There are a few technical difficulties that Project Naptha still faces despite the constant improvements made to the software. The

nature of Project Naptha's underlying Stroke Width Transform algorithm allows it detect the little squiggles as text. Despite it being a plus point since it is capable of detecting minor details, it can also end up to be seen as a bug by detecting and including too many unwanted details. When the colours of the texts and background of an image are similar, it becomes challenging for words to be detected, as words become less distinctive from the image. This creates inaccuracies in the detection and copying of texts. Due to character segmentation, handwritings are especially tough for detection. The characters in handwritings are often written too close to each other, making it difficult to segment the characters or to separate the letters apart. Hence, copying texts from these types of sources will result in high

inaccuracy Accuracy and precision are two measures of ''observational error''. ''Accuracy'' is how close a given set of measurements ( observations or readings) are to their ''true value'', while ''precision'' is how close the measurements are to each oth ...

and with

jumble Jumble is a word puzzle with a clue, a drawing illustrating the clue, and a set of words, each of which is “jumbled” by scrambling its letters. A solver reconstructs the words, and then arranges letters at marked positions in the words to sp ...

d letters. As part of an improvement feature, Project Naptha started work on it and enabling it to support rotated text. However, this function is only limited only up to about 30 degrees. Any text with rotation of more than 30 degrees may become incapable of being copied or translated. For techniques that make use of inpainting, present loopholes to it is that images may hardly be a substitute for the original and can leave marks of it being edited. It will however, look as though the words have been flawlessly removed from the image from a distance away.

Security

Security Concerns

For any other software that is used on sites, one of the greatest concerns is due to issues arising regarding the balance between user experience and

privacy Privacy (, ) is the ability of an individual or group to seclude themselves or information about themselves, and thereby express themselves selectively. The domain of privacy partially overlaps with security, which can include the concepts of a ...

. It is understood that the developers of Project Naptha are doing their best in attempting to allow the processing on the client side (i.e., within the browser). However, as text selected by users for extraction from the image are being processed in the cloud. This means that in order to achieve higher

accuracy, there is still a need to rely on greater cloud processing and hence compromising on privacy. There is a

default Default may refer to: Law * Default (law), the failure to do something required by law ** Default (finance), failure to satisfy the terms of a loan obligation or failure to pay back a loan ** Default judgment, a binding judgment in favor of ei ...

setting which helps to strike a delicate balance between having all the functionality made available and respecting user privacy. By default, when users begin selecting a text, a secure HTTPS request is sent. This is only contains the URL of the specific image and nothing else – no User Tokens, no Website Information, no

Cookies A cookie is a baked or cooked snack or dessert that is typically small, flat and sweet. It usually contains flour, sugar, egg, and some type of oil, fat, or butter. It may include other ingredients such as raisins, oats, chocolate chips, nuts ...

or analytics and the requests are not logged. The server responds with a list of existing translations and OCR languages that have been done. This allows you to recognize text from an image with much more accuracy than otherwise possible. Depending on the preference of the users, this default function can be disabled by checking on the “Disable Lookup” item under the Options Menu.

Privacy

When installed, Project Naptha requires the permissions and sweeping access to user's information. This informations would be requested in the installation dialog. In order to allow for the interaction with all images, it requires the permission from the user for the software to read all images from all sites. On another hand, if the user does not want to allow access for Project Naptha to all images on all sides, they can also disable this function under the installation dialog. In this case, Project Naptha will be operating at a very low level of access, and is ideally the kind of functionality that gets built into browsers and operating systems natively. The extension is almost entirely written in client side

JavaScript JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of Website, websites use JavaScript on the Client (computing), client side ...

, allowing the extension to function without an access to a remote server. However a point to take note is that an online translation running offline is contradicting and the inadequate access to a cached OCR service running in the cloud would mean a compromise and reduction in performance and lower

accuracy. Lastly due to scalability issues, the translation feature is currently in limited rollout. The online OCR services has per-user metering, hence requires a unique identifier token. This token is completely anonymous and is not linked with any

personally identifiable information Personal data, also known as personal information or personally identifiable information (PII), is any information related to an identifiable person. The abbreviation PII is widely accepted in the United States, but the phrase it abbreviates ha ...

Future Developments

Apart from the current software that allows one to manipulate texts inside the images, there is an experimental feature that plans to widen the ability of the software. Under this experimental extension, the software aims to allow users to search for texts inside images on a current page, serving as a great feature for all users. Project Naptha has also been looking at different ways to improve on its limitations. Currently, text can only be of a rotation angle of not more than 30 degrees otherwise it would be of inferior quality. Project Naptha will aim to increase the quality in its future versions by using better-trained models and algorithms. There is also a possibility of the inclusion of transcription services that will be assisted by humans. Also, the techniques of inpainting may leave marks on the original image, making it obvious that it has been edited. This technique is expected to improve as well, especially with a technique of detecting logic besides simply detecting fonts. Currently, inpainted reads fonts in this manner - ''if uppercase and super bold, then Impact font, if uppercase otherwise then XKCD font, and for everything else, Helvetica Neue''. As acknowledged by Kwok, Project Naptha still has to improve on many of its functionality. The main reason is because in terms of its various subcomponents and algorithms, Project Naptha is a few years behind the

state of the art The state of the art (sometimes cutting edge or leading edge) refers to the highest level of general development, as of a device, technique, or scientific field achieved at a particular time. However, in some contexts it can also refer to a level ...

. However, he firmly believes that over time, text recognition, translation and deletion can all be developed further and this immense potential is definitely one that will be exciting.

References

{{Reflist, colwidth=45em 2013 software Google Chrome extensions Optical character recognition