Wordfilter

A wordfilter (sometimes referred to as just "filter" or "censor") is a script typically used on Internet forums or chat rooms that automatically scans users' posts or comments as they are submitted and automatically changes or censors particular words or phrases. The most basic wordfilters search only for specific strings of letters, and remove or overwrite them regardless of their context. More advanced wordfilters make some exceptions for context (such as filtering "butt" but not "butter"), and the most advanced wordfilters may use regular expressions.
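The difference between a context-blind filter and a context-aware one can be sketched in a few lines of Python. This is only an illustration, not how any particular forum package works: the word list, the function names and the choice of asterisks as the mask are all assumptions made for the example.

```python
import re

BANNED = ["butt"]  # hypothetical word list, for illustration only


def basic_filter(text, banned=BANNED, mask="*"):
    """Overwrite every occurrence of a banned string, regardless of context."""
    for word in banned:
        text = text.replace(word, mask * len(word))
    return text


def boundary_filter(text, banned=BANNED, mask="*"):
    """Overwrite banned words only when they stand alone as whole words."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in banned) + r")\b",
        re.IGNORECASE,
    )
    # Replace each match with a mask of the same length as the matched text.
    return pattern.sub(lambda m: mask * len(m.group(0)), text)
```

With these definitions, `basic_filter("butter on my butt")` damages the innocent word "butter", while `boundary_filter` leaves it alone and masks only the standalone word.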


Functions

Wordfilters can serve any of a number of functions.


Removal of vulgar language

A swear filter, also known as a profanity filter or language filter, is a software subsystem which modifies text to remove words deemed offensive by the administrator or community of an online forum. Swear filters are common in custom-programmed chat rooms and online video games, primarily MMORPGs. This is not to be confused with content filtering, which is usually built into internet browsing programs by third-party developers to filter or block specific websites or types of websites. Swear filters are usually created or implemented by the developers of the Internet service.

Most commonly, wordfilters are used to censor language considered inappropriate by the operators of the forum or chat room. Expletives are typically partially replaced, completely replaced, or replaced by nonsense words. This relieves the administrators or moderators of the task of constantly patrolling the board to watch for such language. This may also help the message board avoid being blocked by content-control software installed on users' computers or networks, since such software often blocks access to Web pages that contain vulgar language.

Filtered phrases may be permanently replaced when the post is saved (example: phpBB 1.x), or the original phrase may be saved but displayed as the censored text. In some software, users can view the text behind the wordfilter by quoting the post.

Swear filters typically take advantage of string replacement functions built into the programming language used to create the program, to swap out a list of inappropriate words and phrases with a variety of alternatives. Alternatives can include:

* Grawlix nonsense characters, such as !@#$%^&*
* Replacing a certain letter with a shift-number character or a similar-looking one.
* Asterisks or similar symbols (* or #) of either a set length, or the length of the original word being filtered. Alternatively, posters often replace certain letters with an asterisk.
* Minced oaths such as "heck" or "darn", or invented words such as "flum".
* Family friendly words or phrases, or euphemisms, like "LOVE" or "I LOVE YOU", or completely different words which have nothing to do with the original word.
* Deletion of the post. In this case, the entire post is blocked and there is usually no way to fix it.
* Nothing at all. In this case, the offending word is deleted.

Some swear filters do a simple search for a string. Others have measures that ignore whitespace, and still others go as far as ignoring all non-alphanumeric characters and then filtering the plain text. This means that if the word "you" was set to be filtered, "y o u" or "y.o!u" would also be filtered.
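A filter that ignores non-alphanumeric characters, combined with a choice of replacement style, might look roughly like the following Python sketch. The function names, the grawlix string and the replacement strategies are assumptions chosen for illustration; real filters differ in how they match and what they substitute.

```python
import re

GRAWLIX = "!@#$%^&*"  # symbols cycled through by mask_grawlix (illustrative)


def mask_same_length(matched):
    """Asterisks matching the length of the text being replaced."""
    return "*" * len(matched)


def mask_grawlix(matched):
    """Grawlix symbols, cycled to the length of the text being replaced."""
    return "".join(GRAWLIX[i % len(GRAWLIX)] for i in range(len(matched)))


def loose_pattern(word):
    """Match the word's letters with any non-alphanumeric junk in between."""
    gap = r"[^A-Za-z0-9]*"
    return re.compile(gap.join(re.escape(ch) for ch in word), re.IGNORECASE)


def swear_filter(text, banned, replace=mask_same_length):
    """Replace each loosely matched banned word using the given strategy."""
    for word in banned:
        text = loose_pattern(word).sub(lambda m: replace(m.group(0)), text)
    return text
```

Passing `replace=lambda matched: ""` gives the "nothing at all" behaviour, where the offending word is simply deleted. Because the pattern skips over spaces as well as punctuation, it will also match across word boundaries, which is one source of unintended matches.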


Cliché control

Clichés—particular words or phrases constantly reused in posts, also known as "memes"—often develop on forums. Some users find that these clichés add to the fun, but other users find them tedious, especially when overused. Administrators may configure the wordfilter to replace the annoying cliché with a more embarrassing phrase, or remove it altogether.


Vandalism control

Internet forums are sometimes attacked by vandals who try to fill the forum with repeated nonsense messages, or by spammers who try to insert links to their commercial web sites. The site's wordfilter may be configured to remove the nonsense text used by the vandals, or to remove all links to particular websites from posts.
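Removing links to particular websites could be approached as in the sketch below. The blocked domain, the placeholder text and the deliberately simple URL pattern are assumptions for the example; a production filter would need a more careful notion of what counts as a link.

```python
import re
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"spam.example"}  # hypothetical blocklist

# Simplistic URL matcher: a scheme followed by any run of non-space characters.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def strip_blocked_links(post, blocked=BLOCKED_DOMAINS, placeholder="[link removed]"):
    """Replace links whose host is a blocked domain or one of its subdomains."""

    def replace_if_blocked(match):
        url = match.group(0)
        try:
            host = (urlparse(url).hostname or "").lower()
        except ValueError:
            return url  # leave malformed URLs untouched in this sketch
        if any(host == d or host.endswith("." + d) for d in blocked):
            return placeholder
        return url

    return URL_RE.sub(replace_if_blocked, post)
```

Comparing the parsed host name, rather than searching the whole post for the domain string, avoids removing links to unrelated sites whose names merely contain the blocked text.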


Lameness filter

Lameness filters are text-based wordfilters used by Slash-based websites (i.e. textboards and imageboards) to stop junk comments from being posted in response to stories. Some of the things they are designed to filter include:

* Too many capital letters
* Too much repetition
* ASCII art
* Comments which are too short or long
* Use of HTML tags that try to break web pages
* Comment titles consisting solely of "first post"
* Any occurrence of a word or term deemed (by the programmers) to be offensive/vulgar
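Several of the checks listed above can be approximated with simple heuristics. The following Python sketch is not the Slash implementation; the function name and every threshold in it are invented for illustration.

```python
import re


def lameness_problems(title, body, min_len=5, max_len=5000,
                      max_caps_ratio=0.7, max_repeat=10):
    """Return a list of reasons a comment looks like junk (empty if none)."""
    problems = []

    # Too many capital letters: share of uppercase among alphabetic characters.
    letters = [c for c in body if c.isalpha()]
    if letters:
        caps = sum(1 for c in letters if c.isupper())
        if caps / len(letters) > max_caps_ratio:
            problems.append("too many capital letters")

    # Too much repetition: one character followed by max_repeat or more copies.
    if re.search(r"(.)\1{%d,}" % max_repeat, body):
        problems.append("too much repetition")

    # Comments which are too short or too long.
    if len(body.strip()) < min_len:
        problems.append("comment too short")
    elif len(body) > max_len:
        problems.append("comment too long")

    # Titles consisting solely of "first post".
    if title.strip().lower() == "first post":
        problems.append("title is only 'first post'")

    return problems
```

A site could reject a comment whenever the returned list is non-empty, or show the reasons to the poster so the comment can be corrected.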


Circumventing filters

Since wordfilters are automated and look only for particular sequences of characters, users aware of the filters will sometimes try to circumvent them by changing their lettering just enough to avoid the filters. A user trying to avoid a vulgarity filter might replace one of the characters in the offending word with an asterisk, dash, or something similar. Some administrators respond by revising the wordfilters to catch common substitutions; others may make filter evasion a punishable offense of its own.

A simple example of evading a wordfilter would be entering symbols between letters or using leet. More advanced techniques of wordfilter evasion include the use of images, using hidden tags, or Cyrillic characters (i.e. a homograph spoofing attack).

Another method is to use a soft hyphen. A soft hyphen is only used to indicate where a word can be split when breaking text lines and is not displayed. By placing this halfway in a word, the word gets broken up and will in some cases not be recognised by the wordfilter.

Some more advanced filters, such as those in the online game RuneScape, can detect bypassing. However, the downside of sensitive wordfilters is that legitimate phrases get filtered out as well.
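One way a filter might counter soft hyphens, leet and lookalike Cyrillic letters is to normalise the text before matching. The sketch below is a minimal illustration under stated assumptions: the lookalike table is tiny and hypothetical, "badword" in the usage note stands in for a real banned word, and nothing here reflects how RuneScape or any other product detects bypassing.

```python
import unicodedata

# Hypothetical, deliberately small table of lookalike characters.
LOOKALIKES = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0445": "x",  # Cyrillic ha
    "0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s",
})


def normalize(text):
    """Drop invisible format characters, lowercase, and map lookalikes."""
    # Unicode category "Cf" (format) includes the soft hyphen U+00AD.
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return visible.lower().translate(LOOKALIKES)


def contains_banned(text, banned):
    """Check the normalised text for any banned substring."""
    cleaned = normalize(text)
    return any(word in cleaned for word in banned)
```

For example, `contains_banned("b4dw0rd", ["badword"])` is true after normalisation. The same mapping also rewrites legitimate digits and symbols, which illustrates why more sensitive filters catch more innocent text.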


Censorship aspects

Wordfilters are coded into the Internet forums or chat rooms, and operate only on material submitted to the forum or chat room in question. This distinguishes wordfilters from content-control software, which is typically installed on an end user's PC or computer network, and which can filter all Internet content sent to or from the PC or network in question. Since wordfilters alter users' words without their consent, some users still consider them to be censorship, while others consider them an acceptable part of a forum operator's right to control the contents of the forum.


False positives

A common quirk with wordfilters, often considered either comical or aggravating by users, is that they often affect words that are not intended to be filtered. This is a typical problem when short words are filtered. For example, with the word "ass" censored, one may see, "Do you need istance for playing clical music?" Multiple words may be filtered if whitespace is ignored, resulting in "as suspected" becoming " uspected". Prohibiting a phrase such as "hard on" will result in filtering innocuous statements such as "That was a hard one!" and "Sorry I was hard on you," into "That was a e!" and "Sorry I was you."

Some words that have been filtered accidentally can become replacements for profane words. One example of this is found on the Myst forum Mystcommunity. There, the word 'manuscript' was accidentally censored for containing the word 'anus', which resulted in 'm****cript'. The word was adopted as a replacement swear and carried over when the forum moved, and many substitutes, such as "'scripting", are used (though mostly by the older community members).

Place names may be filtered out unintentionally due to containing portions of swear words. In the early years of the internet, the British place name Penistone was often filtered out by spam and swear filters.
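The examples above can be reproduced with a naive substring filter. This is a small Python sketch for illustration; the function name and its `mask` parameter are assumptions, and it simply deletes or masks every occurrence of each banned string.

```python
def naive_filter(text, banned, mask=None):
    """Delete (mask=None) or overwrite every occurrence of each banned string."""
    for word in banned:
        replacement = "" if mask is None else mask * len(word)
        text = text.replace(word, replacement)
    return text


# Deleting the banned string mangles longer, innocent words.
print(naive_filter("Do you need assistance for playing classical music?", ["ass"]))

# Masking instead of deleting produces the accidental censoring described above.
print(naive_filter("manuscript", ["anus"], mask="*"))
```

Because the function never looks at what surrounds a match, any longer word that happens to contain a banned string is altered as well.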


Implementation

Many games, such as World of Warcraft, and more recently, Habbo Hotel and RuneScape, allow users to turn the filters off. Other games, especially free massively multiplayer online games, such as Knight Online, do not have such an option. Other games such as Medal of Honor and Call of Duty (except Call of Duty: World at War, Call of Duty: Black Ops, Call of Duty: Black Ops 2, and Call of Duty: Black Ops 3) do not give users the option to turn off scripted foul language, while Gears of War does.

In addition to games, profanity filters can be used to moderate user-generated content in forums, blogs, social media apps, children's websites, and product reviews. There are many profanity filter APIs, like WebPurify, that help in replacing swear words with other characters (e.g. "@#$!"). These profanity filter APIs work using a search-and-replace method.
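A filter that users can switch off implies that the original text is stored and the search and replace happens only when a message is displayed. The Python sketch below shows one possible arrangement; the `Message` class, the `render` function and the word "darn" used in the usage note are hypothetical and do not describe any particular game or API.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    author: str
    raw_text: str  # stored unmodified; filtering happens at display time


def render(message, filter_enabled, banned, mask="@#$!"):
    """Return the text a viewer sees, honouring their filter preference."""
    if not filter_enabled or not banned:
        return message.raw_text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in banned) + r")\b",
        re.IGNORECASE,
    )
    # Search and replace: each whole-word match becomes the mask string.
    return pattern.sub(mask, message.raw_text)
```

For instance, `render(Message("a", "well darn it"), True, ["darn"])` yields the masked text, while passing `False` returns the message exactly as it was written.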


See also

* Content-control software
* Internet censorship
* Scunthorpe problem


External links


* Online Text Obfuscator – replaces characters with similar Unicode chars from different character sets (e.g. Cyrillic)
* Text Tools Online: alphabetic sort, remove duplicates, delete all non-alphanumeric characters, only numbers, letters, etc.