Algorithmic bias describes systematic and repeatable errors in a computer system that create "unfair" outcomes, such as "privileging" one category over another in ways different from the intended function of the algorithm. Bias can emerge from many factors, including but not limited to the design of the algorithm or the unintended or unanticipated use or decisions relating to the way data is coded, collected, selected or used to train the algorithm. For example, algorithmic bias has been observed in search engine results and social media platforms. This bias can have impacts ranging from inadvertent privacy violations to reinforcing social biases of race, gender, sexuality, and ethnicity. The study of algorithmic bias is most concerned with algorithms that reflect "systematic and unfair" discrimination. This bias has only recently been addressed in legal frameworks, such as the European Union's General Data Protection Regulation (2018) and the proposed Artificial Intelligence Act (2021).

As algorithms expand their ability to organize society, politics, institutions, and behavior, sociologists have become concerned with the ways in which unanticipated output and manipulation of data can impact the physical world. Because algorithms are often considered to be neutral and unbiased, they can inaccurately project greater authority than human expertise (in part due to the psychological phenomenon of automation bias), and in some cases, reliance on algorithms can displace human responsibility for their outcomes. Bias can enter into algorithmic systems as a result of pre-existing cultural, social, or institutional expectations; because of technical limitations of their design; or by being used in unanticipated contexts or by audiences who are not considered in the software's initial design.

Algorithmic bias has been cited in cases ranging from election outcomes to the spread of online hate speech. It has also arisen in criminal justice, healthcare, and hiring, compounding existing racial, socioeconomic, and gender biases. The relative inability of facial recognition technology to accurately identify darker-skinned faces has been linked to multiple wrongful arrests of black men, an issue stemming from imbalanced datasets.

Problems in understanding, researching, and discovering algorithmic bias persist due to the proprietary nature of algorithms, which are typically treated as trade secrets. Even when full transparency is provided, the complexity of certain algorithms poses a barrier to understanding their functioning. Furthermore, algorithms may change, or respond to input or output, in ways that cannot be anticipated or easily reproduced for analysis. In many cases, even within a single website or application, there is no single "algorithm" to examine, but a network of many interrelated programs and data inputs, even between users of the same service.


Definitions

Algorithms are difficult to define, but may be generally understood as lists of instructions that determine how programs read, collect, process, and analyze data to generate output. For a rigorous technical introduction, see Algorithm. Advances in computer hardware have led to an increased ability to process, store, and transmit data. This has in turn boosted the design and adoption of technologies such as machine learning and artificial intelligence. By analyzing and processing data, algorithms are the backbone of search engines, social media websites, recommendation engines, online retail, online advertising, and more.

Contemporary social scientists are concerned with algorithmic processes embedded into hardware and software applications because of their political and social impact, and question the underlying assumptions of an algorithm's neutrality. The term ''algorithmic bias'' describes systematic and repeatable errors that create unfair outcomes, such as privileging one arbitrary group of users over others. For example, a credit score algorithm may deny a loan without being unfair, if it consistently weighs relevant financial criteria. If the algorithm recommends loans to one group of users but denies loans to another set of nearly identical users based on unrelated criteria, and if this behavior can be repeated across multiple occurrences, the algorithm can be described as ''biased''. This bias may be intentional or unintentional (for example, it can come from biased data obtained from a worker who previously did the job the algorithm will now perform).
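To make the distinction concrete, the following minimal sketch contrasts a scoring rule that consistently weighs only relevant financial criteria with one that also penalizes an unrelated attribute, so that two applicants with identical finances receive different outcomes. All feature names, weights, and thresholds are invented for illustration.

```python
# Hypothetical illustration of the distinction drawn above: one scoring rule
# weighs only relevant financial criteria and applies them identically; the
# other also penalizes an unrelated attribute (here, a postal-code group used
# as a proxy). All feature names, weights, and thresholds are invented.

def fair_score(income, debt_ratio, missed_payments):
    # Decision depends only on financial criteria, applied the same way to everyone.
    return 0.8 * income / 1000 - 40 * debt_ratio - 15 * missed_payments

def biased_score(income, debt_ratio, missed_payments, postcode_group):
    # Identical financial criteria, plus a penalty tied to an unrelated attribute.
    penalty = 30 if postcode_group == "B" else 0
    return fair_score(income, debt_ratio, missed_payments) - penalty

THRESHOLD = 5
applicant = dict(income=52000, debt_ratio=0.3, missed_payments=1)  # same finances

print(biased_score(**applicant, postcode_group="A") >= THRESHOLD)  # True: approved
print(biased_score(**applicant, postcode_group="B") >= THRESHOLD)  # False: denied
```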


Methods

Bias can be introduced to an algorithm in several ways. During the assemblage of a dataset, data may be collected, digitized, adapted, and entered into a database according to human-designed cataloging criteria. Next, programmers assign priorities, or hierarchies, for how a program assesses and sorts that data. This requires human decisions about how data is categorized, and which data is included or discarded. Some algorithms collect their own data based on human-selected criteria, which can also reflect the bias of human designers. Other algorithms may reinforce stereotypes and preferences as they process and display "relevant" data for human users, for example, by selecting information based on the previous choices of a similar user or group of users.

Beyond assembling and processing data, bias can emerge as a result of design. For example, algorithms that determine the allocation of resources or scrutiny (such as determining school placements) may inadvertently discriminate against a category when determining risk based on similar users (as in credit scores). Meanwhile, recommendation engines that work by associating users with similar users, or that make use of inferred marketing traits, might rely on inaccurate associations that reflect broad ethnic, gender, socio-economic, or racial stereotypes. Another example comes from determining criteria for what is included and excluded from results. These criteria could present unanticipated outcomes for search results, such as flight-recommendation software that omits flights that do not follow the sponsoring airline's flight paths. Algorithms may also display an ''uncertainty bias'', offering more confident assessments when larger data sets are available. This can skew algorithmic processes toward results that more closely correspond with larger samples, which may disregard data from underrepresented populations.
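A minimal sketch of the uncertainty bias just described, assuming a simple confidence-interval view: with the same underlying rate but far fewer samples, the estimate for an underrepresented group is less certain, and a system that acts only on confident estimates defaults toward the well-represented group. The sample sizes, rates, and threshold below are invented for illustration.

```python
# Minimal sketch of ''uncertainty bias'': an estimate made from a small sample
# carries a wider confidence interval, so a system that only acts on
# "confident" estimates tends to favor well-represented groups.
import math

def wald_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a success rate."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Same underlying success rate (70%), very different sample sizes.
majority_group = wald_interval(successes=7000, n=10000)  # narrow interval
minority_group = wald_interval(successes=35, n=50)       # wide interval

REQUIRED_LOWER_BOUND = 0.65
for name, (low, high) in [("majority", majority_group), ("minority", minority_group)]:
    decision = "recommend" if low >= REQUIRED_LOWER_BOUND else "withhold"
    print(f"{name}: interval=({low:.2f}, {high:.2f}) -> {decision}")
```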


History


Early critiques

The earliest computer programs were designed to mimic human reasoning and deductions, and were deemed to be functioning when they successfully and consistently reproduced that human logic. In his 1976 book ''Computer Power and Human Reason'', artificial intelligence pioneer Joseph Weizenbaum suggested that bias could arise both from the data used in a program and from the way a program is coded.

Weizenbaum wrote that programs are a sequence of rules created by humans for a computer to follow. By following those rules consistently, such programs "embody law", that is, enforce a specific way to solve problems. The rules a computer follows are based on the assumptions of a computer programmer for how these problems might be solved. That means the code could incorporate the programmer's imagination of how the world works, including their biases and expectations. While a computer program can incorporate bias in this way, Weizenbaum also noted that any data fed to a machine additionally reflects "human decisionmaking processes" as the data is selected. Finally, he noted that machines might also transfer good information with unintended consequences if users are unclear about how to interpret the results. Weizenbaum warned against trusting decisions made by computer programs that a user doesn't understand, comparing such faith to a tourist who can find his way to a hotel room exclusively by turning left or right on a coin toss. Crucially, the tourist has no basis for understanding how or why he arrived at his destination, and a successful arrival does not mean the process is accurate or reliable.

An early example of algorithmic bias resulted in as many as 60 women and ethnic minorities being denied entry to St. George's Hospital Medical School per year from 1982 to 1986, based on the implementation of a new computer-guidance assessment system that denied entry to women and to men with "foreign-sounding names" based on historical trends in admissions. While many schools at the time employed similar biases in their selection process, St. George's was most notable for automating said bias through the use of an algorithm, thus gaining attention on a much wider scale. In recent years, as more algorithms have begun to use machine learning methods on real-world data, algorithmic bias has been found more often due to bias existing in the data.


Contemporary critiques and responses

Though well-designed algorithms frequently determine outcomes that are as equitable as (or more equitable than) the decisions of human beings, cases of bias still occur and are difficult to predict and analyze. The complexity of analyzing algorithmic bias has grown alongside the complexity of programs and their design. Decisions made by one designer, or team of designers, may be obscured among the many pieces of code created for a single program; over time these decisions and their collective impact on the program's output may be forgotten. In theory, these biases may create new patterns of behavior, or "scripts", in relationship to specific technologies as the code interacts with other elements of society. Biases may also impact how society shapes itself around the data points that algorithms require. For example, if data shows a high number of arrests in a particular area, an algorithm may assign more police patrols to that area, which could lead to more arrests.

The decisions of algorithmic programs can be seen as more authoritative than the decisions of the human beings they are meant to assist, a process described by author Clay Shirky as "algorithmic authority". Shirky uses the term to describe "the decision to regard as authoritative an unmanaged process of extracting value from diverse, untrustworthy sources", such as search results. This neutrality can also be misrepresented by the language used by experts and the media when results are presented to the public. For example, a list of news items selected and presented as "trending" or "popular" may be created based on significantly wider criteria than just their popularity. Because of their convenience and authority, algorithms are theorized as a means of delegating responsibility away from humans. This can have the effect of reducing alternative options, compromises, or flexibility. Sociologist Scott Lash has critiqued algorithms as a new form of "generative power", in that they are a virtual means of generating actual ends. Where previously human behavior generated data to be collected and studied, powerful algorithms increasingly could shape and define human behaviors.

Concerns over the impact of algorithms on society have led to the creation of working groups in organizations such as Google and Microsoft, which have co-created a working group named Fairness, Accountability, and Transparency in Machine Learning. Ideas from Google have included community groups that patrol the outcomes of algorithms and vote to control or restrict outputs they deem to have negative consequences. In recent years, the study of the Fairness, Accountability, and Transparency (FAT) of algorithms has emerged as its own interdisciplinary research area with an annual conference called FAccT. Critics have suggested that FAT initiatives cannot serve effectively as independent watchdogs when many are funded by corporations building the systems being studied.


Types


Pre-existing

Pre-existing bias in an algorithm is a consequence of underlying social and institutional ideologies. Such ideas may influence or create personal biases within individual designers or programmers. Such prejudices can be explicit and conscious, or implicit and unconscious. Poorly selected input data, or simply data from a biased source, will influence the outcomes created by machines. Encoding pre-existing bias into software can preserve social and institutional bias, and, without correction, it could be replicated in all future uses of that algorithm.

An example of this form of bias is the British Nationality Act Program, designed to automate the evaluation of new British citizens after the 1981 British Nationality Act. The program accurately reflected the tenets of the law, which stated that "a man is the father of only his legitimate children, whereas a woman is the mother of all her children, legitimate or not." In its attempt to transfer a particular logic into an algorithmic process, the BNAP inscribed the logic of the British Nationality Act into its algorithm, which would perpetuate it even if the act were eventually repealed.


Technical

Technical bias emerges through limitations of a program, computational power, its design, or other constraints on the system. Such bias can also be a restraint of design; for example, a search engine that shows three results per screen can be understood to privilege the top three results slightly more than the next three, as in an airline price display. Another case is software that relies on randomness for fair distributions of results. If the random number generation mechanism is not truly random, it can introduce bias, for example, by skewing selections toward items at the end or beginning of a list.

A ''decontextualized algorithm'' uses unrelated information to sort results; for example, a flight-pricing algorithm that sorts results by alphabetical order would be biased in favor of American Airlines over United Airlines. The opposite may also apply, in which results are evaluated in contexts different from those in which they were collected. Data may be collected without crucial external context: for example, when facial recognition software is used by surveillance cameras but evaluated by remote staff in another country or region, or evaluated by non-human algorithms with no awareness of what takes place beyond the camera's field of vision. This could create an incomplete understanding of a crime scene, for example, potentially mistaking bystanders for those who committed the crime.

Lastly, technical bias can be created by attempting to formalize decisions into concrete steps on the assumption that human behavior works in the same way. For example, software weighs data points to determine whether a defendant should accept a plea bargain, while ignoring the impact of emotion on a jury. Another unintended result of this form of bias was found in the plagiarism-detection software Turnitin, which compares student-written texts to information found online and returns a probability score that the student's work is copied. Because the software compares long strings of text, it is more likely to identify non-native speakers of English than native speakers, as the latter group may be better able to change individual words, break up strings of plagiarized text, or obscure copied passages through synonyms. Because it is easier for native speakers to evade detection as a result of the technical constraints of the software, this creates a scenario in which Turnitin flags non-native speakers of English for plagiarism while allowing more native speakers to evade detection.
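As a concrete illustration of the randomness point above, the following sketch shows the classic "modulo bias": reducing a bounded generator's output modulo a list length that does not divide it evenly skews selections toward the start of the list. The generator range and list are invented for illustration.

```python
# Minimal sketch of how a flawed random-selection step can skew toward the
# start of a list (the "modulo bias" case). A generator that produces the
# values 0..9 uniformly, reduced modulo a list of 4 items, favors indices 0 and 1.
from collections import Counter

def tiny_generator_outputs():
    return range(10)          # stands in for a generator with a fixed range 0..9

items = ["A", "B", "C", "D"]  # 4 items to choose "fairly" between

counts = Counter(items[value % len(items)] for value in tiny_generator_outputs())
print(counts)  # Counter({'A': 3, 'B': 3, 'C': 2, 'D': 2}) -- not a fair split
```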


Emergent

Emergent bias is the result of the use and reliance on algorithms across new or unanticipated contexts. Algorithms may not have been adjusted to consider new forms of knowledge, such as new drugs or medical breakthroughs, new laws, business models, or shifting cultural norms. This may exclude groups through technology, without providing clear outlines to understand who is responsible for their exclusion. Similarly, problems may emerge when training data (the samples "fed" to a machine, by which it models certain conclusions) do not align with contexts that an algorithm encounters in the real world.

In 1990, an example of emergent bias was identified in the software used to place US medical students into residencies, the National Residency Match Program (NRMP). The algorithm was designed at a time when few married couples would seek residencies together. As more women entered medical schools, more students were likely to request a residency alongside their partners. The process called for each applicant to provide a list of preferences for placement across the US, which was then sorted and assigned when a hospital and an applicant both agreed to a match. In the case of married couples where both sought residencies, the algorithm weighed the location choices of the higher-rated partner first. The result was a frequent assignment of highly preferred schools to the first partner and lower-preferred schools to the second partner, rather than sorting for compromises in placement preference.

Additional emergent biases include:


Correlations

Unpredictable correlations can emerge when large data sets are compared to each other. For example, data collected about web-browsing patterns may align with signals marking sensitive data (such as race or sexual orientation). By selecting according to certain behavior or browsing patterns, the end effect would be almost identical to discrimination through the use of direct race or sexual orientation data. In other cases, the algorithm draws conclusions from correlations without being able to understand those correlations. For example, one triage program gave lower priority to asthmatics who had pneumonia than to asthmatics who did not have pneumonia. The algorithm did this because it simply compared survival rates, and asthmatics with pneumonia survived at high rates in the historical data. In reality, such patients are at the highest risk, and it is precisely for this reason that hospitals typically give them the best and most immediate care; the algorithm mistook the effect of that care for low underlying risk.
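A minimal, synthetic sketch of the proxy effect described above: a selection rule that never looks at the sensitive attribute, but filters on a browsing-style feature that happens to correlate with it, ends up excluding one group far more often. All distributions here are invented for illustration.

```python
# Synthetic illustration of proxy discrimination: the rule uses only a
# browsing-pattern feature, yet its effect differs sharply by group because
# the feature is distributed differently across the two groups.
import random

random.seed(0)
population = []
for _ in range(10_000):
    group = random.choice(["group_x", "group_y"])
    # The proxy feature correlates with group membership.
    visits_niche_sites = random.random() < (0.8 if group == "group_x" else 0.2)
    population.append((group, visits_niche_sites))

# The selection rule uses only the proxy, never the group label.
selected = [group for group, proxy in population if not proxy]

for g in ("group_x", "group_y"):
    total = sum(1 for group, _ in population if group == g)
    kept = sum(1 for group in selected if group == g)
    print(f"{g}: selected {kept}/{total} ({kept / total:.0%})")
```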


Unanticipated uses

Emergent bias can occur when an algorithm is used by unanticipated audiences. For example, machines may require that users can read, write, or understand numbers, or relate to an interface using metaphors that they do not understand. These exclusions can become compounded as biased or exclusionary technology is more deeply integrated into society. Apart from exclusion, unanticipated uses may emerge from the end user relying on the software rather than their own knowledge. In one example, an unanticipated user group led to algorithmic bias in the UK, when the British Nationality Act Program was created as a proof-of-concept by computer scientists and immigration lawyers to evaluate suitability for British citizenship. The designers had access to legal expertise beyond the end users in immigration offices, whose understanding of both software and immigration law would likely have been unsophisticated. The agents administering the questions relied entirely on the software, which excluded alternative pathways to citizenship, and used the software even after new case law and legal interpretations led the algorithm to become outdated. As a result of designing an algorithm for users assumed to be legally savvy on immigration law, the software's algorithm indirectly led to bias in favor of applicants who fit a very narrow set of legal criteria set by the algorithm, rather than the broader criteria of British immigration law.


Feedback loops

Emergent bias may also create a feedback loop, or recursion, if data collected for an algorithm results in real-world responses which are fed back into the algorithm. For example, simulations of the predictive policing software PredPol, deployed in Oakland, California, suggested an increased police presence in black neighborhoods based on crime data reported by the public. The simulation showed that the public reported crime based on the sight of police cars, regardless of what police were doing. The simulation interpreted police car sightings in modeling its predictions of crime, and would in turn assign an even larger increase of police presence within those neighborhoods. The Human Rights Data Analysis Group, which conducted the simulation, warned that in places where racial discrimination is a factor in arrests, such feedback loops could reinforce and perpetuate racial discrimination in policing. Another well-known example of such behavior is COMPAS, software that determines an individual's likelihood of becoming a criminal offender. The software has often been criticized for labeling Black individuals as likely reoffenders far more often than others, and it then feeds the data back into itself as individuals become registered criminals, further reinforcing the bias created by the dataset the algorithm is acting on.

Recommender systems such as those used to recommend online videos or news articles can create feedback loops. When users click on content that is suggested by algorithms, it influences the next set of suggestions. Over time this may lead to users entering a filter bubble and being unaware of important or useful content.
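A minimal sketch of the patrol-and-report loop described earlier in this section, under the assumption that patrols are allocated in proportion to past reports and that new reports scale with the number of patrols (for example, sightings of police cars), independent of any real difference in crime between districts. A small initial imbalance then widens every round; all numbers are invented for illustration.

```python
# Feedback-loop sketch: allocation follows past reports, and reports follow
# allocation, so the recorded gap between two otherwise identical districts
# grows over time.
def simulate(rounds=20, patrols_per_round=100, reports_per_patrol=0.5):
    reports = {"district_a": 105.0, "district_b": 100.0}  # near-identical start
    for _ in range(rounds):
        total = sum(reports.values())
        for district in reports:
            patrols = patrols_per_round * reports[district] / total
            reports[district] += reports_per_patrol * patrols
    return reports

final = simulate()
print(final)  # district_a ends up with a visibly larger recorded total
print(round(final["district_a"] - final["district_b"], 1))  # gap has grown from 5
```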


Impact


Commercial influences

Corporate algorithms could be skewed to invisibly favor financial arrangements or agreements between companies, without the knowledge of a user who may mistake the algorithm as being impartial. For example, American Airlines created a flight-finding algorithm in the 1980s. The software presented a range of flights from various airlines to customers, but weighed factors that boosted its own flights, regardless of price or convenience. In testimony to the United States Congress, the president of the airline stated outright that the system was created with the intention of gaining competitive advantage through preferential treatment.

In a 1998 paper describing Google, the founders of the company had adopted a policy of transparency in search results regarding paid placement, arguing that "advertising-funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers." This bias would be an "invisible" manipulation of the user.


Voting behavior

A series of studies about undecided voters in the US and in India found that search engine results were able to shift voting outcomes by about 20%. The researchers concluded that candidates have "no means of competing" if an algorithm, with or without intent, boosted page listings for a rival candidate. Facebook users who saw messages related to voting were more likely to vote. A 2010 randomized trial of Facebook users showed a 20% increase (340,000 votes) among users who saw messages encouraging voting, as well as images of their friends who had voted. Legal scholar Jonathan Zittrain has warned that this could create a "digital gerrymandering" effect in elections, "the selective presentation of information by an intermediary to meet its agenda, rather than to serve its users", if intentionally manipulated.


Gender discrimination

In 2016, the professional networking site LinkedIn was discovered to recommend male variations of women's names in response to search queries. The site did not make similar recommendations in searches for male names. For example, "Andrea" would bring up a prompt asking if users meant "Andrew", but queries for "Andrew" did not ask if users meant to find "Andrea". The company said this was the result of an analysis of users' interactions with the site.

In 2012, the department store franchise Target was cited for gathering data points to infer when women customers were pregnant, even if they had not announced it, and then sharing that information with marketing partners. Because the data had been predicted, rather than directly observed or reported, the company had no legal obligation to protect the privacy of those customers.

Web search algorithms have also been accused of bias. Google's results may prioritize pornographic content in search terms related to sexuality, for example, "lesbian". This bias extends to the search engine showing popular but sexualized content in neutral searches. For example, "Top 25 Sexiest Women Athletes" articles displayed as first-page results in searches for "women athletes". In 2017, Google adjusted these results along with others that surfaced hate groups, racist views, child abuse and pornography, and other upsetting and offensive content.

Other examples include the display of higher-paying jobs to male applicants on job search websites. Researchers have also identified that machine translation exhibits a strong tendency towards male defaults. In particular, this is observed in fields linked to unbalanced gender distribution, including STEM occupations, where current machine translation systems fail to reproduce the real-world distribution of female workers.

In 2015, Amazon.com turned off an AI system it developed to screen job applications when the company realized it was biased against women. The recruitment tool excluded applicants who attended all-women's colleges and resumes that included the word "women's". A similar problem emerged with music streaming services. In 2019, it was discovered that the recommender system algorithm used by Spotify was biased against women artists; Spotify's song recommendations suggested more male artists than women artists.


Racial and ethnic discrimination

Algorithms have been criticized as a method for obscuring racial prejudices in decision-making. Because of how certain races and ethnic groups were treated in the past, data can often contain hidden biases. For example, black people are likely to receive longer sentences than white people who committed the same crime. This could potentially mean that a system amplifies the original biases in the data.

In 2015, Google apologized when black users complained that an image-identification algorithm in its Photos application identified them as gorillas. In 2010, Nikon cameras were criticized when image-recognition algorithms consistently asked Asian users if they were blinking. Such examples are the product of bias in biometric data sets. Biometric data is drawn from aspects of the body, including racial features either observed or inferred, which can then be transferred into data points. Speech recognition technology can have different accuracies depending on the user's accent. This may be caused by a lack of training data for speakers of that accent.

Biometric data about race may also be inferred, rather than observed. For example, a 2012 study showed that names commonly associated with black people were more likely to yield search results implying arrest records, regardless of whether there is any police record of that individual's name. A 2015 study also found that Black and Asian people are assumed to have lesser-functioning lungs due to racial and occupational exposure data not being incorporated into the prediction algorithm's model of lung function.

In 2019, a research study revealed that a healthcare algorithm sold by Optum favored white patients over sicker black patients. The algorithm predicts how much patients would cost the health-care system in the future. However, cost is not race-neutral, as black patients incurred about $1,800 less in medical costs per year than white patients with the same number of chronic conditions, which led to the algorithm scoring white patients as equally at risk of future health problems as black patients who suffered from significantly more diseases.

A study conducted by researchers at UC Berkeley in November 2019 revealed that mortgage algorithms discriminated against Latino and African American borrowers on the basis of "creditworthiness", which is rooted in the U.S. fair-lending law that allows lenders to use measures of identification to determine whether an individual is worthy of receiving loans. These particular algorithms were present in FinTech companies and were shown to discriminate against minorities.


Law enforcement and legal proceedings

Algorithms already have numerous applications in legal systems. An example of this is COMPAS, a commercial program widely used by U.S. courts to assess the likelihood of a defendant becoming a recidivist. ProPublica claims that the average COMPAS-assigned recidivism risk level of black defendants is significantly higher than the average COMPAS-assigned risk level of white defendants, and that black defendants are twice as likely to be erroneously assigned the label "high-risk" as white defendants.

One example is the use of risk assessments in criminal sentencing in the United States and parole hearings, where judges were presented with an algorithmically generated score intended to reflect the risk that a prisoner will repeat a crime. For the time period starting in 1920 and ending in 1970, the nationality of a criminal's father was a consideration in those risk assessment scores. Today, these scores are shared with judges in Arizona, Colorado, Delaware, Kentucky, Louisiana, Oklahoma, Virginia, Washington, and Wisconsin. An independent investigation by ProPublica found that the scores were inaccurate 80% of the time, and disproportionately skewed to suggest that black defendants were at risk of recidivism, 77% more often than white defendants. One study that set out to examine "Risk, Race, & Recidivism: Predictive Bias and Disparate Impact" alleges a two-fold (45 percent vs. 23 percent) adverse likelihood for black versus Caucasian defendants to be misclassified as imposing a higher risk despite having objectively remained without any documented recidivism over a two-year period of observation.

In the pretrial detention context, a law review article argues that algorithmic risk assessments violate 14th Amendment Equal Protection rights on the basis of race, since the algorithms are argued to be facially discriminatory, to result in disparate treatment, and to not be narrowly tailored.


Online hate speech

In 2017, a Facebook algorithm designed to remove online hate speech was found to advantage white men over black children when assessing objectionable content, according to internal Facebook documents. The algorithm, which is a combination of computer programs and human content reviewers, was created to protect broad categories rather than specific subsets of categories. For example, posts denouncing "Muslims" would be blocked, while posts denouncing "Radical Muslims" would be allowed. An unanticipated outcome of the algorithm is to allow hate speech against black children, because such posts denounce the "children" subset of blacks rather than "all blacks", whereas "all white men" would trigger a block, because whites and males are not considered subsets. Facebook was also found to allow ad purchasers to target "Jew haters" as a category of users, which the company said was an inadvertent outcome of algorithms used in assessing and categorizing data. The company's design also allowed ad buyers to block African-Americans from seeing housing ads.

While algorithms are used to track and block hate speech, some were found to be 1.5 times more likely to flag information posted by Black users, and 2.2 times more likely to flag information as hate speech if it was written in African American English. Slurs and epithets were flagged without regard for context, even when used by communities which have re-appropriated them.


Surveillance

Surveillance camera software may be considered inherently political because it requires algorithms to distinguish normal from abnormal behaviors, and to determine who belongs in certain locations at certain times. The ability of such algorithms to recognize faces across a racial spectrum has been shown to be limited by the racial diversity of images in its training database; if the majority of photos belong to one race or gender, the software is better at recognizing other members of that race or gender. However, even audits of these image-recognition systems are ethically fraught, and some scholars have suggested the technology's context will always have a disproportionate impact on communities whose actions are over-surveilled. For example, a 2002 analysis of software used to identify individuals in CCTV images found several examples of bias when run against criminal databases. The software was assessed as identifying men more frequently than women, older people more frequently than the young, and identified Asians, African-Americans and other races more often than whites. Additional studies of facial recognition software have found the opposite to be true when trained on non-criminal databases, with the software being the least accurate in identifying darker-skinned females.


Sexual discrimination

In 2011, users of the gay hookup application Grindr reported that the Android store's recommendation algorithm was linking Grindr to applications designed to find sex offenders, which critics said inaccurately related homosexuality with pedophilia. Writer Mike Ananny criticized this association in ''The Atlantic'', arguing that such associations further stigmatized gay men. In 2009, online retailer Amazon de-listed 57,000 books after an algorithmic change expanded its "adult content" blacklist to include any book addressing sexuality or gay themes, such as the critically acclaimed novel ''Brokeback Mountain''. In 2019, it was found that on Facebook, searches for "photos of my female friends" yielded suggestions such as "in bikinis" or "at the beach". In contrast, searches for "photos of my male friends" yielded no results.

Facial recognition technology has been seen to cause problems for transgender individuals. In 2018, there were reports of Uber drivers who were transgender or transitioning experiencing difficulty with the facial recognition software that Uber implements as a built-in security measure. As a result, some of the accounts of trans Uber drivers were suspended, which cost them fares and potentially cost them a job, all because the facial recognition software had difficulty recognizing the face of a trans driver who was transitioning. Although the solution to this issue would appear to be including trans individuals in training sets for machine learning models, an instance of trans YouTube videos that were collected to be used in training data did not receive consent from the trans individuals who were included in the videos, which created an issue of violation of privacy.

A 2017 study conducted at Stanford University tested algorithms in a machine learning system that was said to be able to detect an individual's sexual orientation based on their facial images. The model in the study predicted a correct distinction between gay and straight men 81% of the time, and a correct distinction between gay and straight women 74% of the time. This study resulted in a backlash from the LGBTQIA community, who were fearful of the possible negative repercussions that this AI system could have by putting individuals at risk of being "outed" against their will.


Disability discrimination

While the modalities of algorithmic fairness have been judged on the basis of different aspects of bias, such as gender, race, and socioeconomic status, disability is often left out of the list. The marginalization people with disabilities currently face in society is being translated into AI systems and algorithms, creating even more exclusion. The shifting nature of disabilities and their subjective characterization make them more difficult to address computationally. The lack of historical depth in defining disabilities, collecting their incidence and prevalence in questionnaires, and establishing recognition adds to the controversy and ambiguity in their quantification and calculation. The definition of disability has long been debated, shifting most recently from a medical model to a social model of disability, which establishes that disability is a result of the mismatch between people's interactions and barriers in their environment, rather than impairments and health conditions. Disabilities can also be situational or temporary, and can be considered to be in a constant state of flux. Disabilities are incredibly diverse, fall within a large spectrum, and can be unique to each individual. People's identities can vary based on the specific types of disability they experience, how they use assistive technologies, and who they support. The high level of variability across people's experiences greatly personalizes how a disability can manifest.

Overlapping identities and intersectional experiences are excluded from statistics and datasets, and are hence underrepresented or nonexistent in training data. Therefore, machine learning models are trained inequitably and artificial intelligence systems perpetuate more algorithmic bias. For example, if people with speech impairments are not included when training voice control features and smart AI assistants, they are unable to use the feature, or the responses received from a Google Home or Alexa are extremely poor.

Given the stereotypes and stigmas that still exist surrounding disabilities, the sensitive nature of revealing these identifying characteristics also carries vast privacy challenges. As disclosing disability information can be taboo and drive further discrimination against this population, there is a lack of explicit disability data available for algorithmic systems to interact with. People with disabilities face additional harms and risks with respect to their social support, cost of health insurance, workplace discrimination, and other basic necessities upon disclosing their disability status. Algorithms further exacerbate this gap by recreating the biases that already exist in societal systems and structures.


Google Search

While users generate results that are "completed" automatically, Google has failed to remove sexist and racist autocompletion text. For example, in ''Algorithms of Oppression: How Search Engines Reinforce Racism'', Safiya Noble notes an example of the search for "black girls", which was reported to result in pornographic images. Google claimed it was unable to erase those pages unless they were considered unlawful.


Obstacles to research

Several problems impede the study of large-scale algorithmic bias, hindering the application of academically rigorous studies and public understanding.


Defining fairness

Literature on algorithmic bias has focused on the remedy of fairness, but definitions of fairness are often incompatible with each other and the realities of machine learning optimization. For example, defining fairness as an "equality of outcomes" may simply refer to a system producing the same result for all people, while fairness defined as "equality of treatment" might explicitly consider differences between individuals. As a result, fairness is sometimes described as being in conflict with the accuracy of a model, suggesting innate tensions between the priorities of social welfare and the priorities of the vendors designing these systems. In response to this tension, researchers have suggested more care to the design and use of systems that draw on potentially biased algorithms, with "fairness" defined for specific applications and contexts.
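As a rough illustration of the tension described above, the following sketch uses one common formalization of "equality of outcomes", equal positive-prediction rates across groups, and shows that a perfectly accurate classifier can still violate it when underlying base rates differ. The labels and predictions are invented, and the "equality of treatment" notion is not computed here.

```python
# Illustrative tension between accuracy and one fairness definition
# (equal positive-prediction rates across groups). All values are invented.
def positive_rate(preds, groups, g):
    idx = [i for i, grp in enumerate(groups) if grp == g]
    return sum(preds[i] for i in idx) / len(idx)

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

groups = ["x"] * 5 + ["y"] * 5
labels = [1, 1, 1, 0, 0,   1, 0, 0, 0, 0]   # underlying outcomes per group
preds  = [1, 1, 1, 0, 0,   1, 0, 0, 0, 0]   # predictions identical to labels

print(accuracy(preds, labels))            # 1.0 -- maximally accurate
print(positive_rate(preds, groups, "x"))  # 0.6
print(positive_rate(preds, groups, "y"))  # 0.2 -- unequal outcomes across groups
```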


Complexity

Algorithmic processes are complex, often exceeding the understanding of the people who use them. Large-scale operations may not be understood even by those involved in creating them. The methods and processes of contemporary programs are often obscured by the inability to know every permutation of a code's input or output. Social scientist Bruno Latour has identified this process as blackboxing, a process in which "scientific and technical work is made invisible by its own success. When a machine runs efficiently, when a matter of fact is settled, one need focus only on its inputs and outputs and not on its internal complexity. Thus, paradoxically, the more science and technology succeed, the more opaque and obscure they become." Others have critiqued the black box metaphor, suggesting that current algorithms are not one black box, but a network of interconnected ones.

An example of this complexity can be found in the range of inputs used to customize feedback. The social media site Facebook factored in at least 100,000 data points to determine the layout of a user's social media feed in 2013. Furthermore, large teams of programmers may operate in relative isolation from one another, and be unaware of the cumulative effects of small decisions within connected, elaborate algorithms. Not all code is original; it may be borrowed from other libraries, creating a complicated set of relationships between data processing and data input systems.

Additional complexity occurs through machine learning and the personalization of algorithms based on user interactions such as clicks, time spent on site, and other metrics. These personal adjustments can confuse general attempts to understand algorithms. One unidentified streaming radio service reported that it used five unique music-selection algorithms chosen for its users based on their behavior. This creates different experiences of the same streaming service between different users, making it harder to understand what these algorithms do. Companies also run frequent A/B tests to fine-tune algorithms based on user response. For example, the search engine Bing can run up to ten million subtle variations of its service per day, creating different experiences of the service between each use and/or user.


Lack of transparency

Commercial algorithms are proprietary, and may be treated as trade secrets. Treating algorithms as trade secrets protects companies, such as search engines, where a transparent algorithm might reveal tactics to manipulate search rankings. This makes it difficult for researchers to conduct interviews or analysis to discover how algorithms function. Critics suggest that such secrecy can also obscure possible unethical methods used in producing or processing algorithmic output. Other critics, such as lawyer and activist Katarzyna Szymielewicz, have suggested that the lack of transparency is often disguised as a result of algorithmic complexity, shielding companies from disclosing or investigating their own algorithmic processes.


Lack of data about sensitive categories

A significant barrier to understanding and tackling bias in practice is that categories, such as demographics of individuals protected by anti-discrimination law, are often not explicitly considered when collecting and processing data. In some cases, there is little opportunity to collect this data explicitly, such as in device fingerprinting, ubiquitous computing and the Internet of Things. In other cases, the data controller may not wish to collect such data for reputational reasons, or because it represents a heightened liability and security risk. It may also be the case that, at least in relation to the European Union's General Data Protection Regulation, such data falls under the 'special category' provisions (Article 9), and therefore comes with more restrictions on potential collection and processing. Some practitioners have tried to estimate and impute these missing sensitive categorisations in order to allow bias mitigation, for example by building systems to infer ethnicity from names; however, this can introduce other forms of bias if not undertaken with care. Machine learning researchers have drawn upon cryptographic privacy-enhancing technologies such as secure multi-party computation to propose methods whereby algorithmic bias can be assessed or mitigated without these data ever being available to modellers in cleartext.

Algorithmic bias is not limited to protected categories; it can also concern characteristics that are less easily observable or codifiable, such as political viewpoints. In these cases, there is rarely an easily accessible or non-controversial ground truth, and removing the bias from such a system is more difficult. Furthermore, false and accidental correlations can emerge from a lack of understanding of protected categories, for example, insurance rates based on historical data of car accidents which may overlap, strictly by coincidence, with residential clusters of ethnic minorities.
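As an illustration of why imputed categories are both useful and risky, the following sketch estimates per-group selection rates from a model's decisions using a crude, hypothetical name-based proxy; the proxy table, group labels and records are invented for the example, and a real proxy would itself introduce the kind of error discussed above:

```python
from collections import defaultdict

# Hypothetical surname-to-group proxy; real name-based inference is error-prone
# and can introduce new bias, which is the caveat noted in the text.
SURNAME_PROXY = {"garcia": "group_a", "smith": "group_b", "nguyen": "group_c"}

def impute_group(surname: str) -> str:
    """Crude proxy imputation of a sensitive category that was never collected."""
    return SURNAME_PROXY.get(surname.lower(), "unknown")

def selection_rates(records):
    """records: iterable of (surname, was_selected) pairs from a model's output."""
    counts = defaultdict(lambda: [0, 0])  # group -> [selected, total]
    for surname, selected in records:
        group = impute_group(surname)
        counts[group][0] += int(selected)
        counts[group][1] += 1
    return {group: sel / tot for group, (sel, tot) in counts.items() if tot}

if __name__ == "__main__":
    decisions = [("Garcia", True), ("Garcia", False), ("Smith", True),
                 ("Smith", True), ("Nguyen", False), ("Nguyen", False)]
    # Large gaps between groups suggest disparate selection rates worth auditing.
    print(selection_rates(decisions))
```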


Solutions

A study of 84 policy guidelines on ethical AI found that fairness and "mitigation of unwanted bias" were a common point of concern, addressed through a blend of technical solutions, transparency and monitoring, the right to remedy and increased oversight, and diversity and inclusion efforts.


Technical

There have been several attempts to create methods and tools that can detect and observe biases within an algorithm. These emergent fields focus on tools which are typically applied to the (training) data used by the program rather than the algorithm's internal processes. These methods may also analyze a program's output and its usefulness, and may therefore involve the analysis of its confusion matrix (or table of confusion). Explainable AI is one suggested way to detect the existence of bias in an algorithm or learning model. Using machine learning to detect bias is called "conducting an AI audit", where the "auditor" is an algorithm that goes through the AI model and the training data to identify biases.

Ensuring that an AI tool such as a classifier is free from bias is more difficult than simply removing the sensitive information from its input signals, because this information is typically implicit in other signals. For example, the hobbies, sports and schools attended by a job candidate might reveal their gender to the software, even when this is removed from the analysis. Solutions to this problem involve ensuring that the intelligent agent does not have any information that could be used to reconstruct the protected and sensitive information about the subject, as first demonstrated in work where a deep learning network was simultaneously trained to learn a task while remaining completely agnostic about the protected feature. A simpler method was proposed in the context of word embeddings, and involves removing information that is correlated with the protected characteristic.

Currently, a new IEEE standard is being drafted that aims to specify methodologies which help creators of algorithms eliminate issues of bias and articulate transparency (i.e. to authorities or end users) about the function and possible effects of their algorithms. The project was approved in February 2017 and is sponsored by the Software & Systems Engineering Standards Committee, a committee chartered by the IEEE Computer Society. A draft of the standard is expected to be submitted for balloting in June 2019.
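The embedding-based approach mentioned above can be sketched in a few lines: remove from each vector its component along a direction associated with the protected characteristic. The toy vectors and the "gender direction" here are assumptions for illustration, not a published model:

```python
import numpy as np

def remove_direction(vectors: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project each row vector onto the subspace orthogonal to `direction`,
    removing the component correlated with the protected characteristic."""
    d = direction / np.linalg.norm(direction)
    return vectors - np.outer(vectors @ d, d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(5, 8))       # toy word vectors
    gender_direction = rng.normal(size=8)      # e.g. vec("he") - vec("she")
    debiased = remove_direction(embeddings, gender_direction)
    # After projection, the embeddings carry no component along that direction.
    d = gender_direction / np.linalg.norm(gender_direction)
    print(np.allclose(debiased @ d, 0.0))
```

This removes only the linear component along one direction; information correlated with the protected characteristic can survive in non-linear combinations of features, which is why adversarial training approaches are also used.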


Transparency and monitoring

Ethics guidelines on AI point to the need for accountability, recommending that steps be taken to improve the interpretability of results. Such solutions include the consideration of the "right to understanding" in machine learning algorithms, and resisting deployment of machine learning in situations where the decisions could not be explained or reviewed. Toward this end, a movement for "Explainable AI" is already underway within organizations such as DARPA, for reasons that go beyond the remedy of bias. PricewaterhouseCoopers, for example, also suggests that monitoring output means designing systems in such a way as to ensure that solitary components of the system can be isolated and shut down if they skew results.

An initial approach towards transparency included the open-sourcing of algorithms. Software code can be inspected and improvements can be proposed through source-code-hosting facilities. However, this approach doesn't necessarily produce the intended effects. Companies and organizations can share all possible documentation and code, but this does not establish transparency if the audience doesn't understand the information given. Therefore, the role of an interested critical audience is worth exploring in relation to transparency. Algorithms cannot be held accountable without a critical audience.
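A minimal sketch of that monitoring idea, assuming a hypothetical scoring component whose per-group outputs are tracked and which is switched off when they diverge beyond a threshold (illustrative only, not any vendor's actual tooling):

```python
from typing import Callable, Dict, List

class MonitoredComponent:
    """Wraps one scoring component so its output can be watched per group and
    the component isolated if the gap between groups grows too large."""

    def __init__(self, name: str, score_fn: Callable[[dict], float],
                 max_group_gap: float = 0.10) -> None:
        self.name = name
        self.score_fn = score_fn
        self.max_group_gap = max_group_gap
        self.enabled = True
        self.history: Dict[str, List[float]] = {}

    def score(self, features: dict, group: str) -> float:
        if not self.enabled:
            return 0.0  # component is isolated pending review
        value = self.score_fn(features)
        self.history.setdefault(group, []).append(value)
        self._check()
        return value

    def _check(self) -> None:
        """Shut the component down if mean scores diverge too far between groups."""
        means = [sum(v) / len(v) for v in self.history.values() if v]
        if len(means) >= 2 and max(means) - min(means) > self.max_group_gap:
            self.enabled = False
```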


Right to remedy

From a regulatory perspective, the Toronto Declaration calls for applying a human rights framework to harms caused by algorithmic bias. This includes legislating expectations of due diligence on behalf of designers of these algorithms, and creating accountability when private actors fail to protect the public interest, noting that such rights may be obscured by the complexity of determining responsibility within a web of complex, intertwining processes. Others propose the need for clear liability insurance mechanisms.


Diversity and inclusion

Amid concerns that the design of AI systems is primarily the domain of white, male engineers, a number of scholars have suggested that algorithmic bias may be minimized by expanding inclusion in the ranks of those designing AI systems. For example, just 12% of machine learning engineers are women, with black AI leaders pointing to a "diversity crisis" in the field. Groups like Black in AI and Queer in AI are attempting to create more inclusive spaces in the AI community and work against the often harmful desires of corporations that control the trajectory of AI research. Critiques of simple inclusivity efforts suggest that diversity programs cannot address overlapping forms of inequality, and have called for applying a more deliberate lens of intersectionality to the design of algorithms. Researchers at the University of Cambridge have argued that addressing racial diversity is hampered by the "whiteness" of the culture of AI.


Regulation


Europe

The General Data Protection Regulation (GDPR), the European Union's revised data protection regime implemented in 2018, addresses "Automated individual decision-making, including profiling" in Article 22. These rules prohibit "solely" automated decisions which have a "significant" or "legal" effect on an individual, unless they are explicitly authorised by consent, contract, or member state law. Where such decisions are permitted, there must be safeguards in place, such as a right to a human-in-the-loop, and a non-binding right to an explanation of decisions reached. While these regulations are commonly considered to be new, nearly identical provisions have existed across Europe since 1995, in Article 15 of the Data Protection Directive, and the original automated decision rules and safeguards have been present in French law since the late 1970s. The GDPR addresses algorithmic bias in profiling systems, as well as the statistical approaches that might be used to clean it, directly in recital 71, noting that
the controller should use appropriate mathematical or statistical procedures for the profiling, implement technical and organisational measures appropriate ... that prevents, inter alia, discriminatory effects on natural persons on the basis of racial or ethnic origin, political opinion, religion or beliefs, trade union membership, genetic or health status or sexual orientation, or that result in measures having such an effect.
As with the non-binding right to an explanation in recital 71, the problem is the non-binding nature of recitals. While the provision has been treated as a requirement by the Article 29 Working Party that advised on the implementation of data protection law, its practical dimensions are unclear. It has been argued that the Data Protection Impact Assessments for high-risk data profiling (alongside other pre-emptive measures within data protection) may be a better way to tackle issues of algorithmic discrimination, as this approach restricts the actions of those deploying algorithms, rather than requiring consumers to file complaints or request changes.


United States

The United States has no general legislation controlling algorithmic bias, approaching the problem through various state and federal laws that might vary by industry, sector, and by how an algorithm is used. Many policies are self-enforced or controlled by the Federal Trade Commission. In 2016, the Obama administration released the National Artificial Intelligence Research and Development Strategic Plan, which was intended to guide policymakers toward a critical assessment of algorithms. It recommended that researchers "design these systems so that their actions and decision-making are transparent and easily interpretable by humans, and thus can be examined for any bias they may contain, rather than just learning and repeating these biases". Intended only as guidance, the report did not create any legal precedent.

In 2017, New York City passed the first algorithmic accountability bill in the United States. The bill, which went into effect on January 1, 2018, required "the creation of a task force that provides recommendations on how information on agency automated decision systems may be shared with the public, and how agencies may address instances where people are harmed by agency automated decision systems." The task force was required to present findings and recommendations for further regulatory action in 2019.


India

On July 31, 2018, a draft of the Personal Data Bill was presented. The draft proposes standards for the storage, processing and transmission of data. While it does not use the term algorithm, it makes provisions for "harm resulting from any processing or any kind of processing undertaken by the fiduciary". It defines "any denial or withdrawal of a service, benefit or good resulting from an evaluative decision about the data principal" or "any discriminatory treatment" as a source of harm that could arise from improper use of data. It also makes special provisions for people of "Intersex status".


See also

* Ethics of artificial intelligence
* Fairness (machine learning)
* Predictive policing
* SenseTime


References


Further reading

* Noble, Safiya Umoja (2018). ''Algorithms of Oppression: How Search Engines Reinforce Racism''. New York: New York University Press. ISBN 9781479837243.