In information science, profiling refers to the process of construction and application of user profiles generated by computerized data analysis. This is the use of algorithms or other mathematical techniques that allow the discovery of patterns or correlations in large quantities of data, aggregated in databases. When these patterns or correlations are used to identify or represent people, they can be called ''profiles''. Beyond discussions of profiling ''technologies'' or ''population profiling'', the notion of profiling in this sense is not just about the construction of profiles, but also concerns the ''application'' of group profiles to individuals, e.g., in the cases of credit scoring, price discrimination, or identification of security risks. Profiling is used in fraud prevention, ambient intelligence, consumer analytics, and surveillance. Statistical methods of profiling include Knowledge Discovery in Databases (KDD).


The profiling process

The technical process of profiling can be separated into several steps:

* ''Preliminary grounding:'' The profiling process starts with a specification of the applicable problem domain and the identification of the goals of analysis.
* ''Data collection:'' The target dataset or database for analysis is formed by selecting the relevant data in the light of existing domain knowledge and data understanding.
* ''Data preparation:'' The data are preprocessed to remove noise and to reduce complexity by eliminating attributes.
* ''Data mining:'' The data are analysed with the algorithm or heuristics developed to suit the data, model and goals.
* ''Interpretation:'' The mined patterns are evaluated on their relevance and validity by specialists and/or professionals in the application domain (e.g. excluding spurious correlations).
* ''Application:'' The constructed profiles are applied, e.g. to categories of persons, to test and fine-tune the algorithms.
* ''Institutional decision:'' The institution decides what actions or policies to apply to groups or individuals whose data match a relevant profile.

Data collection, preparation and mining all belong to the phase in which the profile is under construction. However, profiling also refers to the application of profiles, meaning the usage of profiles for the identification or categorization of groups or individual persons. As step six (application) shows, the process is circular: there is a feedback loop between the construction and the application of profiles. The interpretation of profiles can lead to repeated, possibly real-time, fine-tuning of specific previous steps in the profiling process. The application of profiles to people whose data were not used to construct the profile is based on data matching, which provides new data that allows for further adjustments. The process of profiling is both dynamic and adaptive. A good illustration of the dynamic and adaptive nature of profiling is the Cross-Industry Standard Process for Data Mining (CRISP-DM).
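The construction steps, the application step and the feedback loop between them can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a standard implementation: the records, the `region` and `late` attributes, the `min_support` threshold and every function name are hypothetical, "mining" is reduced to computing an outcome rate per attribute value, and a support threshold stands in for the expert review of the interpretation step.

```python
from collections import Counter

def prepare(records, attributes):
    """Data preparation: drop incomplete records and keep only the chosen attributes."""
    return [{a: r[a] for a in attributes} for r in records
            if all(r.get(a) is not None for a in attributes)]

def mine(records, attribute, outcome):
    """Data mining: outcome rate and support (record count) per attribute value."""
    totals, hits = Counter(), Counter()
    for r in records:
        totals[r[attribute]] += 1
        hits[r[attribute]] += int(bool(r[outcome]))
    return {value: (hits[value] / totals[value], totals[value]) for value in totals}

def interpret(patterns, min_support):
    """Interpretation (crude stand-in): discard patterns backed by too few records."""
    return {value: rate for value, (rate, support) in patterns.items()
            if support >= min_support}

def construct(records, attribute="region", outcome="late", min_support=3):
    """Profile construction: preparation, mining and interpretation in sequence."""
    clean = prepare(records, [attribute, outcome])
    return interpret(mine(clean, attribute, outcome), min_support)

def apply_profile(profiles, person, attribute="region"):
    """Application: match a person's data against the group profiles (None = no match)."""
    return profiles.get(person.get(attribute))

# Data collection: hypothetical records of past customers.
records = [
    {"region": "north", "late": True}, {"region": "north", "late": False},
    {"region": "north", "late": True}, {"region": "north", "late": True},
    {"region": "south", "late": False}, {"region": "south", "late": False},
    {"region": "south", "late": True}, {"region": "east", "late": True},
    {"region": None, "late": False},  # incomplete, removed during preparation
]

initial_profiles = construct(records)
print(initial_profiles)  # {'north': 0.75, 'south': 0.3333333333333333}
print(apply_profile(initial_profiles, {"region": "east"}))  # None: too little support

# Feedback loop: outcomes observed after applying profiles become new data.
records += [{"region": "east", "late": False}, {"region": "east", "late": True}]
updated_profiles = construct(records)
print(updated_profiles["east"])  # 0.6666666666666666
```

Re-running `construct` once new, matched records arrive is the feedback loop in miniature: the `east` pattern, discarded at first for lack of support, becomes part of the profile set.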


Types of profiling practices

In order to clarify the nature of profiling technologies, some crucial distinctions have to be made between different types of profiling practices, apart from the distinction between the construction and the application of profiles. The main distinctions are those between top-down and bottom-up profiling (or supervised and unsupervised learning), and between individual and group profiles.


Supervised and unsupervised learning

Profiles can be classified according to the way they have been generated. On the one hand, profiles can be generated by testing a hypothesized correlation. This is called top-down profiling or supervised learning. This is similar to the methodology of traditional scientific research in that it starts with a hypothesis and consists of testing its validity. The result of this type of profiling is the verification or refutation of the hypothesis. One could also speak of deductive profiling. On the other hand, profiles can be generated by exploring a database, using the data mining process to detect patterns in the database that were not previously hypothesized. In a way, this is a matter of generating hypotheses: finding correlations one did not expect or even think of. Once the patterns have been mined, they enter the loop described above and are tested with the use of new data. This is called bottom-up profiling or unsupervised learning.

Two things are important with regard to this distinction. First, unsupervised learning algorithms seem to allow the construction of a new type of knowledge, based not on hypotheses developed by a researcher, nor on causal or motivational relations, but exclusively on stochastic correlations. Second, unsupervised learning algorithms thus seem to allow for an inductive type of knowledge construction that does not require theoretical justification or causal explanation. Some authors claim that if the application of profiles based on computerized stochastic pattern recognition 'works', i.e. allows for reliable predictions of future behaviours, the theoretical or causal explanation of these patterns no longer matters. However, the idea that 'blind' algorithms provide reliable information does not imply that the information is neutral. In the process of collecting and aggregating data into a database (the first three steps of the process of profile construction), translations are made from real-life events to machine-readable data. These data are then prepared and cleansed to allow for initial computability. Potential bias has to be located at these points, as well as in the choice of the algorithms that are developed. It is not possible to mine a database for all possible linear and non-linear correlations, so the mathematical techniques developed to search for patterns determine which patterns can be found. In the case of machine profiling, potential bias is not informed by common-sense prejudice or what psychologists call stereotyping, but by the computer techniques employed in the initial steps of the process. These techniques are mostly invisible to those to whom profiles are applied (because their data match the relevant group profiles).


Individual and group profiles

Profiles must also be classified according to the kind of subject they refer to. This subject can either be an individual or a group of people. When a profile is constructed with the data of a single person, this is called individual profiling. This kind of profiling is used to discover the particular characteristics of a certain individual, to enable unique identification or the provision of personalized services. However, personalized servicing is most often also based on group profiling, which allows categorisation of a person as a certain type of person, based on the fact that her profile matches a profile that has been constructed on the basis of massive amounts of data about massive numbers of other people. A group profile can refer to the result of data mining in data sets that refer to an existing community that considers itself as such, like a religious group, a tennis club, a university or a political party. In that case it can describe previously unknown patterns of behaviour or other characteristics of such a group (community). A group profile can also refer to a category of people that do not form a community, but are found to share previously unknown patterns of behaviour or other characteristics. In that case the group profile describes specific behaviours or other characteristics of a category of people, for instance women with blue eyes and red hair, or adults with relatively short arms and legs. These categories may be found to correlate with health risks, earning capacity, mortality rates, credit risks, etc. If an individual profile is applied to the individual that it was mined from, then that is direct individual profiling. If a group profile is applied to an individual whose data match the profile, then that is indirect individual profiling, because the profile was generated using data of other people. Similarly, if a group profile is applied to the group that it was mined from, then that is direct group profiling. However, insofar as the application of a group profile to a group implies the application of the group profile to individual members of the group, it makes sense to speak of indirect group profiling, especially if the group profile is non-distributive.
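The cases distinguished above depend only on whose data built a profile and to whom it is applied. The hypothetical helper below encodes them, treating both as sets of person identifiers; the function name, the labels and the fallback for combinations the text does not name are assumptions for illustration.

```python
def classify_application(profile_sources, targets):
    """Name the profiling practice from whose data built the profile and who it is applied to.

    Both arguments are sets of (hypothetical) person identifiers.
    """
    individual_profile = len(profile_sources) == 1
    single_target = len(targets) == 1
    if individual_profile and targets == profile_sources:
        return "direct individual profiling"
    if not individual_profile and single_target:
        # The profile was generated using data of other people.
        return "indirect individual profiling"
    if not individual_profile and targets == profile_sources:
        return "direct group profiling"
    return "not covered by this classification"

print(classify_application({"ann"}, {"ann"}))                # direct individual profiling
print(classify_application({"ann", "bob", "cy"}, {"dee"}))   # indirect individual profiling
print(classify_application({"ann", "bob"}, {"ann", "bob"}))  # direct group profiling
print(classify_application({"ann"}, {"bob"}))                # not covered by this classification
```

The helper does not try to capture indirect group profiling, which in the text arises when applying a group profile to its group amounts to applying it to each member in turn.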


Distributive and non-distributive profiling

Group profiles can also be divided in terms of their distributive character. A group profile is distributive when its properties apply equally to all the members of its group: all bachelors are unmarried, or all persons with a specific gene have an 80% chance to contract a specific disease. A profile is non-distributive when the profile does not necessarily apply to all the members of the group: the group of persons with a specific postal code has an average earning capacity of XX, or the category of persons with blue eyes has an average chance of 37% to contract a specific disease. Note that in this case the chance of an individual to have a particular earning capacity or to contract the specific disease will depend on other factors, e.g. sex, age, background of parents, previous health, education. Apart from tautological profiles like that of bachelors, most group profiles generated by means of computer techniques are non-distributive. This has far-reaching implications for the accuracy of indirect individual profiling based on data matching with non-distributive group profiles. Quite apart from the fact that the application of accurate profiles may be unfair or cause undue stigmatisation, most group profiles will not be accurate for the individuals to whom they are applied.
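A small numerical illustration of why applying a non-distributive profile to individuals is inaccurate. The risks below are invented so that the group average comes out at the 37% used in the text; the split into two subgroups stands in for the "other factors" mentioned above and is an assumption, as is the helper name `is_distributive`.

```python
from math import isclose

def is_distributive(members, holds):
    """A group profile is distributive if its property holds for every member."""
    return all(holds(m) for m in members)

# Distributive profile: every bachelor is unmarried (hypothetical records).
bachelors = [{"married": False} for _ in range(5)]
print(is_distributive(bachelors, lambda m: not m["married"]))  # True

# Non-distributive profile (invented numbers): inside one category, 40 members
# have a 70% risk of a disease and 60 members a 15% risk.
risks = [0.70] * 40 + [0.15] * 60
group_rate = sum(risks) / len(risks)  # the group profile: an average of about 0.37

# The average describes the group, but no single member actually has that risk.
print(is_distributive(risks, lambda r: isclose(r, group_rate)))  # False

# Indirect individual profiling: assign the group rate to every member and
# measure how far it is, on average, from each member's actual risk.
mean_abs_error = sum(abs(r - group_rate) for r in risks) / len(risks)
print(round(group_rate, 2), round(mean_abs_error, 3))  # 0.37 0.264
```

Every member's risk is misstated by either 0.33 or 0.22, even though the profile is a correct description of the category as a whole.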


Applications

In the financial sector, institutions use profiling technologies for fraud prevention and credit scoring. Banks want to minimize the risks in giving credit to their customers. On the basis of extensive group profiling, customers are assigned a scoring value that indicates their creditworthiness. Financial institutions like banks and insurance companies also use group profiling to detect fraud or money laundering. Databases with transactions are searched with algorithms to find behaviours that deviate from the norm, indicating potentially suspicious transactions. In the context of employment, profiles can be of use for tracking employees by monitoring their online behaviour, for the detection of fraud by them, and for the deployment of human resources by pooling and ranking their skills. Profiling can also be used to support people at work, and also for learning, by intervening in the design of adaptive hypermedia systems that personalize the interaction. For instance, this can be useful for supporting the management of attention. In forensic science, the possibility exists of linking different databases of cases and suspects and mining these for common patterns. This could be used for solving existing cases or for the purpose of establishing risk profiles of potential suspects.
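The financial-sector practice of searching transactions for behaviour that deviates from the norm can be illustrated with a deliberately simple rule: build a per-customer profile of past transaction amounts (mean and sample standard deviation) and flag new amounts more than three standard deviations away. This is a minimal sketch assuming a z-score rule and invented amounts; the text does not say which algorithms institutions actually use, and production fraud detection is considerably more elaborate.

```python
from statistics import mean, stdev

def build_profile(history):
    """Profile of a customer's normal behaviour: mean and sample standard deviation."""
    return mean(history), stdev(history)  # stdev needs at least two amounts

def flag_suspicious(profile, new_amounts, threshold=3.0):
    """Return amounts lying more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        return []  # a constant history gives no spread to measure deviation against
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]

# Hypothetical past card payments of one customer (currency units).
history = [42.0, 38.5, 55.0, 47.25, 40.0, 51.0, 44.0, 39.5]
profile = build_profile(history)
print(flag_suspicious(profile, [46.0, 980.0, 35.0]))  # [980.0]
```

Building the profile from past data only, and scoring new transactions against it, mirrors the split between constructing and applying a profile.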


Consumer profiling

Consumer profiling is a form of customer analytics, where customer data is used to make decisions on product promotion, the pricing of products, as well as personalized advertising. When the aim is to find the most profitable customer segment, consumer analytics draws on demographic data, data on consumer behaviour, data on the products purchased, payment method, and surveys to establish consumer profiles. To establish predictive models on the basis of existing databases, the Knowledge Discovery in Databases (KDD) statistical method is used. KDD groups similar customer data to predict future consumer behaviour. Other methods of predicting consumer behaviour are correlation and pattern recognition. Consumer profiles describe customers based on a set of attributes, and typically consumers are grouped according to income, living standard, age and location. Consumer profiles may also include behavioural attributes that assess a customer's motivation in the buyer decision process. Well-known examples of consumer profiles are Experian's Mosaic geodemographic classification of households, CACI's Acorn, and Acxiom's Personicx.
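Grouping similar customer data can be illustrated with clustering. The sketch below uses k-means, one common clustering technique chosen here as an assumption (the text does not prescribe a particular algorithm), on invented customers described by income (in thousands) and age. It uses a deterministic farthest-first initialisation so the result is reproducible; in practice attributes would usually be rescaled first so that no single unit dominates the distance.

```python
from math import dist

def kmeans(points, k, iterations=100):
    """Plain k-means: returns a cluster label per point and the final centroids."""
    # Farthest-first initialisation: start with the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each customer joins the nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new.append(centroids[j])  # keep an empty cluster's centroid in place
        if new == centroids:
            break  # converged: no centroid moved
        centroids = new
    return labels, centroids

# Hypothetical customers: (income in thousands, age).
customers = [(25, 22), (28, 25), (30, 24), (27, 23),
             (85, 55), (90, 60), (88, 58), (92, 62)]
labels, centroids = kmeans(customers, k=2)
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
print(centroids)  # [(27.5, 23.5), (88.75, 58.75)]
```

Each centroid then acts as a crude segment profile, and a new customer can be assigned to the segment whose centroid is nearest.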


Ambient intelligence

In a built environment with ambient intelligence, everyday objects have built-in sensors and embedded systems that allow objects to recognise and respond to the presence and needs of individuals. Ambient intelligence relies on automated profiling and human–computer interaction designs. Sensors monitor an individual's actions and behaviours, thereby generating, collecting, analysing, processing and storing personal data. Early examples of consumer electronics with ambient intelligence include mobile apps, augmented reality and location-based services.
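The sense, profile and respond loop described above can be sketched as follows. The example is hypothetical: a heating controller that records the set-points an occupant chooses at each hour of the day, builds a per-hour preference profile (the median of past choices), and applies it only when a presence sensor reports someone in the room. The device, data and function names are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import median

def build_preference_profile(observations):
    """Group observed (hour, set-point) choices by hour and keep the median per hour."""
    by_hour = defaultdict(list)
    for hour, setpoint in observations:
        by_hour[hour].append(setpoint)
    return {hour: median(values) for hour, values in by_hour.items()}

def respond(profile, hour, presence_detected, default=20.0):
    """Return the set-point to apply, or None when no occupant is detected."""
    if not presence_detected:
        return None
    return profile.get(hour, default)  # fall back to a default for unseen hours

# Hypothetical log of manual adjustments: (hour of day, chosen temperature in °C).
log = [(7, 21.0), (7, 22.0), (7, 21.5), (22, 18.0), (22, 18.5)]
profile = build_preference_profile(log)
print(profile)                     # {7: 21.5, 22: 18.25}
print(respond(profile, 7, True))   # 21.5
print(respond(profile, 12, True))  # 20.0
print(respond(profile, 7, False))  # None
```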


Risks and issues

Profiling technologies have raised a host of ethical, legal and other issues, including privacy, equality, due process, security and liability. Numerous authors have warned against the affordances of a new technological infrastructure that could emerge on the basis of semi-autonomic profiling technologies. Privacy is one of the principal issues raised. Profiling technologies make possible a far-reaching monitoring of an individual's behaviour and preferences. Profiles may reveal personal or private information about individuals that they might not even be aware of themselves. Profiling technologies are by their very nature discriminatory tools. They allow unparalleled kinds of social sorting and segmentation which could have unfair effects. The people that are profiled may have to pay higher prices, they could miss out on important offers or opportunities, and they may run increased risks because catering to their needs is less profitable. In most cases they will not be aware of this, since profiling practices are mostly invisible and the profiles themselves are often protected by intellectual property or trade secret. This poses a threat to the equality and solidarity of citizens. On a larger scale, it might cause the segmentation of society. One of the problems underlying potential violations of privacy and non-discrimination is that the process of profiling is more often than not invisible to those that are being profiled. This makes it hard, if not impossible, to contest the application of a particular group profile. This disturbs principles of due process: if a person has no access to the information on the basis of which they are denied benefits or assigned certain risks, they cannot contest the way they are being treated. Profiles can be used against people when they end up in the hands of people who are not entitled to access or use the information. An important issue related to these breaches of security is identity theft. When the application of profiles causes harm, it has to be determined who is to be held liable for this harm. Is the software programmer, the profiling service provider, or the profiled user to be held accountable? This issue of liability is especially complex when the application of profiles and the decisions based on them have also been automated, as in autonomic computing or ambient intelligence.


See also

* Automated decision-making
* Behavioral targeting
* Data mining
* Demographic profiling
* Digital identity
* Digital traces
* Forensic profiling
* Identification (information)
* Identity
* Labelling
* Privacy
* Profiling
* Offender profiling
* Social profiling
* Stereotype
* User modeling
* User profile

