
In network theory, link analysis is a data-analysis technique used to evaluate relationships (connections) between nodes. Relationships may be identified among various types of nodes (objects), including organizations, people and transactions. Link analysis has been used for investigation of criminal activity (fraud detection, counterterrorism, and intelligence), computer security analysis, search engine optimization, market research, medical research, and art.


Knowledge discovery

Knowledge discovery is an iterative and interactive process used to identify, analyze and visualize patterns in data. Network analysis, link analysis and social network analysis are all methods of knowledge discovery, each a subset of the one before it. At the highest level, most knowledge discovery methods follow these steps:
# Data processing
# Transformation
# Analysis
# Visualization

Data gathering and processing requires access to data and has several inherent issues, including information overload and data errors. Once data is collected, it needs to be transformed into a format that both human and computer analyzers can use effectively. Manual or computer-generated visualization tools, including network charts, may then be mapped from the data. Several algorithms exist to help with analysis of data: Dijkstra’s algorithm, breadth-first search, and depth-first search.

Link analysis focuses on analysis of relationships among nodes through visualization methods (network charts, association matrix). An example of the relationships that may be mapped for crime investigations is given in Krebs, V. E. 2001, Mapping networks of terrorist cells, Connections 24, 43–52.

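As a minimal sketch of how an association matrix can feed one of the graph algorithms named above, the Python below converts a small association matrix into an adjacency list and uses breadth-first search to find a path with the fewest links between two nodes. The entity names, the matrix values and the function names are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical entities and association matrix (1 = known link, 0 = none).
names = ["Ann", "Bob", "Cy", "Dee", "Eve"]
matrix = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
]

def to_adjacency(names, matrix):
    """Convert a symmetric association matrix into an adjacency list."""
    return {
        names[i]: [names[j] for j, linked in enumerate(row) if linked and i != j]
        for i, row in enumerate(matrix)
    }

def shortest_link_path(graph, start, goal):
    """Breadth-first search: return a path with the fewest links, or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

graph = to_adjacency(names, matrix)
print(shortest_link_path(graph, "Ann", "Eve"))
```

Breadth-first search fits here because every link counts as one step; if links carried weights (for example, strength of association), Dijkstra’s algorithm would be the natural substitute.
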
Link analysis is used for three primary purposes (Link Analysis Workbench, Air Force Research Laboratory Information Directorate, Rome Research Site, Rome, New York, September 2004):
# Find matches in data for known patterns of interest;
# Find anomalies where known patterns are violated;
# Discover new patterns of interest (social network analysis, data mining).


History

Klerks categorized link analysis tools into three generations. The first generation was introduced in 1975 as the Anacapa Chart of Harper and Harris. In this method a domain expert reviews data files, identifies associations by constructing an association matrix, creates a link chart for visualization, and finally analyzes the network chart to identify patterns of interest. The method requires extensive domain knowledge and is extremely time-consuming when reviewing vast amounts of data.

In addition to the association matrix, the activities matrix can be used to produce actionable information of practical value to law enforcement. The activities matrix, as the term implies, centers on the actions and activities of people with respect to locations, whereas the association matrix focuses on the relationships between people, organizations, and/or properties. The distinction between the two types of matrices, while minor, is nonetheless significant for the output of the analysis.

Second-generation tools consist of automatic graphics-based analysis tools such as IBM i2 Analyst’s Notebook, Netmap, ClueMaker and Watson. These tools can automate the construction and updating of the link chart once an association matrix has been created manually; however, analysis of the resulting charts and graphs still requires an expert with extensive domain knowledge.

Third-generation link-analysis tools such as DataWalk allow the automatic visualization of linkages between elements in a data set, which can then serve as the canvas for further exploration or manual updates.
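The difference between the two matrices can be sketched in a few lines of Python. The people, locations and values below are invented, and deriving candidate associations from shared locations is an assumption of this sketch, not a step attributed to Harper and Harris.

```python
people = ["P1", "P2", "P3"]
locations = ["warehouse", "bank", "garage"]  # column labels of the matrix below

# Activities matrix: rows are people, columns are locations (1 = seen there).
activities = [
    [1, 1, 0],  # P1
    [1, 0, 1],  # P2
    [0, 0, 1],  # P3
]

def shared_locations(activities):
    """Count, for every pair of people, the locations they have in common."""
    n = len(activities)
    counts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                counts[i][j] = sum(
                    a * b for a, b in zip(activities[i], activities[j])
                )
    return counts

# Person-by-person matrix of candidate associations (diagonal left at 0).
association = shared_locations(activities)
for name, row in zip(people, association):
    print(name, row)
```

The activities matrix is rectangular (people by locations), while the resulting association matrix is square and symmetric (people by people), which is why the two lead to different kinds of charts.
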


Applications

* FBI Violent Criminal Apprehension Program (ViCAP)
* Iowa State Sex Crimes Analysis System
* Minnesota State Sex Crimes Analysis System (MIN/SCAP)
* Washington State Homicide Investigation Tracking System (HITS)
* New York State Homicide Investigation & Lead Tracking (HALT)
* New Jersey Homicide Evaluation & Assessment Tracking (HEAT)
* Pennsylvania State ATAC Program
* Violent Crime Linkage Analysis System (ViCLAS)


Issues with link analysis


Information overload

With the vast amounts of data and information that are stored electronically, users are confronted with multiple unrelated sources of information available for analysis. Data analysis techniques are required to make effective and efficient use of the data. Palshikar classifies data analysis techniques into two categories: statistical (statistical models, time-series analysis, clustering and classification, matching algorithms to detect anomalies) and artificial intelligence (AI) techniques (data mining, expert systems, pattern recognition, machine learning techniques, neural networks).

Bolton & Hand define statistical data analysis as either supervised or unsupervised methods. Supervised learning methods require that rules are defined within the system to establish what is expected or unexpected behavior. Unsupervised learning methods review data in comparison to the norm and detect statistical outliers. Supervised learning methods are limited in the scenarios they can handle, because training rules must be established from previous patterns. Unsupervised learning methods can detect broader issues but may produce a higher false-positive ratio if the behavioral norm is not well established or understood.

Data itself has inherent issues, including integrity (or lack of it) and continuous change. Data may contain “errors of omission and commission because of faulty collection or handling, and when entities are actively attempting to deceive and/or conceal their actions”. Sparrow highlights incompleteness (inevitability of missing data or links), fuzzy boundaries (subjectivity in deciding what to include) and dynamic changes (recognition that data is ever-changing) as the three primary problems with data analysis.

Once data is transformed into a usable format, open texture and cross-referencing issues may arise.
Open texture was defined by Waismann as the unavoidable uncertainty in meaning when empirical terms are used in different contexts. Uncertainty in the meaning of terms presents problems when attempting to search and cross-reference data from multiple sources.

The primary method for resolving data analysis issues is reliance on domain knowledge from an expert. This is a very time-consuming and costly method of conducting link analysis and has inherent problems of its own. McGrath et al. conclude that the layout and presentation of a network diagram have a significant impact on the user’s “perceptions of the existence of groups in networks”. Even domain experts may reach differing conclusions, as analysis may be subjective.
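The contrast drawn in this section between checks driven by predefined rules and unsupervised detection of statistical outliers can be illustrated with a small sketch. The transaction amounts, the fixed limit and the z-score threshold are all hypothetical; a z-score is only one simple way to compare data to the norm and is not claimed to be the method of Bolton & Hand.

```python
from statistics import mean, stdev

# Hypothetical daily transaction totals for one account.
amounts = [120, 95, 130, 110, 105, 98, 125, 2400, 115, 102]

def rule_based_flags(values, limit):
    """Rule-driven check: flag anything above a limit fixed in advance."""
    return [v for v in values if v > limit]

def outlier_flags(values, threshold=2.0):
    """Unsupervised check: flag values far from the sample mean (z-score)."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

print(rule_based_flags(amounts, limit=5000))  # rule written for larger sums
print(outlier_flags(amounts))
```

In this toy data the fixed rule misses the unusual value because it was written for a different scenario, while the outlier check catches it. With a poorly understood norm (for example, a very small or very noisy sample), the same outlier check would flag legitimate activity, which is the false-positive risk noted above.
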


Prosecution vs. crime prevention

Link analysis techniques have primarily been used for prosecution, as it is far easier to review historical data for patterns than it is to attempt to predict future actions. Krebs demonstrated the use of an association matrix and link chart of the terrorist network associated with the 19 hijackers responsible for the September 11th attacks by mapping publicly available details released after the attacks. Even with the advantages of hindsight and publicly available information on people, places and transactions, it is clear that there is missing data.

Alternatively, Picarelli argued that link analysis techniques could have been used to identify and potentially prevent illicit activities within the Aum Shinrikyo network. “We must be careful of ‘guilt by association’. Being linked to a terrorist does not prove guilt – but it does invite investigation.” Balancing the legal concepts of probable cause, right to privacy and freedom of association becomes challenging when reviewing potentially sensitive data with the objective of preventing crime or illegal activity that has not yet occurred.


Proposed solutions

There are four categories of proposed link analysis solutions (Schroeder et al., Automated Criminal Link Analysis Based on Domain Knowledge, Journal of the American Society for Information Science and Technology, 58:6 (842), 2007):
# Heuristic-based
# Template-based
# Similarity-based
# Statistical

Heuristic-based tools utilize decision rules that are distilled from expert knowledge using structured data. Template-based tools employ Natural Language Processing (NLP) to extract details from unstructured data that are matched to pre-defined templates. Similarity-based approaches use weighted scoring to compare attributes and identify potential links. Statistical approaches identify potential links based on lexical statistics.
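A similarity-based approach can be sketched as a weighted comparison of record attributes. The field names, weights, threshold and records below are hypothetical and serve only to show the idea of weighted scoring; they are not taken from Schroeder et al.

```python
# Hypothetical attribute weights; a real system would tune these.
WEIGHTS = {"surname": 4, "phone": 4, "city": 2}

def similarity(record_a, record_b, weights=WEIGHTS):
    """Weighted share of attributes on which two records agree (0.0 to 1.0)."""
    total = sum(weights.values())
    matched = sum(
        weight
        for field, weight in weights.items()
        if record_a.get(field) is not None
        and record_a.get(field) == record_b.get(field)
    )
    return matched / total

def candidate_links(records, threshold=0.5):
    """Return (id, id, score) for record pairs scoring at or above threshold."""
    ids = sorted(records)
    links = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = similarity(records[a], records[b])
            if score >= threshold:
                links.append((a, b, round(score, 2)))
    return links

records = {
    "r1": {"surname": "Ito", "phone": "555-0101", "city": "Rome"},
    "r2": {"surname": "Ito", "phone": "555-0101", "city": "Utica"},
    "r3": {"surname": "Lee", "phone": "555-0199", "city": "Rome"},
}
print(candidate_links(records))
```

Exact equality is the simplest possible comparison; a fuller treatment would likely use approximate string matching per field, but the weighted-sum structure would stay the same.
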


CrimeNet Explorer

J.J. Xu and H. Chen propose a framework for automated network analysis and visualization called CrimeNet Explorer (Xu, J.J. & Chen, H., CrimeNet Explorer: A Framework for Criminal Network Knowledge Discovery, ACM Transactions on Information Systems, 23(2), April 2005, pp. 201-226). This framework includes the following elements:
* Network Creation through a concept space approach that uses “co-occurrence weight to measure the frequency with which two words or phrases appear in the same document. The more frequently two words or phrases appear together, the more likely it will be that they are related”.
* Network Partition using “hierarchical clustering to partition a network into subgroups based on relational strength”.
* Structural Analysis through “three centrality measures (degree, betweenness, and closeness) to identify central members in a given subgroup. CrimeNet Explorer employed Dijkstra’s shortest-path algorithm to calculate the betweenness and closeness from a single node to all other nodes in the subgroup”.
* Network Visualization using Torgerson’s metric multidimensional scaling (MDS) algorithm.
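The structural-analysis step can be illustrated with a short sketch that computes degree and closeness centrality, using Dijkstra’s algorithm for the shortest-path distances. This is not CrimeNet Explorer’s implementation: the graph, its weights (treated here as distances, where a smaller value means a stronger relation) and the function names are assumptions made for the example, and betweenness is omitted for brevity.

```python
import heapq

# Hypothetical weighted, undirected subgroup; weights act as distances.
graph = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 2.0, "D": 5.0},
    "C": {"A": 4.0, "B": 2.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}

def dijkstra(graph, source):
    """Shortest-path distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

def degree_centrality(graph):
    """Number of direct links per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

def closeness_centrality(graph):
    """Reachable nodes divided by the sum of shortest distances to them."""
    scores = {}
    for node in graph:
        dist = dijkstra(graph, node)
        total = sum(d for other, d in dist.items() if other != node)
        scores[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores

print(degree_centrality(graph))
print(closeness_centrality(graph))
```

Nodes with high degree have many direct contacts, while nodes with high closeness can reach the rest of the subgroup over short paths; the two measures need not pick out the same member.
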

