HOME

TheInfoList



OR:

Software mining is an application of
knowledge discovery Knowledge extraction is the creation of Knowledge representation and reasoning, knowledge from structured (relational databases, XML) and unstructured (text (literary theory), text, documents, images) sources. The resulting knowledge needs to be in ...
in the area of
software modernization Legacy modernization, also known as software modernization or platform modernization, refers to the conversion, rewriting or porting of a legacy system to modern computer programming languages, architectures (e.g. microservices), software libraries ...
which involves understanding existing software artifacts. This process is related to a concept of
reverse engineering Reverse engineering (also known as backwards engineering or back engineering) is a process or method through which one attempts to understand through deductive reasoning how a previously made device, process, system, or piece of software accompli ...
. Usually the knowledge obtained from existing software is presented in the form of models to which specific queries can be made when necessary. An
entity relationship An entity is something that exists as itself, as a subject or as an object, actually or potentially, concretely or abstractly, physically or not. It need not be of material existence. In particular, abstractions and legal fictions are usually re ...
is a frequent format of representing knowledge obtained from existing software.
Object Management Group The Object Management Group (OMG) is a computer industry standardization, standards consortium. OMG Task Forces develop enterprise integration standards for a range of technologies. Business activities The goal of the OMG was a common portabl ...
(OMG) developed specification
Knowledge Discovery Metamodel Knowledge Discovery Metamodel (KDM) is a publicly available specification from the Object Management Group (OMG). KDM is a common intermediate representation for existing software systems and their operating environments, that defines common meta ...
(KDM) which defines an
ontology In metaphysics, ontology is the philosophical study of being, as well as related concepts such as existence, becoming, and reality. Ontology addresses questions like how entities are grouped into categories and which of these entities exis ...
for software assets and their relationships for the purpose of performing knowledge discovery of existing code.


Software mining and data mining

Software mining is closely related to data mining, since existing software artifacts contain enormous business value, key for the evolution of software systems. Knowledge discovery from software systems addresses structure, behavior as well as the data processed by the software system. Instead of mining individual
data set A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the ...
s, software mining focuses on
metadata Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive metadata – the descriptive ...
, such as database schemas. OMG
Knowledge Discovery Metamodel Knowledge Discovery Metamodel (KDM) is a publicly available specification from the Object Management Group (OMG). KDM is a common intermediate representation for existing software systems and their operating environments, that defines common meta ...
provides an integrated representation to capturing application
metadata Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive metadata – the descriptive ...
as part of a holistic existing system metamodel. Another OMG specification, the Common Warehouse Metamodel focuses entirely on mining enterprise
metadata Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive metadata – the descriptive ...
.


Text-Mining Software Tools

Text mining Text mining, also referred to as ''text data mining'', similar to text analytics, is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extract ...
software tools enable easy handling of text documents for the purpose of data analysis including automatic model generation and
document classification Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") ...
,
document clustering Document clustering (or text clustering) is the application of cluster analysis to textual documents. It has applications in automatic document organization, topic extraction and fast information retrieval or filtering. Overview Document clusteri ...
, document visualization, dealing with Web documents, and crawling the Web.


Levels of software mining

''Knowledge discovery in software'' is related to a concept of
reverse engineering Reverse engineering (also known as backwards engineering or back engineering) is a process or method through which one attempts to understand through deductive reasoning how a previously made device, process, system, or piece of software accompli ...
. Software mining addresses structure, behavior as well as the data processed by the software system. Mining software systems may happen at various ''levels'': * program level (individual statements and variables) *
design pattern A design pattern is the re-usable form of a solution to a design problem. The idea was introduced by the architect Christopher Alexander and has been adapted for various other disciplines, particularly software engineering. The " Gang of Four" b ...
level *
call graph A call graph (also known as a call multigraph) is a control-flow graph, which represents calling relationships between subroutines in a computer program. Each node represents a procedure and each edge ''(f, g)'' indicates that procedure ''f'' cal ...
level (individual procedures and their relationships) * architectural level (subsystems and their interfaces) * data level (individual columns and attributes of data stores) * application level (key data items and their flow through the applications) * business level (domain concepts, business rules and their implementation in code)


Forms of representing the results of Software Mining

*
data model A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. For instance, a data model may specify that the data element representing a car be co ...
*
metadata Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive metadata – the descriptive ...
* metamodels *
ontology In metaphysics, ontology is the philosophical study of being, as well as related concepts such as existence, becoming, and reality. Ontology addresses questions like how entities are grouped into categories and which of these entities exis ...
*
Knowledge representation Knowledge representation and reasoning (KRR, KR&R, KR²) is the field of artificial intelligence (AI) dedicated to representing information about the world in a form that a computer system can use to solve complex tasks such as diagnosing a medic ...
*
business rule A business rule defines or constrains some aspect of business. It may be expressed to specify an action to be taken when certain conditions are true or may be phrased so it can only resolve to either true or false. Business rules are intended to ass ...
*
Knowledge Discovery Metamodel Knowledge Discovery Metamodel (KDM) is a publicly available specification from the Object Management Group (OMG). KDM is a common intermediate representation for existing software systems and their operating environments, that defines common meta ...
(KDM) *
Business Process Modeling Notation Business Process Model and Notation (BPMN) is a graphical representation for specifying business processes in a business process model. Originally developed by the Business Process Management Initiative (BPMI), BPMN has been maintained by the Ob ...
(BPMN) *
intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
*
Resource Description Framework The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard originally designed as a data model for metadata. It has come to be used as a general method for description and exchange of graph data. RDF provides a variety of ...
(RDF) *
abstract syntax tree In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring ...
(AST) *
software metric In software engineering and development, a software metric is a standard of measure of a degree to which a software system or process possesses some property. Even if a metric is not a measurement (metrics are functions, while measurements are t ...
s *
graphical user interface The GUI ( "UI" by itself is still usually pronounced . or ), graphical user interface, is a form of user interface that allows users to interact with electronic devices through graphical icons and audio indicator such as primary notation, inste ...
s


See also

* Mining Software Repositories


References

{{reflist Static program analysis tools Data mining