The dominance-based rough set approach (DRSA) is an extension of
rough set theory for
multi-criteria decision analysis
Multiple-criteria decision-making (MCDM) or multiple-criteria decision analysis (MCDA) is a sub-discipline of operations research that explicitly evaluates multiple conflicting criteria in decision making (both in daily life and in settings ...
(MCDA), introduced by Greco, Matarazzo and Słowiński.
[
Greco, S., Matarazzo, B., Słowiński, R.: Rough sets theory for multi-criteria decision analysis. European Journal of Operational Research, 129, 1 (2001) 1–47] The main change compared to the classical
rough sets
In computer science, a rough set, first described by Polish computer scientist Zdzisław I. Pawlak, is a formal approximation of a crisp set (i.e., conventional set) in terms of a pair of sets which give the ''lower'' and the ''upper'' approximati ...
is the substitution for the indiscernibility relation by a dominance relation, which permits one to deal with inconsistencies typical to consideration of criteria and preference-ordered decision classes.
Multicriteria classification (sorting)
Multicriteria classification In multiple criteria decision aiding (MCDA), multicriteria classification (or sorting) involves problems where a finite set of alternative actions should be assigned into a predefined set of preferentially ordered categories (classes). For example, ...
(
sorting
Sorting refers to ordering data in an increasing or decreasing manner according to some linear relationship among the data items.
# ordering: arranging items in a sequence ordered by some criterion;
# categorizing: grouping items with similar pro ...
) is one of the problems considered within
MCDA
Multiple-criteria decision-making (MCDM) or multiple-criteria decision analysis (MCDA) is a sub-discipline of operations research that explicitly evaluates multiple conflicting criteria in decision making (both in daily life and in settings ...
and can be stated as follows: given a set of objects evaluated by a set of
criteria (attributes with preference-order domains), assign these objects to some pre-defined and preference-ordered decision classes, such that each object is assigned to exactly one class. Due to the preference ordering, improvement of evaluations of an object on the criteria should not worsen its class assignment. The sorting problem is very similar to the problem of
classification Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood.
Classification is the grouping of related facts into classes.
It may also refer to:
Business, organizat ...
, however, in the latter, the objects are evaluated by regular attributes and the decision classes are not necessarily preference ordered. The problem of multicriteria classification is also referred to as
ordinal classification problem with monotonicity constraints
Ordinal may refer to:
* Ordinal data, a statistical data type consisting of numerical scores that exist on an arbitrary numerical scale
* Ordinal date, a simple form of expressing a date using only the year and the day number within that year
* ...
and often appears in real-life application when
ordinal and
monotone
Monotone refers to a sound, for example music or speech, that has a single unvaried tone. See: monophony.
Monotone or monotonicity may also refer to:
In economics
*Monotone preferences, a property of a consumer's preference ordering.
*Monotonic ...
properties follow from the domain knowledge about the problem.
As an illustrative example, consider the problem of evaluation in a high school. The director of the school wants to assign students (''objects'') to three classes: ''bad'', ''medium'' and ''good'' (notice that class ''good'' is preferred to ''medium'' and ''medium'' is preferred to ''bad''). Each student is described by three criteria: level in Physics, Mathematics and Literature, each taking one of three possible values ''bad'', ''medium'' and ''good''. Criteria are preference-ordered and improving the level from one of the subjects should not result in worse global evaluation (class).
As a more serious example, consider classification of bank clients, from the viewpoint of bankruptcy risk, into classes ''safe'' and ''risky''. This may involve such characteristics as "
return on equity
The return on equity (ROE) is a measure of the profitability of a business in relation to the equity. Because shareholder's equity can be calculated by taking all assets and subtracting all liabilities, ROE can also be thought of as a return on '' ...
(ROE)", "
return on investment
Return on investment (ROI) or return on costs (ROC) is a ratio between net income (over a period) and investment (costs resulting from an investment of some resources at a point in time). A high ROI means the investment's gains compare favourably ...
(ROI)" and "
return on sales In business, operating margin—also known as operating income margin, operating profit margin, EBIT margin and return on sales (ROS)—is the ratio of operating income ("operating profit" in the UK) to net sales, usually expressed in percent.
: ...
(ROS)". The domains of these attributes are not simply ordered but involve a preference order since, from the viewpoint of bank managers, greater values of ROE, ROI or ROS are better for clients being analysed for bankruptcy risk . Thus, these attributes are criteria. Neglecting this information in
knowledge discovery
Knowledge extraction is the creation of Knowledge representation and reasoning, knowledge from structured (relational databases, XML) and unstructured (text (literary theory), text, documents, images) sources. The resulting knowledge needs to be in ...
may lead to wrong conclusions.
Data representation
Decision table
In DRSA, data are often presented using a particular form of
decision table
Decision tables are a concise visual representation for specifying which actions to perform depending on given conditions. They are algorithms whose output is a set of actions. The information expressed in decision tables could also be represented ...
. Formally, a DRSA decision table is a 4-tuple
, where
is a finite set of objects,
is a finite set of criteria,
where
is the domain of the criterion
and
is an ''information function'' such that
for every
. The set
is divided into ''condition criteria'' (set
) and the ''decision criterion'' (''class'')
. Notice, that
is an evaluation of object
on criterion
, while
is the class assignment (decision value) of the object. An example of decision table is shown in Table 1 below.
Outranking relation
It is assumed that the domain of a criterion
is completely
preorder
In mathematics, especially in order theory, a preorder or quasiorder is a binary relation that is reflexive and transitive. Preorders are more general than equivalence relations and (non-strict) partial orders, both of which are special c ...
ed by an
outranking relation ;
means that
is at least as good as (outranks)
with respect to the criterion
. Without loss of generality, we assume that the domain of
is a subset of
reals,
, and that the outranking relation is a simple order between real numbers
such that the following relation holds:
. This relation is straightforward for gain-type ("the more, the better") criterion, e.g. ''company profit''. For cost-type ("the less, the better") criterion, e.g. ''product price'', this relation can be satisfied by negating the values from
.
Decision classes and class unions
Let
. The domain of decision criterion,
consist of
elements (without loss of generality we assume
) and induces a partition of
into
classes
, where
. Each object
is assigned to one and only one class
. The classes are preference-ordered according to an increasing order of class indices, i.e. for all
such that
, the objects from
are strictly preferred to the objects from
. For this reason, we can consider the upward and downward unions of classes, defined respectively, as:
:
Main concepts
Dominance
We say that
dominates
with respect to
, denoted by
, if
is better than
on every criterion from
,
. For each
, the dominance relation
is
reflexive and
transitive, i.e. it is a
partial pre-order. Given
and
, let
:
:
represent ''P''-dominating set and ''P''-dominated set with respect to
, respectively.
Rough approximations
The key idea of the
rough set
In computer science, a rough set, first described by Polish computer scientist Zdzisław I. Pawlak, is a formal approximation of a crisp set (i.e., conventional set) in terms of a pair of sets which give the ''lower'' and the ''upper'' approximati ...
philosophy is approximation of one knowledge by another knowledge. In DRSA, the knowledge being approximated is a collection of upward and downward unions of decision classes and the "granules of knowledge" used for approximation are ''P''-dominating and ''P''-dominated sets.
The ''P''-lower and the ''P''-upper approximation of
with respect to
, denoted as
and
, respectively, are defined as:
:
:
Analogously, the ''P''-lower and the ''P''-upper approximation of
with respect to
, denoted as
and
, respectively, are defined as:
:
:
Lower approximations group the objects which ''certainly'' belong to class union
(respectively
). This certainty comes from the fact, that object
belongs to the lower approximation
(respectively
), if no other object in
contradicts this claim, i.e. every object
which ''P''-dominates
, also belong to the class union
(respectively
). Upper approximations group the objects which ''could belong'' to
(respectively
), since object
belongs to the upper approximation
(respectively
), if there exist another object
''P''-dominated by
from class union
(respectively
).
The ''P''-lower and ''P''-upper approximations defined as above satisfy the following properties for all
and for any
:
:
:
The ''P''-boundaries (''P-doubtful regions'') of
and
are defined as:
:
:
Quality of approximation and reducts
The ratio
:
defines the quality of approximation of the partition
into classes by means of the set of criteria
. This ratio express the relation between all the ''P''-correctly classified objects and all the objects in the table.
Every minimal subset
such that
is called a
reduct
In universal algebra and in model theory, a reduct of an algebraic structure is obtained by omitting some of the operation (mathematics), operations and relation (mathematics), relations of that structure. The opposite of "reduct" is "expansion. ...
of
and is denoted by
. A decision table may have more than one reduct. The intersection of all reducts is known as the ''core''.
Decision rules
On the basis of the approximations obtained by means of the dominance relations, it is possible to induce a generalized description of the preferential information contained in the decision table, in terms of
decision rules
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains condit ...
. The decision rules are expressions of the form ''if''
ondition''then''
onsequent that represent a form of dependency between condition criteria and decision criteria. Procedures for generating decision rules from a decision table use an inductive learning principle. We can distinguish three types of rules: certain, possible and approximate. Certain rules are generated from lower approximations of unions of classes; possible rules are generated from upper approximations of unions of classes and approximate rules are generated from boundary regions.
Certain rules has the following form:
:
if
and
and
then
:
if
and
and
then
Possible rules has a similar syntax, however the ''consequent'' part of the rule has the form:
''could belong to''
or the form:
''could belong to''
.
Finally, approximate rules has the syntax:
:
if
and
and
and
and
and
then
The certain, possible and approximate rules represent certain, possible and ambiguous knowledge extracted from the decision table.
Each decision rule should be minimal. Since a decision rule is an implication, by a minimal decision rule we understand such an implication that there is no other implication with an antecedent of at least the same weakness (in other words, rule using a subset of elementary conditions or/and weaker elementary conditions) and a consequent of at least the same strength (in other words, rule assigning objects to the same union or sub-union of classes).
A set of decision rules is ''complete'' if it is able to cover all objects from the decision table in such a way that consistent objects are re-classified to their original classes and inconsistent objects are classified to clusters of classes referring to this inconsistency. We call ''minimal'' each set of decision rules that is complete and non-redundant, i.e. exclusion of any rule from this set makes it non-complete.
One of three induction strategies can be adopted to obtain a set of decision rules:
* generation of a minimal description, i.e. a minimal set of rules,
* generation of an exhaustive description, i.e. all rules for a given data matrix,
* generation of a characteristic description, i.e. a set of rules covering relatively many objects each, however, all together not necessarily all objects from the decision table
The most popular rule induction algorithm for dominance-based rough set approach is DOMLEM, which generates minimal set of rules.
Example
Consider the following problem of high school students’ evaluations:
:
Each object (student) is described by three criteria
, related to the levels in Mathematics, Physics and Literature, respectively. According to the decision attribute, the students are divided into three preference-ordered classes:
,
and
. Thus, the following unions of classes were approximated:
*
i.e. the class of (at most) bad students,
*
i.e. the class of at most medium students,
*
i.e. the class of at least medium students,
*
i.e. the class of (at least) good students.
Notice that evaluations of objects
and
are inconsistent, because
has better evaluations on all three criteria than
but worse global score.
Therefore, lower approximations of class unions consist of the following objects:
:
:
:
:
Thus, only classes
and
cannot be approximated precisely. Their upper approximations are as follows:
:
:
while their boundary regions are:
:
Of course, since
and
are approximated precisely, we have
,
and
The following minimal set of 10 rules can be induced from the decision table:
# ''if''
''then''
# ''if''
''and''
''and''
''then''
# ''if''
''then''
# ''if''
''and''
''then''
# ''if''
''and''
''then''
# ''if''
''and''
''then''
# ''if''
''and''
''then''
# ''if''
''then''
# ''if''
''then''
# ''if''
''and''
''then''
The last rule is approximate, while the rest are certain.
Extensions
Multicriteria choice and ranking problems
The other two problems considered within
multi-criteria decision analysis
Multiple-criteria decision-making (MCDM) or multiple-criteria decision analysis (MCDA) is a sub-discipline of operations research that explicitly evaluates multiple conflicting criteria in decision making (both in daily life and in settings ...
,
multicriteria choice and
ranking
A ranking is a relationship between a set of items such that, for any two items, the first is either "ranked higher than", "ranked lower than" or "ranked equal to" the second.
In mathematics, this is known as a weak order or total preorder of o ...
problems, can also be solved using dominance-based rough set approach. This is done by converting the decision table into
pairwise comparison table
Pairwise generally means "occurring in pairs" or "two at a time."
Pairwise may also refer to:
* Pairwise disjoint
* Pairwise independence of random variables
* Pairwise comparison, the process of comparing two entities to determine which is prefe ...
(PCT).
Variable-consistency DRSA
The definitions of rough approximations are based on a strict application of the dominance principle. However, when defining non-ambiguous objects, it is reasonable to accept a limited proportion of negative examples, particularly for large decision tables. Such extended version of DRSA is called
Variable-Consistency DRSA model (VC-DRSA)
Stochastic DRSA
In real-life data, particularly for large datasets, the notions of rough approximations were found to be excessively restrictive. Therefore, an extension of DRSA, based on stochastic model (
Stochastic DRSA), which allows inconsistencies to some degree, has been introduced.
[Dembczyński, K., Greco, S., Kotłowski, W., Słowiński, R.: Statistical model for rough set approach to multicriteria classification. In Kok, J.N., Koronacki, J., de Mantaras, R.L., Matwin, S.,
Mladenic, D., Skowron, A. (eds.): Knowledge Discovery in Databases: PKDD 2007, Warsaw, Poland. Lecture Notes in Computer Science 4702 (2007) 164–175.] Having stated the probabilistic model for ordinal classification problems with monotonicity constraints, the concepts of lower approximations are extended to the
stochastic case. The method is based on estimating the conditional probabilities using the nonparametric
maximum likelihood
In statistics, maximum likelihood estimation (MLE) is a method of estimation theory, estimating the Statistical parameter, parameters of an assumed probability distribution, given some observed data. This is achieved by Mathematical optimization, ...
method which leads
to the problem of
isotonic regression
In statistics and numerical analysis, isotonic regression or monotonic regression is the technique of fitting a free-form line to a sequence of observations such that the fitted line is non-decreasing (or non-increasing) everywhere, and lies as ...
.
Stochastic dominance-based rough sets can also be regarded as a sort of variable-consistency model.
Software
4eMka2is a
decision support system
A decision support system (DSS) is an information system that supports business or organizational decision-making activities. DSSs serve the management, operations and planning levels of an organization (usually mid and higher management) and h ...
for multiple criteria classification problems based on dominance-based rough sets (DRSA)
JAMMis a much more advanced successor of 4eMka2. Both systems are freely available for non-profit purposes on th
Laboratory of Intelligent Decision Support Systems (IDSS)website.
See also
*
Rough sets
In computer science, a rough set, first described by Polish computer scientist Zdzisław I. Pawlak, is a formal approximation of a crisp set (i.e., conventional set) in terms of a pair of sets which give the ''lower'' and the ''upper'' approximati ...
*
Granular computing Granular computing (GrC) is an emerging computing paradigm of information processing that concerns the processing of complex information entities called "information granules", which arise in the process of data abstraction and derivation of knowl ...
*
Multicriteria Decision Analysis (MCDA)
References
{{reflist
* Chakhar S., Ishizaka A., Labib A., Saad I. (2016). Dominance-based rough set approach for group decisions, European Journal of Operational Research, 251(1): 206-224
* Li S., Li T. Zhang Z., Chen H., Zhang J. (2015). Parallel Computing of Approximations in Dominance-based Rough Sets Approach, Knowledge-based Systems, 87: 102-111
* Li S., Li T. (2015). Incremental Update of Approximations in Dominance-based Rough Sets Approach under the Variation of Attribute Values, Information Sciences, 294: 348-361
* Li S., Li T., Liu D. (2013). Dynamic Maintenance of Approximations in Dominance-based Rough Set Approach under the Variation of the Object Set, International Journal of Intelligent Systems, 28(8): 729-751
External links
The International Rough Set SocietyLaboratory of Intelligent Decision Support Systems (IDSS)a
Poznań University of Technology
* Extensive list of DRSA references on th
home page.
Theoretical computer science
Machine learning algorithms
Multiple-criteria decision analysis