In computer science, a rough set, first described by Polish computer scientist Zdzisław I. Pawlak, is a formal approximation of a crisp set
(i.e., conventional set) in terms of a pair of sets which give the ''lower'' and the ''upper'' approximation of the original set. In the standard version of rough set theory described in Pawlak (1991),
the lower- and upper-approximation sets are crisp sets, but in other variations, the approximating sets may be fuzzy sets.
Definitions
The following section contains an overview of the basic framework of rough set theory, as originally proposed by Zdzisław I. Pawlak, along with some of the key definitions. More formal properties and boundaries of rough sets can be found in the cited references. The initial and basic theory of rough sets is sometimes referred to as ''"Pawlak Rough Sets"'' or ''"classical rough sets"'', as a means to distinguish it from more recent extensions and generalizations.
Information system framework
Let $I = (\mathbb{U}, \mathbb{A})$ be an information system (attribute–value system), where $\mathbb{U}$ is a non-empty, finite set of objects (the universe) and $\mathbb{A}$ is a non-empty, finite set of attributes such that $a : \mathbb{U} \rightarrow V_a$ for every $a \in \mathbb{A}$. $V_a$ is the set of values that attribute $a$ may take. The information table assigns a value $a(x)$ from $V_a$ to each attribute $a$ and object $x$ in the universe $\mathbb{U}$.
With any $P \subseteq \mathbb{A}$ there is an associated equivalence relation $\mathrm{IND}(P)$:

:$\mathrm{IND}(P) = \left\{ (x, y) \in \mathbb{U}^2 \mid \forall a \in P,\ a(x) = a(y) \right\}$

The relation $\mathrm{IND}(P)$ is called a ''$P$-indiscernibility relation''. The partition of $\mathbb{U}$ is a family of all equivalence classes of $\mathrm{IND}(P)$ and is denoted by $\mathbb{U}/\mathrm{IND}(P)$ (or $\mathbb{U}/P$).
If $(x, y) \in \mathrm{IND}(P)$, then $x$ and $y$ are ''indiscernible'' (or indistinguishable) by attributes from $P$. The equivalence classes of the $P$-indiscernibility relation are denoted $[x]_P$.
Example: equivalence-class structure
For example, consider the following information table:

:Object    | $P_1$ | $P_2$ | $P_3$ | $P_4$ | $P_5$
:$O_1$     | 1 | 2 | 0 | 1 | 1
:$O_2$     | 1 | 2 | 0 | 1 | 1
:$O_3$     | 2 | 0 | 0 | 1 | 0
:$O_4$     | 0 | 0 | 1 | 2 | 1
:$O_5$     | 2 | 1 | 0 | 2 | 1
:$O_6$     | 0 | 0 | 1 | 2 | 2
:$O_7$     | 2 | 0 | 0 | 1 | 0
:$O_8$     | 0 | 1 | 2 | 2 | 1
:$O_9$     | 2 | 1 | 0 | 2 | 2
:$O_{10}$  | 2 | 0 | 0 | 1 | 0
When the full set of attributes $P = \{P_1, P_2, P_3, P_4, P_5\}$ is considered, we see that we have the following seven equivalence classes:

:$\{O_1, O_2\},\ \{O_3, O_7, O_{10}\},\ \{O_4\},\ \{O_5\},\ \{O_6\},\ \{O_8\},\ \{O_9\}$
Thus, the two objects within the first equivalence class, $\{O_1, O_2\}$, cannot be distinguished from each other based on the available attributes, and the three objects within the second equivalence class, $\{O_3, O_7, O_{10}\}$, cannot be distinguished from one another based on the available attributes. The remaining five objects are each discernible from all other objects.
It is apparent that different attribute subset selections will in general lead to different indiscernibility classes. For example, if attribute $P = \{P_1\}$ alone is selected, we obtain the following, much coarser, equivalence-class structure:

:$\{O_1, O_2\},\ \{O_3, O_5, O_7, O_9, O_{10}\},\ \{O_4, O_6, O_8\}$
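The indiscernibility partition is straightforward to compute. The following is a minimal Python sketch; the dictionary encoding of the table and all identifier names are choices made for this illustration, not part of the theory:

```python
from collections import defaultdict

# The sample information table above, encoded as object -> {attribute: value}.
TABLE = {
    "O1":  {"P1": 1, "P2": 2, "P3": 0, "P4": 1, "P5": 1},
    "O2":  {"P1": 1, "P2": 2, "P3": 0, "P4": 1, "P5": 1},
    "O3":  {"P1": 2, "P2": 0, "P3": 0, "P4": 1, "P5": 0},
    "O4":  {"P1": 0, "P2": 0, "P3": 1, "P4": 2, "P5": 1},
    "O5":  {"P1": 2, "P2": 1, "P3": 0, "P4": 2, "P5": 1},
    "O6":  {"P1": 0, "P2": 0, "P3": 1, "P4": 2, "P5": 2},
    "O7":  {"P1": 2, "P2": 0, "P3": 0, "P4": 1, "P5": 0},
    "O8":  {"P1": 0, "P2": 1, "P3": 2, "P4": 2, "P5": 1},
    "O9":  {"P1": 2, "P2": 1, "P3": 0, "P4": 2, "P5": 2},
    "O10": {"P1": 2, "P2": 0, "P3": 0, "P4": 1, "P5": 0},
}

def partition(table, attrs):
    """Equivalence classes of the attrs-indiscernibility relation IND(attrs)."""
    classes = defaultdict(set)
    for obj, row in table.items():
        # Objects agreeing on every attribute in attrs share the same key.
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

# partition(TABLE, ["P1", "P2", "P3", "P4", "P5"]) yields the seven classes above;
# partition(TABLE, ["P1"]) yields the coarser three-class structure.
```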
Definition of a ''rough set''
Let $X \subseteq \mathbb{U}$ be a target set that we wish to represent using attribute subset $P$; that is, we are told that an arbitrary set of objects $X$ comprises a single class, and we wish to express this class (i.e., this subset) using the equivalence classes induced by attribute subset $P$. In general, $X$ cannot be expressed exactly, because the set may include and exclude objects which are indistinguishable on the basis of attributes $P$.
For example, consider the target set $X = \{O_1, O_2, O_3, O_4\}$, and let attribute subset $P = \{P_1, P_2, P_3, P_4, P_5\}$, the full available set of features. The set $X$ cannot be expressed exactly, because in $[x]_P$, objects $O_3, O_7, O_{10}$ are indiscernible. Thus, there is no way to represent any set $X$ which ''includes'' $O_3$ but ''excludes'' objects $O_7$ and $O_{10}$.
However, the target set $X$ can be ''approximated'' using only the information contained within $P$ by constructing the $P$-lower and $P$-upper approximations of $X$:

:${\underline P}X = \{x \mid [x]_P \subseteq X\}$

:${\overline P}X = \{x \mid [x]_P \cap X \neq \emptyset\}$
Lower approximation and positive region
The ''$P$-lower approximation'', or ''positive region'', is the union of all equivalence classes in $[x]_P$ which are contained by (i.e., are subsets of) the target set – in the example, ${\underline P}X = \{O_1, O_2, O_4\}$, the union of the two equivalence classes in $[x]_P$ which are contained in the target set. The lower approximation is the complete set of objects in $\mathbb{U}/P$ that can be ''positively'' (i.e., unambiguously) classified as belonging to target set $X$.
Upper approximation and negative region
The ''$P$-upper approximation'' is the union of all equivalence classes in $[x]_P$ which have non-empty intersection with the target set – in the example, ${\overline P}X = \{O_1, O_2, O_3, O_4, O_7, O_{10}\}$, the union of the three equivalence classes in $[x]_P$ that have non-empty intersection with the target set. The upper approximation is the complete set of objects in $\mathbb{U}/P$ that ''cannot'' be positively (i.e., unambiguously) classified as belonging to the ''complement'' ($\overline X$) of the target set $X$. In other words, the upper approximation is the complete set of objects that are ''possibly'' members of the target set $X$.

The set $\mathbb{U} - {\overline P}X$ therefore represents the ''negative region'', containing the set of objects that can be definitely ruled out as members of the target set.
Boundary region
The ''boundary region'', given by set difference ${\overline P}X - {\underline P}X$, consists of those objects that can neither be ruled in nor ruled out as members of the target set $X$.
In summary, the lower approximation of a target set is a ''conservative'' approximation consisting of only those objects which can positively be identified as members of the set. (These objects have no indiscernible "clones" which are excluded by the target set.) The upper approximation is a ''liberal'' approximation which includes all objects that might be members of the target set. (Some objects in the upper approximation may not be members of the target set.) From the perspective of $\mathbb{U}/P$, the lower approximation contains objects that are members of the target set with certainty (probability = 1), while the upper approximation contains objects that are members of the target set with non-zero probability (probability > 0).
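Given the partition helper from the earlier sketch, both approximations reduce to a few set operations. A hedged Python sketch (function names are this illustration's own):

```python
def lower_approximation(table, attrs, target):
    """Union of equivalence classes wholly inside target: certain members."""
    return {x for c in partition(table, attrs) if c <= target for x in c}

def upper_approximation(table, attrs, target):
    """Union of equivalence classes meeting target: possible members."""
    return {x for c in partition(table, attrs) if c & target for x in c}

P = ["P1", "P2", "P3", "P4", "P5"]
X = {"O1", "O2", "O3", "O4"}
# lower_approximation(TABLE, P, X) -> {"O1", "O2", "O4"}
# upper_approximation(TABLE, P, X) -> {"O1", "O2", "O3", "O4", "O7", "O10"}
```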
The rough set
The tuple $\langle {\underline P}X, {\overline P}X \rangle$ composed of the lower and upper approximation is called a ''rough set''; thus, a rough set is composed of two crisp sets, one representing a ''lower boundary'' of the target set $X$, and the other representing an ''upper boundary'' of the target set $X$.
The ''accuracy'' of the rough-set representation of the set $X$ can be given by the following:

:$\alpha_{P}(X) = \frac{\left| {\underline P}X \right|}{\left| {\overline P}X \right|}$
That is, the accuracy of the rough set representation of $X$, $\alpha_{P}(X)$, $0 \leq \alpha_{P}(X) \leq 1$, is the ratio of the number of objects which can ''positively'' be placed in $X$ to the number of objects that can ''possibly'' be placed in $X$ – this provides a measure of how closely the rough set is approximating the target set. In the example above, $\alpha_{P}(X) = \tfrac{3}{6} = 0.5$. Clearly, when the upper and lower approximations are equal (i.e., boundary region empty), then $\alpha_{P}(X) = 1$, and the approximation is perfect; at the other extreme, whenever the lower approximation is empty, the accuracy is zero (regardless of the size of the upper approximation).
Objective analysis
Rough set theory is one of many methods that can be employed to analyse uncertain (including vague) systems, although less common than more traditional methods of probability, statistics, entropy and Dempster–Shafer theory. However, a key difference, and a unique strength, of using classical rough set theory is that it provides an objective form of analysis.
Unlike other methods, such as those given above, classical rough set analysis requires no additional information, external parameters, models, functions, grades or subjective interpretations to determine set membership – instead it only uses the information presented within the given data.
More recent adaptations of rough set theory, such as dominance-based, decision-theoretic and fuzzy rough sets, have introduced more subjectivity to the analysis.
Definability
In general, the upper and lower approximations are not equal; in such cases, we say that target set $X$ is ''undefinable'' or ''roughly definable'' on attribute set $P$. When the upper and lower approximations are equal (i.e., the boundary is empty), ${\overline P}X = {\underline P}X$, then the target set $X$ is ''definable'' on attribute set $P$. We can distinguish the following special cases of undefinability (a short sketch classifying these cases follows the list):
* Set $X$ is ''internally undefinable'' if ${\underline P}X = \emptyset$ and ${\overline P}X \neq \mathbb{U}$. This means that on attribute set $P$, there are ''no'' objects which we can be certain belong to target set $X$, but there ''are'' objects which we can definitively exclude from set $X$.
* Set $X$ is ''externally undefinable'' if ${\underline P}X \neq \emptyset$ and ${\overline P}X = \mathbb{U}$. This means that on attribute set $P$, there ''are'' objects which we can be certain belong to target set $X$, but there are ''no'' objects which we can definitively exclude from set $X$.
* Set $X$ is ''totally undefinable'' if ${\underline P}X = \emptyset$ and ${\overline P}X = \mathbb{U}$. This means that on attribute set $P$, there are ''no'' objects which we can be certain belong to target set $X$, and there are ''no'' objects which we can definitively exclude from set $X$. Thus, on attribute set $P$, we cannot decide whether any object is, or is not, a member of $X$.
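These cases can be checked mechanically. A small illustrative Python helper, reusing the approximation functions from the earlier sketches (the function name and return strings are this sketch's own):

```python
def definability(table, attrs, target):
    """Classify a target set according to the cases listed above."""
    lo = lower_approximation(table, attrs, target)
    up = upper_approximation(table, attrs, target)
    universe = set(table)
    if lo == up:
        return "definable"
    if not lo and up == universe:
        return "totally undefinable"
    if not lo:
        return "internally undefinable"
    if up == universe:
        return "externally undefinable"
    return "roughly definable"
```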
Reduct and core
An interesting question is whether there are attributes in the information system (attribute–value table) which are more important to the knowledge represented in the equivalence class structure than other attributes. Often, we wonder whether there is a subset of attributes which can, by itself, fully characterize the knowledge in the database; such an attribute set is called a ''reduct''.
Formally, a reduct is a subset of attributes $\mathrm{RED} \subseteq P$ such that
* $[x]_{\mathrm{RED}} = [x]_P$, that is, the equivalence classes induced by the reduced attribute set $\mathrm{RED}$ are the same as the equivalence class structure induced by the full attribute set $P$.
* the attribute set $\mathrm{RED}$ is ''minimal'', in the sense that $[x]_{(\mathrm{RED} - \{a\})} \neq [x]_P$ for any attribute $a \in \mathrm{RED}$; in other words, no attribute can be removed from set $\mathrm{RED}$ without changing the equivalence classes $[x]_P$.
A reduct can be thought of as a ''sufficient'' set of features – sufficient, that is, to represent the category structure. In the example table above, attribute set $\{P_3, P_4, P_5\}$ is a reduct – the information system projected on just these attributes possesses the same equivalence class structure as that expressed by the full attribute set:

:$\{O_1, O_2\},\ \{O_3, O_7, O_{10}\},\ \{O_4\},\ \{O_5\},\ \{O_6\},\ \{O_8\},\ \{O_9\}$
Attribute set $\{P_3, P_4, P_5\}$ is a reduct because eliminating any of these attributes causes a collapse of the equivalence-class structure, with the result that $[x]_{\mathrm{RED}} \neq [x]_P$.
The reduct of an information system is ''not unique'': there may be many subsets of attributes which preserve the equivalence-class structure (i.e., the knowledge) expressed in the information system. In the example information system above, another reduct is $\{P_1, P_2, P_5\}$, producing the same equivalence-class structure as $[x]_P$.
The set of attributes which is common to all reducts is called the ''core'': the core is the set of attributes which is possessed by ''every'' reduct, and therefore consists of attributes which cannot be removed from the information system without causing collapse of the equivalence-class structure. The core may be thought of as the set of ''necessary'' attributes – necessary, that is, for the category structure to be represented. In the example, the only such attribute is $\{P_5\}$; any one of the other attributes can be removed singly without damaging the equivalence-class structure, and hence these are all ''dispensable''. However, removing $\{P_5\}$ by itself ''does'' change the equivalence-class structure, and thus $\{P_5\}$ is the ''indispensable'' attribute of this information system, and hence the core.
It is possible for the core to be empty, which means that there is no indispensable attribute: any single attribute in such an information system can be deleted without altering the equivalence-class structure. In such cases, there is no ''essential'' or necessary attribute which is required for the class structure to be represented.
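Because the example universe is tiny, reducts can be found by brute-force search over attribute subsets. The sketch below is illustrative only (it is exponential in the number of attributes) and reuses the partition helper from above:

```python
from itertools import combinations

def same_partition(table, attrs_a, attrs_b):
    """True if two attribute sets induce the same equivalence-class structure."""
    as_sets = lambda attrs: {frozenset(c) for c in partition(table, attrs)}
    return as_sets(attrs_a) == as_sets(attrs_b)

def reducts(table, full_attrs):
    """All minimal attribute subsets preserving the full partition."""
    found = []
    for r in range(1, len(full_attrs) + 1):
        for cand in combinations(full_attrs, r):
            # A preserving subset is a reduct only if no smaller reduct is inside it.
            if same_partition(table, list(cand), full_attrs) \
               and not any(set(f) <= set(cand) for f in found):
                found.append(cand)
    return [set(c) for c in found]

# The core is the intersection of all reducts:
# set.intersection(*reducts(TABLE, ["P1", "P2", "P3", "P4", "P5"]))  -> {"P5"}
```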
Attribute dependency
One of the most important aspects of database analysis or data acquisition is the discovery of attribute dependencies; that is, we wish to discover which variables are strongly related to which other variables. Generally, it is these strong relationships that will warrant further investigation, and that will ultimately be of use in predictive modeling.
In rough set theory, the notion of dependency is defined very simply. Let us take two (disjoint) sets of attributes, set $P$ and set $Q$, and inquire what degree of dependency obtains between them. Each attribute set induces an (indiscernibility) equivalence class structure, the equivalence classes induced by $P$ given by $[x]_P$, and the equivalence classes induced by $Q$ given by $[x]_Q$.
Let $[x]_Q = \{Q_1, Q_2, Q_3, \ldots, Q_N\}$, where each $Q_i$ is a given equivalence class from the equivalence-class structure induced by attribute set $Q$. Then, the ''dependency'' of attribute set $Q$ on attribute set $P$, $\gamma_{P}(Q)$, is given by

:$\gamma_{P}(Q) = \frac{\sum_{i=1}^{N} \left| {\underline P}Q_i \right|}{\left| \mathbb{U} \right|} \leq 1$
That is, for each equivalence class $Q_i$ in $[x]_Q$, we add up the size of its lower approximation by the attributes in $P$, i.e., ${\underline P}Q_i$. This approximation (as above, for arbitrary set $X$) is the number of objects which on attribute set $P$ can be positively identified as belonging to target set $Q_i$. Added across all equivalence classes in $[x]_Q$, the numerator above represents the total number of objects which – based on attribute set $P$ – can be positively categorized according to the classification induced by attributes $Q$. The dependency ratio therefore expresses the proportion (within the entire universe) of such classifiable objects. The dependency $\gamma_{P}(Q)$ "can be interpreted as a proportion of such objects in the information system for which it suffices to know the values of attributes in $P$ to determine the values of attributes in $Q$".
Another, intuitive, way to consider dependency is to take the partition induced by $Q$ as the target class $C$, and consider $P$ as the attribute set we wish to use in order to "re-construct" the target class $C$. If $P$ can completely reconstruct $C$, then $Q$ depends totally upon $P$; if $P$ results in a poor and perhaps a random reconstruction of $C$, then $Q$ does not depend upon $P$ at all.
Thus, this measure of dependency expresses the degree of ''functional'' (i.e., deterministic) dependency of attribute set $Q$ on attribute set $P$; it is ''not'' symmetric. The relationship of this notion of attribute dependency to more traditional information-theoretic (i.e., entropic) notions of attribute dependence has been discussed in a number of sources, e.g. Pawlak, Wong, & Ziarko (1988),
Yao & Yao (2002),
Wong, Ziarko, & Ye (1986),
and Quafafou & Boussouf (2000).
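In code, under the same illustrative encoding as the earlier sketches, the dependency is the fraction of the universe covered by the $P$-lower approximations of the $Q$-classes:

```python
def dependency(table, p_attrs, q_attrs):
    """gamma_P(Q): proportion of objects positively classified into Q-classes via P."""
    covered = sum(len(lower_approximation(table, p_attrs, q_class))
                  for q_class in partition(table, q_attrs))
    return covered / len(table)

# In the sample table, P4 is fully determined by the remaining attributes:
# dependency(TABLE, ["P1", "P2", "P3", "P5"], ["P4"]) -> 1.0
```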
Rule extraction
The category representations discussed above are all ''extensional'' in nature; that is, a category or complex class is simply the sum of all its members. To represent a category is, then, just to be able to list or identify all the objects belonging to that category. However, extensional category representations have very limited practical use, because they provide no insight for deciding whether novel (never-before-seen) objects are members of the category.
What is generally desired is an ''intensional'' description of the category, a representation of the category based on a set of ''rules'' that describe the scope of the category. The choice of such rules is not unique, and therein lies the issue of
inductive bias. See Version space and Model selection for more about this issue.
There are a few rule-extraction methods. We will start with a rule-extraction procedure based on Ziarko & Shan (1995).
Decision matrices
Let us say that we wish to find the minimal set of consistent rules (logical implications) that characterize our sample system. For a set of ''condition'' attributes $\mathcal{P} = \{P_1, P_2, P_3, \ldots, P_n\}$ and a decision attribute $Q$, $Q \notin \mathcal{P}$, these rules should have the form $P_i^a P_j^b \ldots \rightarrow Q^d$, or, spelled out,

:$(P_i = a) \land (P_j = b) \land \ldots \rightarrow (Q = d)$

where $\{a, b, \ldots\}$ are legitimate values from the domains of their respective attributes. This is a form typical of association rules, and the number of items in $\mathbb{U}$ which match the condition/antecedent is called the ''support'' for the rule. The method for extracting such rules given in Ziarko & Shan (1995) is to form a ''decision matrix'' corresponding to each individual value $d$ of decision attribute $Q$. Informally, the decision matrix for value $d$ of decision attribute $Q$ lists all attribute–value pairs that ''differ'' between objects having $Q = d$ and $Q \neq d$.
This is best explained by example (which also avoids a lot of notation). Consider the table above, and let $P_4$ be the decision variable (i.e., the variable on the right side of the implications) and let $P_1, P_2, P_3$ be the condition variables (on the left side of the implication). We note that the decision variable $P_4$ takes on two different values, namely $\{1, 2\}$. We treat each case separately.
First, we look at the case $P_4 = 1$, and we divide up $\mathbb{U}$ into objects that have $P_4 = 1$ and those that have $P_4 \neq 1$. (Note that objects with $P_4 \neq 1$ in this case are simply the objects that have $P_4 = 2$, but in general, $P_4 \neq 1$ would include all objects having any value for $P_4$ ''other than'' 1, and there may be several such classes of objects (for example, those having $P_4 = 2$, $P_4 = 3$, etc.).) In this case, the objects having $P_4 = 1$ are $\{O_1, O_2, O_3, O_7, O_{10}\}$ while the objects which have $P_4 \neq 1$ are $\{O_4, O_5, O_6, O_8, O_9\}$. The decision matrix for $P_4 = 1$ lists all the differences between the objects having $P_4 = 1$ and those having $P_4 \neq 1$; that is, the decision matrix lists all the differences between $\{O_1, O_2, O_3, O_7, O_{10}\}$ and $\{O_4, O_5, O_6, O_8, O_9\}$. We put the "positive" objects ($P_4 = 1$) as the rows, and the "negative" objects ($P_4 \neq 1$) as the columns.
:          | $O_4$                 | $O_5$          | $O_6$                 | $O_8$                 | $O_9$
:$O_1$     | $P_1^1, P_2^2, P_3^0$ | $P_1^1, P_2^2$ | $P_1^1, P_2^2, P_3^0$ | $P_1^1, P_2^2, P_3^0$ | $P_1^1, P_2^2$
:$O_2$     | $P_1^1, P_2^2, P_3^0$ | $P_1^1, P_2^2$ | $P_1^1, P_2^2, P_3^0$ | $P_1^1, P_2^2, P_3^0$ | $P_1^1, P_2^2$
:$O_3$     | $P_1^2, P_3^0$        | $P_2^0$        | $P_1^2, P_3^0$        | $P_1^2, P_2^0, P_3^0$ | $P_2^0$
:$O_7$     | $P_1^2, P_3^0$        | $P_2^0$        | $P_1^2, P_3^0$        | $P_1^2, P_2^0, P_3^0$ | $P_2^0$
:$O_{10}$  | $P_1^2, P_3^0$        | $P_2^0$        | $P_1^2, P_3^0$        | $P_1^2, P_2^0, P_3^0$ | $P_2^0$
To read this decision matrix, look, for example, at the intersection of row $O_3$ and column $O_6$, showing $P_1^2, P_3^0$ in the cell. This means that ''with regard to'' decision value $P_4 = 1$, object $O_3$ differs from object $O_6$ on attributes $P_1$ and $P_3$, and the particular values on these attributes for the positive object $O_3$ are $P_1 = 2$ and $P_3 = 0$. This tells us that the correct classification of $O_3$ as belonging to decision class $P_4 = 1$ rests on attributes $P_1$ and $P_3$; although one or the other might be dispensable, we know that ''at least one'' of these attributes is ''in''dispensable.
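Constructing such a matrix is mechanical. A hedged Python sketch, reusing the TABLE encoding from the earlier sketches (the function and variable names here are this illustration's own):

```python
def decision_matrix(table, conds, pos_objs, neg_objs):
    """matrix[p][n]: (attribute, value-of-p) pairs on which p differs from n."""
    return {p: {n: [(a, table[p][a]) for a in conds if table[p][a] != table[n][a]]
                for n in neg_objs}
            for p in pos_objs}

pos = [o for o in TABLE if TABLE[o]["P4"] == 1]   # rows: O1, O2, O3, O7, O10
neg = [o for o in TABLE if TABLE[o]["P4"] != 1]   # columns: O4, O5, O6, O8, O9
M = decision_matrix(TABLE, ["P1", "P2", "P3"], pos, neg)
# M["O3"]["O6"] -> [("P1", 2), ("P3", 0)]
```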
Next, from each decision matrix we form a set of Boolean expressions, one expression for each row of the matrix. The items within each cell are aggregated disjunctively, and the individual cells are then aggregated conjunctively. Thus, for the above table we have the following five Boolean expressions:

:$(P_1^1 \lor P_2^2 \lor P_3^0)(P_1^1 \lor P_2^2)(P_1^1 \lor P_2^2 \lor P_3^0)(P_1^1 \lor P_2^2 \lor P_3^0)(P_1^1 \lor P_2^2)$
:$(P_1^1 \lor P_2^2 \lor P_3^0)(P_1^1 \lor P_2^2)(P_1^1 \lor P_2^2 \lor P_3^0)(P_1^1 \lor P_2^2 \lor P_3^0)(P_1^1 \lor P_2^2)$
:$(P_1^2 \lor P_3^0)(P_2^0)(P_1^2 \lor P_3^0)(P_1^2 \lor P_2^0 \lor P_3^0)(P_2^0)$
:$(P_1^2 \lor P_3^0)(P_2^0)(P_1^2 \lor P_3^0)(P_1^2 \lor P_2^0 \lor P_3^0)(P_2^0)$
:$(P_1^2 \lor P_3^0)(P_2^0)(P_1^2 \lor P_3^0)(P_1^2 \lor P_2^0 \lor P_3^0)(P_2^0)$
Each statement here is essentially a highly specific (probably ''too'' specific) rule governing the membership in class $P_4 = 1$ of the corresponding object. For example, the last statement, corresponding to object $O_{10}$, states that all the following must be satisfied:
# Either $P_1$ must have value 2, or $P_3$ must have value 0, or both.
# $P_2$ must have value 0.
# Either $P_1$ must have value 2, or $P_3$ must have value 0, or both.
# Either $P_1$ must have value 2, or $P_2$ must have value 0, or $P_3$ must have value 0, or any combination thereof.
# $P_2$ must have value 0.
It is clear that there is a large amount of redundancy here, and the next step is to simplify using traditional Boolean algebra. The statement $(P_1^1 \lor P_2^2 \lor P_3^0)(P_1^1 \lor P_2^2)(P_1^1 \lor P_2^2 \lor P_3^0)(P_1^1 \lor P_2^2 \lor P_3^0)(P_1^1 \lor P_2^2)$ corresponding to objects $\{O_1, O_2\}$ simplifies to $P_1^1 \lor P_2^2$, which yields the implication

:$(P_1 = 1) \lor (P_2 = 2) \rightarrow (P_4 = 1)$
Likewise, the statement $(P_1^2 \lor P_3^0)(P_2^0)(P_1^2 \lor P_3^0)(P_1^2 \lor P_2^0 \lor P_3^0)(P_2^0)$ corresponding to objects $\{O_3, O_7, O_{10}\}$ simplifies to $P_2^0 (P_1^2 \lor P_3^0)$. This gives us the implication

:$(P_2 = 0) \land ((P_1 = 2) \lor (P_3 = 0)) \rightarrow (P_4 = 1)$
The above implications can also be written as the following rule set:

:$(P_1 = 1) \rightarrow (P_4 = 1)$
:$(P_2 = 2) \rightarrow (P_4 = 1)$
:$(P_2 = 0) \land (P_1 = 2) \rightarrow (P_4 = 1)$
:$(P_2 = 0) \land (P_3 = 0) \rightarrow (P_4 = 1)$
It can be noted that each of the first two rules has a ''support'' of 2 (i.e., the antecedent matches two objects), while each of the last two rules has a support of 3. To finish writing the rule set for this knowledge system, the same procedure as above (starting with writing a new decision matrix) should be followed for the case of $P_4 = 2$, thus yielding a new set of implications for that decision value (i.e., a set of implications with $P_4 = 2$ as the consequent). In general, the procedure will be repeated for each possible value of the decision variable.
LERS rule induction system
The data system LERS (Learning from Examples based on Rough Sets) may induce rules from inconsistent data, i.e., data with conflicting objects. Two objects are conflicting when they are characterized by the same values of all attributes, but they belong to different concepts (classes). LERS uses rough set theory to compute lower and upper approximations for concepts involved in conflicts with other concepts.
Rules induced from the lower approximation of the concept ''certainly'' describe the concept, hence such rules are called ''certain''. On the other hand, rules induced from the upper approximation of the concept describe the concept ''possibly'', so these rules are called ''possible''. For rule induction LERS uses three algorithms: LEM1, LEM2, and IRIM.
The LEM2 algorithm of LERS is frequently used for rule induction and is used not only in LERS but also in other systems, e.g., in RSES.
LEM2 explores the search space of attribute–value pairs. Its input data set is a lower or upper approximation of a concept, so its input data set is always consistent. In general, LEM2 computes a local covering and then converts it into a rule set. We will quote a few definitions to describe the LEM2 algorithm.
The LEM2 algorithm is based on an idea of an attribute–value pair block. Let $X$ be a nonempty lower or upper approximation of a concept represented by a decision–value pair $(d, w)$. Set $X$ ''depends'' on a set $T$ of attribute–value pairs $t = (a, v)$ if and only if

:$\emptyset \neq [T] = \bigcap_{t \in T} [t] \subseteq X$
Set $T$ is a ''minimal complex'' of $X$ if and only if $X$ depends on $T$ and no proper subset $T'$ of $T$ exists such that $X$ depends on $T'$. Let $\mathbb{T}$ be a nonempty collection of nonempty sets of attribute–value pairs. Then $\mathbb{T}$ is a ''local covering'' of $X$ if and only if the following three conditions are satisfied:

:each member $T$ of $\mathbb{T}$ is a minimal complex of $X$,

:$\bigcup_{T \in \mathbb{T}} [T] = X$,

:$\mathbb{T}$ is minimal, i.e., $\mathbb{T}$ has the smallest possible number of members.
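The block and dependency notions translate directly into code. A minimal sketch under the same assumed encoding as the earlier examples (this is not the full LEM2 search, only the definitions above):

```python
def block(table, pair):
    """[t]: objects whose attribute a takes value v, for t = (a, v)."""
    a, v = pair
    return {obj for obj, row in table.items() if row[a] == v}

def depends_on(table, X, T):
    """X depends on T iff the intersection of T's blocks is nonempty and within X."""
    inter = set(table)
    for t in T:
        inter &= block(table, t)
    return bool(inter) and inter <= X

def is_minimal_complex(table, X, T):
    """T is a minimal complex of X: X depends on T but on no proper subset of T."""
    return depends_on(table, X, T) and not any(
        depends_on(table, X, [u for u in T if u != t]) for t in T)
```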
For our sample information system, LEM2 will induce the following rules:

:$(P_5, 0) \rightarrow (P_4, 1)$
:$(P_1, 1) \rightarrow (P_4, 1)$
:$(P_1, 0) \rightarrow (P_4, 2)$
:$(P_2, 1) \rightarrow (P_4, 2)$
Other rule-learning methods can be found, e.g., in Pawlak (1991), Stefanowski (1998), Bazan et al. (2004), etc.
Incomplete data
Rough set theory is useful for rule induction from incomplete data sets. Using this approach we can distinguish between three types of missing attribute values: ''lost values'' (the values that were recorded but currently are unavailable), ''attribute-concept values'' (these missing attribute values may be replaced by any attribute value limited to the same concept), and ''"do not care" conditions'' (the original values were irrelevant). A ''concept'' (''class'') is a set of all objects classified (or diagnosed) the same way.
Two special data sets with missing attribute values were extensively studied: in the first case, all missing attribute values were lost; in the second case, all missing attribute values were "do not care" conditions. In the attribute-concept value interpretation of a missing attribute value, the missing attribute value may be replaced by any value of the attribute domain restricted to the concept to which the object with a missing attribute value belongs.
For example, if the value of the attribute Temperature is missing for a patient who is sick with flu, and all remaining patients sick with flu have values high or very-high for Temperature, then under the attribute-concept value interpretation the missing value will be replaced by high and very-high. Additionally, the ''characteristic relation'' (see, e.g., the cited references) makes it possible to process data sets with all three kinds of missing attribute values at the same time: lost, "do not care" conditions, and attribute-concept values.
Applications
Rough set methods can be applied as a component of hybrid solutions in machine learning and data mining. They have been found to be particularly useful for rule induction and feature selection (semantics-preserving dimensionality reduction). Rough set-based data analysis methods have been successfully applied in bioinformatics, economics and finance, medicine, multimedia, web and text mining, signal and image processing, software engineering, robotics, and engineering (e.g. power systems and control engineering
). Recently, the three regions of rough sets have been interpreted as regions of acceptance, rejection and deferment; this leads to a three-way decision-making approach which can potentially lead to interesting future applications.
History
The idea of a rough set was proposed by Pawlak (1981) as a new mathematical tool to deal with vague concepts. Comer, Grzymala-Busse, Iwinski, Nieminen, Novotny, Pawlak, Obtulowicz, and Pomykala have studied algebraic properties of rough sets. Different algebraic semantics have been developed by P. Pagliani, I. Duntsch, M. K. Chakraborty, M. Banerjee and A. Mani; these have been extended to more generalized rough sets by G. Cattaneo and A. Mani, in particular. Rough sets can be used to represent ambiguity, vagueness and general uncertainty.
Extensions and generalizations
Since the development of rough sets, extensions and generalizations have continued to evolve. Initial developments focused on the relationship (both similarities and differences) with fuzzy sets. While some literature contends these concepts are different, other literature considers that rough sets are a generalization of fuzzy sets, as represented through either fuzzy rough sets or rough fuzzy sets. Pawlak (1995) considered that fuzzy and rough sets should be treated as being complementary to each other, addressing different aspects of uncertainty and vagueness.
Three notable extensions of classical rough sets are:
* Dominance-based rough set approach (DRSA) is an extension of rough set theory for multi-criteria decision analysis (MCDA), introduced by Greco, Matarazzo and Słowiński (2001). The main change in this extension of classical rough sets is the substitution of the indiscernibility relation by a ''dominance'' relation, which permits the formalism to deal with inconsistencies typical in consideration of criteria and preference-ordered decision classes.
* Decision-theoretic rough sets (DTRS) is a probabilistic extension of rough set theory introduced by Yao, Wong, and Lingras (1990). It utilizes a Bayesian decision procedure for minimum-risk decision making. Elements are included in the lower and upper approximations based on whether their conditional probability is above thresholds $\alpha$ and $\beta$. These upper and lower thresholds determine region inclusion for elements. This model is unique and powerful since the thresholds themselves are calculated from a set of six loss functions representing classification risks (a sketch of such threshold-based regions follows this list).
* Game-theoretic rough sets (GTRS) is a game theory-based extension of rough sets that was introduced by Herbert and Yao (2011). It utilizes a game-theoretic environment to optimize certain criteria of rough-set-based classification or decision making in order to obtain effective region sizes.
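As an illustration of the DTRS idea above, the following hedged Python sketch assigns each equivalence class to a positive, boundary, or negative region by comparing its conditional probability with assumed thresholds alpha and beta; the loss-function machinery that derives the thresholds is omitted, and the helper names are this sketch's own:

```python
def three_way_regions(table, attrs, target, alpha, beta):
    """Probabilistic regions: accept if P(X | [x]) >= alpha, reject if <= beta,
    defer (boundary region) otherwise. Assumes 0 <= beta < alpha <= 1."""
    pos, bnd, neg = set(), set(), set()
    for c in partition(table, attrs):
        p = len(c & target) / len(c)   # conditional probability of target given the class
        if p >= alpha:
            pos |= c
        elif p <= beta:
            neg |= c
        else:
            bnd |= c
    return pos, bnd, neg
```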
Rough membership
Rough sets can also be defined, as a generalisation, by employing a rough membership function instead of objective approximation. The rough membership function expresses a conditional probability that $x$ belongs to $X$ given the equivalence relation $R$. This can be interpreted as a degree that $x$ belongs to $X$ in terms of information about $x$ expressed by $R$.
Rough membership primarily differs from fuzzy membership in that the membership of the union and intersection of sets cannot, in general, be computed from the memberships of their constituents, as is the case with fuzzy sets. In this, rough membership is a generalization of fuzzy membership. Furthermore, the rough membership function is grounded more in probability than the conventionally held concepts of the fuzzy membership function.
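The standard rough membership function is $\mu_X^R(x) = \left| X \cap [x]_R \right| / \left| [x]_R \right|$. A one-function Python sketch under the earlier illustrative encoding:

```python
def rough_membership(table, attrs, target, obj):
    """mu_X^R(x): fraction of x's equivalence class that lies inside the target set."""
    eq_class = next(c for c in partition(table, attrs) if obj in c)
    return len(eq_class & target) / len(eq_class)

# rough_membership(TABLE, ["P1", "P2", "P3", "P4", "P5"], {"O1", "O2", "O3", "O4"}, "O3")
# -> 1/3, since [O3] = {O3, O7, O10} and only O3 lies in the target set.
```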
Other generalizations
Several generalizations of rough sets have been introduced, studied and applied to solving problems. Here are some of these generalizations:
*Rough multisets
*Fuzzy rough sets extend the rough set concept through the use of fuzzy equivalence classes
*Alpha rough set theory (α-RST) - a generalization of rough set theory that allows approximation using fuzzy concepts
*Intuitionistic fuzzy rough sets
*Generalized rough fuzzy sets
*Rough intuitionistic fuzzy sets
*Soft rough fuzzy sets and soft fuzzy rough sets
*Composite rough sets
See also
* Algebraic semantics
* Alternative set theory
* Analog computer
* Description logic
* Fuzzy logic
* Fuzzy set theory
* Granular computing
* Near sets
* Rough fuzzy hybridization
* Type-2 fuzzy sets and systems
* Decision-theoretic rough sets
* Version space
* Dominance-based rough set approach
References
Further reading
* Gianpiero Cattaneo and Davide Ciucci, "Heyting Wajsberg Algebras as an Abstract Environment Linking Fuzzy and Rough Sets" in J.J. Alpigini et al. (Eds.): RSCTC 2002, LNAI 2475, pp. 77–84, 2002.
* Pawlak, Zdzisław ''Rough Sets'' Research Report PAS 431, Institute of Computer Science, Polish Academy of Sciences (1981)
*Zhang J., Wong J-S, Pan Y, Li T. (2015). A parallel matrix-based method for computing approximations in incomplete information systems, IEEE Transactions on Knowledge and Data Engineering, 27(2): 326-339
*Burgin M. (1990). Theory of Named Sets as a Foundational Basis for Mathematics, In Structures in mathematical theories: Reports of the San Sebastian international symposium, September 25–29, 1990 (http://www.blogg.org/blog-30140-date-2005-10-26.html)
*Burgin, M. (2004). Unified Foundations of Mathematics, Preprint Mathematics LO/0403186, p39. (electronic edition: https://arxiv.org/ftp/math/papers/0403/0403186.pdf)
*Burgin, M. (2011), Theory of Named Sets, Mathematics Research Developments, Nova Science Pub Inc,
*Chen H., Li T., Luo C., Horng S-J., Wang G. (2015). A decision-theoretic rough set approach for dynamic data mining. IEEE Transactions on Fuzzy Systems, 23(6): 1958-1970
*Chen H., Li T., Luo C., Horng S-J., Wang G. (2014). A rough set-based method for updating decision rules on attribute values' coarsening and refining, IEEE Transactions on Knowledge and Data Engineering, 26(12): 2886-2899
*Chen H., Li T., Ruan D., Lin J., Hu C, (2013) A rough-set based incremental approach for updating approximations under dynamic maintenance environments. IEEE Transactions on Knowledge and Data Engineering, 25(2): 274-284
External links
* The International Rough Set Society
* Rough set tutorial
* Rough Set Exploration System
* Rough Sets in Data Warehousing