Instance selection algorithms
The literature provides several different algorithms for instance selection. They can be distinguished from each other according to several different criteria. Considering this, instance selection algorithms can be grouped in two main classes, according to what instances they select: algorithms that preserve the instances at the boundaries of classes and algorithms that preserve the internal instances of the classes. Within the category of algorithms that select instances at the boundaries it is possible to cite DROP3,D. R. Wilson and T. R. Martinez, Reduction techniques for instance-based learning algorithms, Machine learning, vol. 38, no. 3, pp. 257–286, 2000. ICFH. Brighton and C. Mellish, Advances in instance selection for instance-based learning algorithms, Data mining and knowledge discovery, vol. 6, no. 2, pp. 153–172, 2002. and LSBo.E. Leyva, A. González, and R. Pérez, Three new instance selection methods based on local sets: A comparative study with several approaches from a bi-objective perspective, Pattern Recognition, vol. 48, no. 4, pp. 1523–1537, 2015. On the other hand, within the category of algorithms that select internal instances, it is possible to mention ENND. L. Wilson, “Asymptotic properties of nearest neighbor rules using edited data,” Systems, Man and Cybernetics, IEEE Transactions on, no. 3, pp. 408–421, 1972. and LSSm. In general, algorithm such as ENN and LSSm are used for removing harmful (noisy) instances from the dataset. They do not reduce the data as the algorithms that select border instances, but they remove instances at the boundaries that have a negative impact on the data mining task. They can be used by other instance selection algorithms, as a filtering step. For example, the ENN algorithm is used by DROP3 as the first step, and the LSSm algorithm is used by LSBo. There is also another group of algorithms that adopt different selection criteria. For example, the algorithms LDIS,Carbonera, Joel Luis, and Mara Abel. A density-based approach for instance selection. IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), 2015. CDISCarbonera, Joel Luis, and Mara Abel. A novel density-based approach for instance selection. IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), 2016. and XLDIS select the densest instances in a given arbitrary neighborhood. The selected instances can include both, border and internal instances. The LDIS and CDIS algorithms are very simple and select subsets that are very representative of the original dataset. Besides that, since they search by the representative instances in each class separately, they are faster (in terms of time complexity and effective running time) than other algorithms, such as DROP3 and ICF. Besides that, there is a third category of algorithms that, instead of selecting actual instances of the dataset, select prototypes (that can be synthetic instances). In this category it is possible to include PSSA, PSDSP and PSSP. The three algorithms adopt the notion of spatial partition (a hyperrectangle) for identifying similar instances and extract prototypes for each set of similar instances. In general, these approaches can also be modified for selecting actual instances of the datasets. The algorithm ISDSP adopts a similar approach for selecting actual instances (instead of prototypes).References
{{reflist Machine learning Data mining