
Use this identifier to link to or cite this item: http://hdl.handle.net/20.500.12128/21818
Title: Stop Criterion in Building Decision Trees with Bagging Method for Dispersed Data
Authors: Przybyła-Kasperek, Małgorzata
Aning, Samuel
Keywords: Ensemble of classifiers; Dispersed data; Stop criterion; Bagging method; Classification trees; Independent data sources
Issue date: 2021
Source: "Procedia Computer Science" Vol. 192 (2021), pp. 3560-3569
Abstract: This article discusses decision making based on decision trees and the bagging method applied to dispersed knowledge. In dispersed knowledge, data are held independently, in fragments, by local decision tables. In this study, sub-tables are generated from each local table with the bagging method, and decision trees are built on them. These trees classify the test object, and a probability vector over the decision classes is defined for each local table. For each vector, the decision classes with the maximum coordinate value are selected, and the final joint decision over all local tables is made by majority voting. The quality of decision making has been observed to increase when the bagging method, as an ensemble method, is combined with decision trees on independent dispersed data. An important criterion in building a decision tree is knowing when to stop growing the tree (stop splitting), that is, at what minimum number of objects in a working node tree construction should stop to ensure the best decision results. The contribution of the paper is to observe the influence of a stop criterion (expressed as the number of objects in a node) for decision trees used in conjunction with the bagging method on independent data sources. It can be concluded that for dispersed data sets the stop-split criterion does not influence the classification quality much. A statistically significant difference in the mean classification error was confirmed only between a very high stop criterion (0.1 × the number of objects in the training set) and a very low one (equal to two). There is no statistically significant difference in the classification quality obtained for the stop criterion values 4, 6, 8 and 10.
An interesting observation is that for some dispersed data sets, with a smaller number of local tables and a larger number of bootstrap samples, better classification quality is obtained for a small number of objects in the stop criterion (mostly two). Only a substantial increase in the minimum number of objects at which tree growth is stopped affects the classification quality. However, the reduction in tree complexity gained by using larger stop-criterion values is significant.
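The fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `fuse_local_decisions` and the sample probability vectors are hypothetical, and the scheme shown is the one the abstract states, that is, each local table's probability vector casts a vote for every class attaining its maximum coordinate, and the classes with the most votes win.

```python
from collections import Counter

def fuse_local_decisions(prob_vectors, classes):
    """Fuse per-local-table probability vectors by majority voting.

    prob_vectors: one probability vector per local table, each indexed
    consistently with `classes`. Every class attaining the maximum
    coordinate of a vector receives one vote from that table.
    Returns the sorted list of classes with the most votes.
    """
    votes = Counter()
    for vec in prob_vectors:
        best = max(vec)
        for cls, p in zip(classes, vec):
            if p == best:
                votes[cls] += 1
    top = max(votes.values())
    return sorted(c for c, v in votes.items() if v == top)

# Hypothetical example: three local tables, three decision classes.
classes = ["A", "B", "C"]
vectors = [
    [0.6, 0.3, 0.1],  # table 1's trees favour class A
    [0.2, 0.5, 0.3],  # table 2's trees favour class B
    [0.7, 0.2, 0.1],  # table 3's trees favour class A
]
print(fuse_local_decisions(vectors, classes))  # ['A']
```

In a library implementation, the stop criterion studied in the paper would correspond to a minimum-objects-per-node parameter of the tree learner (for example, `min_samples_split` in scikit-learn's `DecisionTreeClassifier`), applied to trees trained on bootstrap samples of each local table.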
URI: http://hdl.handle.net/20.500.12128/21818
DOI: 10.1016/j.procs.2021.09.129
ISSN: 1877-0509
Appears in collections: Articles (WNŚiT)

Files in this item:
Przybyla-Kasperek_Stop_Criterion_in_Building_Decision_Trees_with_Bagging_Method.pdf (834.18 kB, Adobe PDF)


License: Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Poland (CC BY-NC-ND 3.0 PL)