Robust estimation and its application to a classification problem

In the article a classification problem with two normally distributed classes is considered. The problem is solved using empirical discriminant functions for a Gaussian classifier and estimators for the unknown parameters of the multivariate normal distribution. The following estimators will be considered: the maximum likelihood estimator (MLE), the Kulawik-Zontek estimator (KZE) and the minimum covariance determinant estimator (MCDE). Classifiers based on MLE and KZE will be compared in the case of an empirical example (small sample). For a large sample, classifiers based on MLE, KZE and MCDE will be used.

2010 Mathematics Subject Classification: Primary: 62C12, Secondary: 62P30.


1. Introduction
The main aim of classification ([18], [5]) is to decide which class should be assigned to a new observation. For this purpose it is possible to use a classifier based on a training set of elements whose categories (class labels) are known. In the paper a classifier based on an estimator of the parameters of the multivariate normal model is considered. Hence, the classes will be assumed to be multivariate normally distributed and a Gaussian classifier will be used. In the article two multivariate normally distributed classes are considered. The parameters, the shift vector and the positive definite covariance matrix, are unknown. Estimators of the parameters appear in the empirical discriminant functions for Gaussian classifiers. In the article three estimators will be used: the maximum likelihood estimator ([15], [17]) (MLE), the Kulawik-Zontek estimator ([12]) (KZE) and the minimum covariance determinant estimator ([16]) (MCDE).
It is well known that in the case of model data, i.e. normally distributed data, MLE is the best choice for estimating the unknown shift parameter and covariance matrix of the multivariate normal distribution ([18]). The situation changes for data that are not normally distributed, which can happen for samples of small size. For such data it is better to use robust estimators ([14]). In the paper two such estimators are considered: KZE and MCDE. The problem of robust classification is known in the literature (for example [19]). The authors compare various robust estimators. However, KZE has not been considered anywhere. The main aim of the article is to find the advantages of using KZE compared with the other two estimators. In the case of a small sample it will turn out that for KZE the percentage of wrongly classified elements is lower than the one for MLE. For a large sample the percentage will be the smallest for MLE. On the one hand, the percentage for KZE will be only slightly greater than the one for MLE; on the other hand, the percentage for KZE will be clearly lower than the one for MCDE.
In the next chapter some basic information concerning Gaussian classifiers is given. In the third chapter three estimators of the parameters of the multivariate normal distribution are described. In the last chapter two empirical examples are presented. The first example concerns motors (a small sample), and two estimators are considered: the maximum likelihood estimator and the Kulawik-Zontek estimator. The second example concerns the chemical analysis of wine (a large sample), and three estimators are involved: the maximum likelihood estimator, the Kulawik-Zontek estimator and the minimum covariance determinant estimator. All calculations were carried out in the software environment "R". More precisely,
• the "R" package "expm" ([6]) was used in computing KZE;
• the "R" package "DetMCD" ([10]) was used in computing MCDE;
• the "R" package "conics" ([3]) was used to plot the separating surfaces;
• the "R" package "mvnormtest" ([9]) was used to test normality of multivariate data.
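For reference, a minimal R setup loading these packages might look as follows; this is a sketch assuming all packages are installed from CRAN (the package "MASS", used later in the wine example, is included as well):

    # Minimal R setup; assumes the listed packages are installed from CRAN.
    library(expm)        # sqrtm(): matrix square root, used for KZE
    library(DetMCD)      # deterministic MCD algorithm, used for MCDE
    library(conics)      # plotting conic separating surfaces
    library(mvnormtest)  # mshapiro.test(): multivariate Shapiro-Wilk test
    library(MASS)        # lda(): linear discriminant analysis (wine example)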
2. Gaussian classifier

We consider the classification problem with two classes.
Assume that each class is multivariate normally N_J(µ_i, Σ_i) distributed with the density function of the form

\[ f_i(z) = (2\pi)^{-J/2}\,|\Sigma_i|^{-1/2} \exp\Bigl(-\tfrac{1}{2}(z-\mu_i)^T \Sigma_i^{-1}(z-\mu_i)\Bigr), \quad z \in \mathbb{R}^J, \]

where µ_i, Σ_i are unknown parameters, i = 1, 2. Discriminant functions for the Gaussian classifier in the case of the considered problem can be defined as

\[ g_i(z) = \ln f_i(z) + \ln P(i) \quad \text{for } z \in \mathbb{R}^J, \]

where P(i) denotes the a priori probability of the i-th class, i = 1, 2. For the Gaussian classifier the functions can be written in the following form:

\[ g_i^{\mathrm{gauss}}(z) = -\tfrac{1}{2}(z-\mu_i)^T \Sigma_i^{-1}(z-\mu_i) - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(i). \tag{1} \]

After replacing the unknown parameters µ_i, Σ_i, P(i) with their estimators µ̂_i, Σ̂_i, P̂(i) we get the empirical discriminant functions for the Gaussian classifier

\[ \hat{g}_i^{\mathrm{gauss}}(z) = -\tfrac{1}{2}(z-\hat{\mu}_i)^T \hat{\Sigma}_i^{-1}(z-\hat{\mu}_i) - \tfrac{1}{2}\ln|\hat{\Sigma}_i| + \ln \hat{P}(i), \quad i = 1, 2, \]

and the formula for the separating surface

\[ \hat{g}_1^{\mathrm{gauss}}(z) = \hat{g}_2^{\mathrm{gauss}}(z) \quad \text{for } z \in \mathbb{R}^J, \tag{2} \]

which gives the following decision rule: the label i = 1 is assigned to an element ẑ which is to be classified if ĝ_1^gauss(ẑ) > ĝ_2^gauss(ẑ), and the label i = 2 is assigned to ẑ if ĝ_1^gauss(ẑ) < ĝ_2^gauss(ẑ), according to the maximum a posteriori probability (MAP) rule.

Assume that J = 2 and z = (x, y)^T ∈ R². Then

\[ \hat{\mu}_1 = (m_1, m_2)^T, \qquad \hat{\Sigma}_1 = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \]

for some m_1, m_2, σ_11, σ_12, σ_22 ∈ R. Let c_1 = -½ ln|Σ̂_1| + ln P̂(1). We have c_1 ∈ R and

\[ \hat{g}_1^{\mathrm{gauss}}(z) = -\frac{1}{2|\hat{\Sigma}_1|}\Bigl(\sigma_{22}(x-m_1)^2 - 2\sigma_{12}(x-m_1)(y-m_2) + \sigma_{11}(y-m_2)^2\Bigr) + c_1. \]

We can get an analogous form for ĝ_2^gauss(z). It is clear then that the formula (2) can be written as

\[ ax^2 + bxy + cy^2 + dx + ey + f = 0, \]

where a, b, c, d, e, f are real coefficients, i.e. the separating surface is a conic.
We can use various estimators for estimating the parameters µ and Σ of the multivariate normal distribution N_J(µ, Σ). Consequently, various classifiers will be obtained.
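To make the construction concrete, the following minimal R sketch implements the empirical discriminant function (1) and the MAP rule; the estimates mu, Sigma and the prior P are assumed to be produced by one of the estimators discussed below:

    # Empirical Gaussian discriminant function (1) for one class.
    # z: point to classify; mu, Sigma, P: estimated mean, covariance, prior.
    g.gauss <- function(z, mu, Sigma, P) {
      d <- z - mu
      -0.5 * drop(t(d) %*% solve(Sigma) %*% d) - 0.5 * log(det(Sigma)) + log(P)
    }

    # MAP rule: assign label 1 or 2 depending on the larger discriminant value.
    classify <- function(z, est1, est2) {
      if (g.gauss(z, est1$mu, est1$Sigma, est1$P) >
          g.gauss(z, est2$mu, est2$Sigma, est2$P)) 1 else 2
    }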

3. Estimators for the parameters of the multivariate normal model
The maximum likelihood estimator (MLE) is the best choice for model data. For contaminated data it is better to use robust estimators. We will focus on two robust estimators: the Kulawik-Zontek estimator (KZE) and the minimum covariance determinant estimator (MCDE). The computer simulation results described in [12] concern the estimators MLE and KZE. In particular, it has been shown there that KZE gives better estimates than MLE for contaminated data. In our article the results will be analogous in the case of the corresponding classifiers for empirical data. MCDE is one of the most popular robust estimators used by researchers. That is why we give only a short explanation concerning MCDE (see [16]).

3.1. The maximum likelihood estimator (MLE)
Let z_1 ∈ R^J, ..., z_n ∈ R^J be a value of a random sample of size n drawn from the distribution N_J(µ, Σ), where µ, Σ are unknown parameters (µ ∈ R^J is the expected value and Σ is a positive definite covariance matrix). For z_1, ..., z_n the maximum likelihood estimators (MLE) of µ and Σ are given by the formulas

\[ \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} z_i, \qquad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (z_i - \hat{\mu})(z_i - \hat{\mu})^T, \]

respectively. When z_1, ..., z_n are not drawn from a normal distribution, maximum likelihood estimation can give wrong results. For such observations it is better to use robust estimators ([14]). We will focus on two robust estimators: the Kulawik-Zontek estimator and the minimum covariance determinant estimator.
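In R the two formulas can be computed directly; a minimal sketch follows (note the divisor n rather than n − 1, which distinguishes the MLE of Σ from the unbiased sample covariance returned by cov()):

    # MLE of mu and Sigma for a sample stored in the rows of the matrix Z.
    mle.normal <- function(Z) {
      n <- nrow(Z)
      mu.hat <- colMeans(Z)
      Sigma.hat <- cov(Z) * (n - 1) / n   # rescale cov() to the 1/n divisor
      list(mu = mu.hat, Sigma = Sigma.hat)
    }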

3.2. The Kulawik-Zontek estimator (KZE)
The Kulawik-Zontek estimator has been described in [12]. To estimate the parameters µ and Σ of the multivariate normal model, the covariance matrix is written in the form

\[ \Sigma = \sum_{j=1}^{k} \alpha_j V_j, \]

where V_1, ..., V_k are the elements of a given basis of the vector space of real square symmetric matrices. The aim is to estimate the parameter θ = (µ^T, α_1, ..., α_k)^T ∈ Θ ⊂ R^{J+k}. For the sample z_1, ..., z_n ∈ R^J the estimator θ̂ is defined in [12] in terms of a properly chosen constant c and a function ϕ : [0, +∞) → R with the following properties:

(B1) The function ϕ has a positive derivative on (0, +∞).
(B2) The function xϕ(x²) has a nonnegative derivative on [0, +∞) and there exists a finite limit of xϕ(x²) as x → +∞.

The constant c can be derived from an equation given in [12]. For a sample this procedure gives estimates µ̂, α̂_1, ..., α̂_k. The covariance matrix Σ is estimated by Σ̂ = Σ_j α̂_j V_j when this matrix is positive definite. If not, it is possible to take Σ̂ equal to the square root ("R" package "expm", function sqrtm) of a corrected matrix described in [12], constructed with a function φ.

Example 3.1. Consider the family of functions φ, depending on a tuning constant t > 0, where each φ is defined through its derivative, given piecewise with breakpoints at |x| = t and |x| = 2t.
The above functions are modifications of the Huber functions ([7]). The family from Example 3.1 consists of functions that depend on the tuning constant t. The tuning constant is usually given. However, for some models it is possible to obtain a data-dependent tuning constant (for example [13]). In our article such modifications of the functions will be used with a properly chosen given constant. This type of modification has already been used by T. Bednarski and S. Zontek in [2].
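The estimating equations of KZE themselves are given in [12]; the sketch below shows only the surrounding linear-algebra steps, namely the reconstruction Σ̂ = Σ_j α̂_j V_j from a basis of symmetric matrices and the positive-definiteness check. The inputs V and alpha.hat are hypothetical placeholders standing in for the output of the procedure of [12]:

    # Reconstruct Sigma.hat from estimated coefficients alpha.hat and a given
    # basis V (a list of symmetric J x J matrices), then check positive
    # definiteness; expm::sqrtm() is the fallback mentioned in the text.
    library(expm)

    sigma.from.alpha <- function(alpha.hat, V) {
      Reduce(`+`, Map(`*`, alpha.hat, V))
    }

    is.pos.def <- function(A, tol = 1e-10) {
      all(eigen(A, symmetric = TRUE, only.values = TRUE)$values > tol)
    }

    # Example: the standard basis of symmetric 2 x 2 matrices.
    V <- list(matrix(c(1, 0, 0, 0), 2),
              matrix(c(0, 1, 1, 0), 2),
              matrix(c(0, 0, 0, 1), 2))
    Sigma.hat <- sigma.from.alpha(c(2, 0.3, 1), V)
    is.pos.def(Sigma.hat)  # TRUE for this choice of coefficients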
3.3. The minimum covariance determinant estimator (MCDE)

The minimum covariance determinant estimator ([16]) is a robust estimator of the expectation and covariance matrix of the multivariate normal distribution. The estimator is based on the subset of all given observations for which the covariance matrix has the smallest determinant. The mean of the elements from the chosen subset is the minimum covariance determinant estimator (MCDE) of the population mean, and their covariance matrix is the MCDE of the population covariance matrix. For empirical problems with multivariate data it is better to use an approximation of MCDE's values rather than the exact ones because of the computation time. To get an approximation, the Deterministic Minimum Covariance Determinant algorithm ([8]) can be used.
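A minimal sketch of computing MCDE in R follows. The paper uses the "DetMCD" package ([10]); the sketch below instead uses robustbase::covMcd() with its deterministic starting subsets as a commonly available alternative, which is an assumption about the reader's setup rather than the authors' exact code:

    # MCDE of the mean and covariance for a sample in the rows of Z,
    # approximated with the deterministic MCD algorithm of [8].
    library(robustbase)

    mcde <- function(Z) {
      fit <- covMcd(Z, nsamp = "deterministic")
      list(mu = fit$center, Sigma = fit$cov)
    }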

4. Empirical examples

4.1. Motor problem
In article [1] the authors presented an application of a pattern recognition method in a diagnostic experiment carried out in tests of single-phase induction motors. More precisely, motors of type SZXb6514 B made by Zakład Silników Elektrycznych Małej Mocy "Silma" in Sosnowiec (Low Power Electric Motors Company "Silma") were considered (11 usable motors and 23 that are not usable). The motors were represented by 9-dimensional vectors; the authors used the Karhunen-Loève (K-L) method ([5]) and obtained 2-dimensional vectors (see the sketch after the list below). Not usable motors were grouped with respect to the following types of defects:
• B – rubbing,
• C – loudness (loud operation),
• E – high current,
• F – increased vibration level,
• G – no rivet in the sheet package.
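The K-L reduction can be sketched in R via principal component analysis, which computes the same expansion for centered data; the matrix motors below is a hypothetical placeholder for the 34 x 9 data matrix of [1]:

    # K-L (principal component) reduction of 9-dimensional vectors to 2 dimensions.
    kl.reduce <- function(X, dim = 2) {
      pca <- prcomp(X, center = TRUE)   # eigenvectors of the sample covariance
      pca$x[, seq_len(dim)]             # scores on the leading components
    }
    # motors2d <- kl.reduce(motors)     # motors: hypothetical 34 x 9 data matrix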
The following classification problems with two classes were considered:
• usable motors (A) – not usable motors (BCEFG),
• motors with a given defect type – motors with the remaining defect types (for example C – BEFG, E – BCFG).

Gaussian classifiers using the empirical discriminant functions for the estimators MLE and KZE were compared. Fractions of wrongly classified elements (leave-one-out method) are given in Table 1. For the case E – BCFG the fraction for KZE is smaller than the one for MLE. In the other cases the fractions are the same. An analysis concerning the normality of the relevant datasets was done. The Shapiro-Wilk normality test for multivariate data allowed us to reject the hypothesis of normality for the sets B, C, G, BCEFG, BEFG, BCFG (marked red in Table 1). The percentages of errors (wrongly classified elements) are given in Table 2. The percentage for KZE (4.35%) is lower than the one for MLE (5.22%). The results show that for the small-sample problem it was better to use KZE than MLE.

4.2. Wine problem

The database presents the effect of three types (type 1, type 2, type 3) of cultivars on the chemical analysis of wines from the same region in Italy. The following chemical features are investigated: alcohol, malic acid, ash, alcalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, proline. The analysis is based on 144 13-dimensional vectors taken from the dataset WINE (48 vectors for each type of cultivar). Using linear discriminant analysis ([11]) 2-dimensional vectors have been obtained ("R" package "MASS", function lda; see the sketch after the list below). We are interested in the classification with two classes, hence the data was divided into three problems:
• type 1 – types 2,3,
• type 2 – types 1,3,
• type 3 – types 1,2.
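The reduction to secondary features can be sketched as follows, where wine is a hypothetical data frame holding the 13 chemical features and a factor column type with levels 1, 2, 3:

    # Linear discriminant analysis: project the 13 primary features onto the
    # two discriminant directions (secondary features).
    library(MASS)

    fit <- lda(type ~ ., data = wine)    # wine: hypothetical data frame
    secondary <- predict(fit)$x[, 1:2]   # 144 x 2 matrix of LD1, LD2 scores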
In these problems Gaussian classifiers using the empirical discriminant functions for the estimators
• MLE,
• KZE (with t = 1.445 and c = 0.865),
• MCDE
will be compared. Figures 4 and 5 present the image of the 13-dimensional vectors (primary features) transformed to 2-dimensional vectors (secondary features) using linear discriminant analysis. The figures also present the separating surfaces for the considered cases: orange is used for MLE, black for KZE and purple for MCDE.
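Since, by Section 2, each separating surface satisfies ax² + bxy + cy² + dx + ey + f = 0, such surfaces can be drawn with the "conics" package; a minimal sketch with illustrative coefficients (the vector below is hypothetical, not one of the fitted surfaces):

    # Plot a separating conic a x^2 + b xy + c y^2 + d x + e y + f = 0.
    library(conics)

    v <- c(1, 0, 2, -1, -1, -0.5)   # hypothetical coefficients (a, b, c, d, e, f)
    conicPlot(v, col = "orange")    # e.g. orange for an MLE-based surface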
Fractions of the wrongly classified elements (10-fold cross-validation) are given in Table 3. The percentages of errors (the wrongly classified elements) are given in Table 4. The detailed results for the cases 1 – (2,3), 2 – (1,3) and 3 – (1,2) are given in Table 3. In the case 3 – (1,2), MCDE gives the highest number of wrongly classified elements, while for KZE the number is zero. The percentages of the wrongly classified elements show that the best choice for the problem is MLE, with KZE close behind. MCDE gives the worst results. An analysis concerning the normality of the relevant datasets was done. The Shapiro-Wilk normality test for multivariate data did not allow us to reject the hypothesis of normality in any case.
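A minimal sketch of the 10-fold cross-validation error estimate and of the multivariate normality test follows; classify and mle.normal are the sketches given earlier, and X, labels are hypothetical placeholders for the secondary features and the class labels (1 or 2) of one of the two-class problems:

    # 10-fold cross-validation estimate of the fraction of wrongly
    # classified elements for a given estimator (here: mle.normal).
    cv.error <- function(X, labels, estimator = mle.normal, k = 10) {
      folds <- sample(rep(1:k, length.out = nrow(X)))  # random fold assignment
      errors <- 0
      for (f in 1:k) {
        train <- folds != f
        e1 <- estimator(X[train & labels == 1, , drop = FALSE])
        e2 <- estimator(X[train & labels == 2, , drop = FALSE])
        e1$P <- mean(labels[train] == 1)   # empirical a priori probabilities
        e2$P <- mean(labels[train] == 2)
        test <- which(!train)
        pred <- sapply(test, function(i) classify(X[i, ], e1, e2))
        errors <- errors + sum(pred != labels[test])
      }
      errors / nrow(X)
    }

    # Multivariate Shapiro-Wilk normality test; mshapiro.test() expects the
    # variables in rows, hence the transpose.
    library(mvnormtest)
    # mshapiro.test(t(X[labels == 1, ]))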

5. Conclusions
KZE is a modification of MLE. Using an empirical example (the motor problem) we compared the two corresponding classifiers. The percentage of wrongly classified elements is lower in the case of the classifier using the KZE estimator. KZE is a robust estimator, so another robust estimator, one of the most popular (MCDE), was chosen for further analysis. We compared the three classifiers for large samples (normal samples). It turns out that the classifier using KZE is not much worse than the classifier using MLE. Meanwhile, the classifier based on MCDE seems to give much worse results.
To sum up, we can see that in the case of real data (the motor problem) for which the hypothesis of normality is rejected, classifiers based on robust estimators are a better choice than the classifier based on MLE. However, for a sample for which the hypothesis of normality is not rejected (the wine problem), the classifier based on the robust KZE seems to be better than the classifier based on the robust MCDE.
Conducting research on real data has shown that KZE can be used in industry. In the case of motors, KZE makes it possible to characterize and distinguish types of damage more precisely than MLE. This can be seen, for example, in Figure 2.