
Authors' Information

Gennadiy Lbov - SB RAS, head of laboratory; P.O. Box 630090, Novosibirsk, 4 Acad. Koptyug avenue, Russia; e-mail: lbov@math.nsc.ru
Nikolai Dolozov - NSTU, senior lecturer, Cand. Tech. Sci.; P.O. Box 630092, Novosibirsk, 20 Marks avenue, Russia; e-mail: dnl@interface.nsk.su
Pavel Maslov - NSTU, post-graduate student of FAMI; P.O. Box 630092, Novosibirsk, 20 Marks avenue, Russia; e-mail: altermann@ngs.ru

RECOGNITION OF THE HETEROGENEOUS MULTIVARIATE VARIABLE

Tatyana Stupina

Abstract: An application of the multivariate prediction method to solving the pattern recognition problem with respect to the sample size is considered in this paper. The criterion of multivariate heterogeneous variable recognition is used in this approach. The relation of this criterion to the probability of error is shown. For fixed complexities of the probability distribution and of the logical decision function class, examples of the pattern recognition problem are presented.

Keywords: prediction of a multivariate heterogeneous variable, pattern recognition, complexity of distribution.

This work was financially supported by RFBR-04-01-.

Introduction

The problem of reducing the relation with respect to the sample is one of the important problems in data mining. The quality of a sample decision function depends on the size of the sample, the complexity of the distribution, and the complexity of the class of functions used by the algorithm for constructing sample decision functions. When the distribution is known, the quality of a decision function is, for example, the risk function for a single predicted variable; for the pattern recognition problem it is the well-known probability of error. When the distribution is unknown, the problem arises of estimating this quality with respect to the complexity of the distribution, the complexity of the function class, and the sample size. At present there are approaches to solving this problem [Vapnik V.N., Chervonenkis A.Ya, 1970]. In addition, the complexity of the function class can be assigned in different ways [Lbov G.S., Starceva N.G, 1999]. However, those approaches consider the case of a single predicted variable of a single type.

However, there are many important applications where we want to predict or recognize several heterogeneous variables. This problem statement was presented in [Lbov G.S., Stupina T.A., 2002]. It is necessary to construct the sample decision function from a small sample in a multivariate heterogeneous space, so the most suitable class is the class of logical decision functions [Lbov G.S., Starceva N.G, 1999]. In this paper, for a fixed probability distribution, the relation of the criterion to the probability of error is shown for the pattern recognition problem.

Problem Statement

In the probabilistic statement of the problem, the value $(x,y)$ is a realization of a multidimensional random variable $(X,Y)$ on a probability space $\langle \Omega, \mathcal{B}, P_c \rangle$, where $\Omega = D_X \times D_Y$ is a (Lebesgue-)measurable set, $\mathcal{B}$ is the Borel $\sigma$-algebra of subsets of $\Omega$, and $P_c$ is the probability measure on $\mathcal{B}$ (we call $c$ the strategy of nature). $D_X$ is the heterogeneous domain of the observed variable, $\dim D_X = n$; $D_Y$ is the heterogeneous domain of the objective variable, $\dim D_Y = m$. The given variables can be of arbitrary types (quantitative, ordinal, nominal). For the pattern recognition problem the variable $Y$ is nominal. Let $\Phi$ be a given class of decision functions: the class of measurable functions that assign some subset $E_y \subseteq D_Y$ of the objective variable to each value $x \in D_X$ of the observed variable, i.e. $\Phi = \{ f : D_X \to 2^{D_Y} \}$. For example, the set $E_y$ can contain several patterns. The quality $F(c,f)$ of a decision function $f$ under a fixed strategy of nature $c$ is defined as

$$F(c,f) = \int_{D_X} \big( P(y \in E_y(x) \mid x) - \mu(E_y(x)) \big)\, dP(x),$$

where $E_y(x) = f(x)$ is the value of the decision function at $x$, $P(y \in E_y(x) \mid x)$ is the conditional probability of the event $\{ y \in E_y \}$ for fixed $x$, and $\mu(E_y(x))$ is a normalized measure of the subset $E_y$. Note that if $\mu(E_y(x))$ is a probability measure, then the criterion $F(c,f)$ is a distance between distributions.

If the specified probability coincides with the uniform distribution, then the prediction gives no information about the predicted variable (the entropy is maximal). The measure

$$\mu(E_y(x)) = \frac{\mu(E_y)}{\mu(D_Y)} = \prod_{j=1}^{m} \frac{\mu(E_y^j)}{\mu(D_y^j)}$$

is the normalized measure of the subset $E_y$, introduced taking into account the types of the variables. The measure $\mu(E_y^j)$ is the length of an interval if the variable has an ordered set of values, and the number of values (quantum measure) if the variable is nominal (a variable with a finite non-ordered set of values; this is the pattern recognition case). Clearly, the prediction quality is higher for those $E_y$ whose measure is smaller (the accuracy is higher) and whose conditional probability $P(y \in E_y(x) \mid x)$ (the certainty) is larger. For a fixed strategy of nature $c$, we define an optimal decision function $f^*$ as one for which $F(c,f^*) = \sup_{f \in \Phi} F(c,f)$, where $\Phi$ is the class of decision functions introduced above. If the strategy of nature is unknown, the sampling criterion $\hat F(f)$ is used. When this problem is solved in practice, the sample size is very small and the variables are of different types; in this case the class of logical decision functions of complexity $M$ is used [Lbov G.S., Starceva N.G, 1999].
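For intuition, the criterion can be computed directly when the strategy of nature is finite. The sketch below is ours, not from the paper; the joint table `p_joint`, the decision function `f`, and the quantum measure `mu` are illustrative (chosen to mimic the two-pattern example given later in the paper):

```python
# Sketch: the criterion F(c, f) for a finite strategy of nature.
# Assumptions (not from the paper): X and Y take finitely many values,
# the strategy is given as a joint table p_joint[x][y], and f maps x to a set E_y.

def criterion_F(p_joint, f, mu):
    """F(c, f) = sum_x P(x) * (P(y in f(x) | x) - mu(f(x)))."""
    F = 0.0
    for x, p_y_given in p_joint.items():
        p_x = sum(p_y_given.values())
        if p_x == 0:
            continue
        Ey = f(x)
        p_cond = sum(p for y, p in p_y_given.items() if y in Ey) / p_x
        F += p_x * (p_cond - mu(Ey))
    return F

# Two patterns, D_Y = {0, 1}; normalized quantum measure mu(E) = |E| / 2.
mu = lambda E: len(E) / 2
p_joint = {"a": {1: 0.38 * 0.95, 0: 0.38 * 0.05},   # x in first region
           "b": {1: 0.31, 0: 0.31}}                 # x in second region
f = lambda x: {1} if x == "a" else {0, 1}
print(round(criterion_F(p_joint, f, mu), 3))        # 0.171
```

With these illustrative numbers the criterion evaluates to 0.171, matching the value computed analytically in the example later in the paper.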

XII-th International Conference "Knowledge - Dialogue - Solution"

Properties of the Criterion

For a fixed strategy of nature $c$, the relation of the criterion $F(c,f)$ to the probability of error $P_f$ is shown.

Statement 1. For any strategy of nature $c$ the quality criterion $F(c,f)$ is represented through a risk function:

$$F(c,f) = 1 - R(c,f) = \int_{D_X \times D_Y} \big(1 - L(y, f(x))\big)\, p(x,y)\, dx\, dy,$$

where the loss function is

$$L(y, f(x)) = \begin{cases} \mu(E_y), & y \in E_y, \\ 1 + \mu(E_y), & y \notin E_y. \end{cases}$$

Remark that the risk function $R(c,f)$ equals the probability of error $P_f$ if the loss function $L(y,f)$ is the indicator function.
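Statement 1 can be checked numerically on any finite case. The sketch below is illustrative (the distribution `p` and decision function `f` are invented for the check, not taken from the paper):

```python
# Numeric check of Statement 1: F(c, f) = 1 - R(c, f) with the loss
# L(y, f(x)) = mu(E_y) if y in E_y, else 1 + mu(E_y).
k = 3                                   # |D_Y| = 3 patterns
mu = lambda E: len(E) / k               # quantum measure, normalized
p = {("a", 0): 0.2, ("a", 1): 0.1,      # an illustrative joint distribution
     ("b", 1): 0.3, ("b", 2): 0.4}
f = {"a": {0}, "b": {1, 2}}             # an illustrative set-valued decision

L = lambda y, Ey: mu(Ey) if y in Ey else 1 + mu(Ey)
R = sum(v * L(y, f[x]) for (x, y), v in p.items())
F = sum(v * ((y in f[x]) - mu(f[x])) for (x, y), v in p.items())
assert abs(F - (1 - R)) < 1e-12         # the identity of Statement 1
```

The identity holds term by term: $1 - L(y, f(x))$ equals the indicator of $\{y \in E_y\}$ minus $\mu(E_y)$, which is exactly the integrand of $F(c,f)$.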

Statement 2. For recognition of $k$ patterns by a decision function $f$ the quality criterion is

$$F(c,f) = \frac{k-1}{k} - P_f.$$

Consequence 1. For recognition of two patterns we have $F(c,f) = \frac{1}{2} - P_f$.
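Statement 2 is easy to verify by direct summation for any finite strategy, since for a singleton decision $f(x) = \{y^*\}$ the quantum measure is $\mu(\{y^*\}) = 1/k$, so $F = (1 - P_f) - 1/k$. The data below are synthetic, generated only for the check:

```python
import random
# Check of Statement 2 on a random finite strategy (illustrative, not from
# the paper): for singleton decisions, F(c, f) = (k - 1)/k - P_f.
random.seed(0)
k, X = 4, ["x%d" % i for i in range(6)]

# Random joint distribution P(x, y), normalized to sum to 1.
w = {(x, y): random.random() for x in X for y in range(k)}
Z = sum(w.values())
p = {xy: v / Z for xy, v in w.items()}

f = {x: random.randrange(k) for x in X}          # one pattern per x
P_err = sum(v for (x, y), v in p.items() if y != f[x])
F = sum(v * ((1 - 1 / k) if y == f[x] else (-1 / k))
        for (x, y), v in p.items())
assert abs(F - ((k - 1) / k - P_err)) < 1e-12    # Statement 2
```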

Consequence 2. For the pattern recognition problem the optimal decision function coincides with the Bayes decision function: the $f$ attaining $\sup_{f \in \Phi} F(c,f)$ is the $f$ attaining $\inf_{f \in \Phi} P_f$.

Definition 1. Define a nature strategy $c_M$, generated by a logical decision function $f_M \in \Phi_M$ given by a system of subsets $E_X^t$, $E_Y^t = r(E_X^t)$, $t = 1, \dots, M$, as

$$c_M = \{\, p_t(x,y) = p_x^t\, p_{y/x}^t = P(x \in E_X^t)\, P(y \in E_Y^t \mid x \in E_X^t),\ t = 1, \dots, M \,\},$$

where

1) $\sum_{t=1}^{M} p_x^t = 1$;
2) $P(E_Y^t \mid E_X^t) = p_{y/x}^t$;
3) $P(\overline{E_Y^t} \mid E_X^t) = 1 - p_{y/x}^t$;
4) for every $A_X \subseteq E_X^t$: $P(A_X) = p_x^t\, \frac{\mu(A_X)}{\mu(E_X^t)}$, and for every $A_Y \subseteq E_Y^t$: $P(A_Y \mid E_X^t) = p_{y/x}^t\, \frac{\mu(A_Y)}{\mu(E_Y^t)}$.

In [Lbov G.S., Stupina T.A., 2002] it is proved that for this nature strategy

$$F(c_M, f_M) = \sum_{t=1}^{M} p_x^t \big( p_{y/x}^t - \mu(E_Y^t) \big).$$

For recognition of $k$ patterns, let the domain $D_Y$ be the set $\{1, \dots, k\}$.

Statement 3. Let the nature strategy $c_k$ for recognition of $k$ patterns be generated by a logical decision function $\tilde f$ with $\tilde f(x) = E_y^i$ for $x \in E_X^i$. Then the probability of error $P_f$ of the decision function $f$ with $f(x) = i$, $i \in E_y^i$, for $x \in E_X^i$ is

$$P_f = 1 - \sum_{i=1}^{k} \frac{p_x^i\, p_{y/x}^i}{k\, \mu(E_y^i)}.$$

Consequence 3. From Statement 3 it follows that

$$P_f + F(c_k, \tilde f) = 1 + \sum_{i=1}^{k} p_x^i \left( p_{y/x}^i \left( 1 - \frac{1}{k\, \mu(E_y^i)} \right) - \mu(E_y^i) \right).$$

Let us illustrate these statements. Let $n = 1$, $m = 1$, $X$ be a continuous variable, $Y$ a nominal variable, and $M = 2$. The nature strategy $c_2$ is generated by $\tilde f$ with $E_X^1 = (0.364, 1.0]$, $E_X^2 = [0.0, 0.364]$, $E_y^1 = \{1\}$, $E_y^2 = \{1, 0\}$, where the patterns are $\omega_1 = $ '1', $\omega_2 = $ '0', and $p_x^1 = \frac{19}{50}$, $p_x^2 = \frac{31}{50}$, $p_{y/x}^1 = 0.95$, $p_{y/x}^2 = 1$. So we have $F(c_2, \tilde f) = 0.171$. Obviously, $c_2$ is such that the conditional distributions $p(x \mid \{1\})$ and $p(x \mid \{0\})$ intersect, whichever singleton decision function we take: $f$ with $f(x) = \{1\}$ if $x \in E_X^1$ and $f(x) = \{0\}$ if $x \in E_X^2$, or $f'$ with $f'(x) = \{1\}$ on both sets. Let us calculate $P_f$ by definition and compare it with the criterion $F(c_2, f)$:

$$P_f = P(E_X^2)\, P(\{1\} \mid E_X^2) + P(E_X^1)\, P(\{0\} \mid E_X^1) = \frac{31}{50} \cdot \frac{1}{2} + \frac{19}{50}\, (1 - 0.95) = 0.329.$$

The probability of error of the second decision function $f'$ is obtained similarly. For this case we have $F(c_2, f) = \frac{1}{2} - P_f = 0.5 - 0.329 = 0.171$ (Statements 2 and 3).
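The numbers in this example can be reproduced with a short script. The stratum probabilities and pattern sets are taken from the example above; the helper names are ours:

```python
# Reproducing the worked example: k = 2 patterns, strategy c_2 generated by
# two strata (p_x^t, p_{y/x}^t, E_y^t).
k = 2
strata = [(19 / 50, 0.95, frozenset({1})),      # E_X^1, E_y^1 = {1}
          (31 / 50, 1.00, frozenset({1, 0}))]   # E_X^2, E_y^2 = {1, 0}
mu = lambda E: len(E) / k                       # quantum measure on D_Y

# F(c_2, f~) = sum_t p_x^t * (p_{y/x}^t - mu(E_y^t))
F = sum(px * (pyx - mu(Ey)) for px, pyx, Ey in strata)

# Statement 3: P_f = 1 - sum_i p_x^i * p_{y/x}^i / (k * mu(E_y^i))
P_f = 1 - sum(px * pyx / (k * mu(Ey)) for px, pyx, Ey in strata)

print(round(F, 3), round(P_f, 3))               # 0.171 0.329
assert abs(F - ((k - 1) / k - P_f)) < 1e-9      # Statement 2 for k = 2
```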

Conclusion

An approach to solving the problem of heterogeneous multivariate variable recognition with respect to the sample size was considered in this paper. The solution of this problem was obtained by means of the presented criterion.

The universality of the logical decision function class with respect to the presented criterion makes it possible to introduce a measure of distribution complexity and to solve this problem for small sample sizes. For the nature strategy and the class of logical decision functions, the properties of the criterion are presented by means of statements and consequences for the pattern recognition problem.

Bibliography

[Lbov G.S., Starceva N.G, 1999] Lbov G.S., Starceva N.G. Logical Decision Functions and Questions of Statistical Stability. Institute of Mathematics, Novosibirsk, 1999.

[Lbov G.S., Stupina T.A., 2002] Lbov G.S., Stupina T.A. Performance criterion of prediction of a multivariate decision function. Proc. of the international conference "Artificial Intelligence", Alushta, 2002, pp. 172-179.

[Vapnik V.N., Chervonenkis A.Ya, 1970] Vapnik V.N., Chervonenkis A.Ya. Theory of Pattern Recognition. Moscow: Nauka, 1970.

Author's Information

Tatyana A. Stupina - Institute of Mathematics SB RAS, Koptuga 4 St, Novosibirsk, 630090, Russia; e-mail: stupina@math.nsc.ru

ON THE QUALITY OF DECISION FUNCTIONS IN PATTERN RECOGNITION

Vladimir Berikov

Abstract: The problem of the quality of decision functions in pattern recognition is considered. An overview of the approaches to the solution of this problem is given. Within the Bayesian framework, we suggest an approach based on interval estimates on a finite set of events.

Keywords: Bayesian learning theory, generalization ability.

Introduction

The problem of the quality of decision functions consists in the need to find a decision function not too different from the optimal decision function in the given family, provided that the probability distribution is unknown and the learning sample has limited size. By an optimal decision function we understand a function for which the risk (the expected loss of wrong forecasting for a new object) is minimal. In particular, the following questions should be answered in the analysis of the problem.

a) Under what conditions does the problem have a solution?
b) How can the quality of a decision function be evaluated most exactly on the learning sample?
c) What principles should be followed in the choice of a decision function (in other words, what properties must the class of decision functions and the learning algorithm possess) for the given sample size, dimensionality of the variable space, and other information on the task?

The principal possibility of solving the stated problem is motivated by the following considerations. Firstly, for the majority of real tasks of forecasting from statistical information it is possible to expect a priori the existence of a certain more or less stable mechanism underlying the phenomena under study. Secondly, it is possible to expect that the available empirical information reflects, to one degree or another, the functioning of this unknown mechanism. Thirdly, a successful solution requires that the class of decision functions and the learning algorithm possess certain characteristics, which are discussed below.

Basic Approaches

A number of different approaches to the solution of the problem can be formulated. In experimental approaches [1] (leave-one-out, bootstrap, cross-validation) the data set is repeatedly divided into a learning sample (for finding the decision function) and a test sample (for evaluating the quality of the decision function). The performance of the given method of decision function construction is then evaluated as the average quality over the test samples. The shortcoming of this approach is its computational expensiveness.
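As a sketch of this experimental approach, the following code repeatedly splits a toy data set into learning and test samples and averages the test error. All names, the data, and the simple threshold-rule learner are illustrative, not from the paper:

```python
import random
# Repeated random splitting into a learning and a test sample
# (the "experimental approach" to evaluating decision function quality).

def holdout_quality(data, fit, err, n_splits=100, test_frac=0.3, seed=0):
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        d = data[:]
        rng.shuffle(d)
        cut = int(len(d) * (1 - test_frac))
        train, test = d[:cut], d[cut:]
        f = fit(train)                                   # learn on one part
        scores.append(sum(err(f, xy) for xy in test) / len(test))
    return sum(scores) / len(scores)                     # average test error

# Toy 1-D two-class task: learn a threshold minimizing empirical error.
data = [(x / 20, int(x / 20 > 0.45)) for x in range(20)]
def fit(train):
    ts = [x for x, _ in train]
    best = min(ts, key=lambda t: sum((x > t) != y for x, y in train))
    return lambda x: int(x > best)
err = lambda f, xy: int(f(xy[0]) != xy[1])
print(holdout_quality(data, fit, err))
```

The averaged test error estimates the risk of the learning method itself, at the cost of refitting the decision function on every split, which is exactly the computational expense noted above.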

The probabilistic approach is based on a preliminary estimation of the distribution law. A learning problem can be solved if this law can be reconstructed from the empirical data. This approach can be used only when sufficiently large a priori information on the distribution is available. For instance, it can be shown that the problem of reconstructing a distribution density from empirical data is in general an ill-posed problem [2].

Two directions within the framework of this approach are known. The first one deals with the asymptotic properties of learning algorithms. In [3], asymptotic evaluations of quality are given for such methods as the K-nearest-neighbor rule, the minimum Euclidean distance classifier, Fisher's linear discriminant, etc.

The second direction takes into account the fact that the size of the learning sample is limited. This approach uses the principles of multivariate statistical analysis [4] and restricts the set of probabilistic models for each class, for instance, by assuming a Gaussian distribution law. Determined types of decision functions (for instance, linear or quadratic; perceptron) are considered.

Vapnik and Chervonenkis [2] suggested an alternative approach to the solution of the given problem ("statistical learning theory"). This approach is distribution-free and can be applied to arbitrary types of decision functions.

The main question is: when does minimization of the empirical risk lead to minimization of the true, unknown risk for an arbitrary distribution? The authors connect this question with the question of the existence of uniform convergence of frequencies to probabilities on the set of events related to the class of decision functions. The fundamental notions of growth function, entropy, and VC-dimension, which characterize the capacity of the class of decision functions, are suggested. It is proved that the frequency converges to the probability uniformly if and only if the entropy per element of the sample converges to zero as the sample length increases. Since these evaluations are obtained for the worst case, on the one hand they are distribution-independent, but on the other hand they give too pessimistic results. In [5] the notion of efficient VC-dimension is offered, which depends on the distribution. With this notion, the authors managed to greatly improve the accuracy of the evaluations.
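The growth function mentioned above can be computed by direct enumeration for the simplest class: one-dimensional threshold rules $f_t(x) = [x > t]$. Since it grows only polynomially (here $n + 1$ on $n$ points), the VC-dimension of this class is finite (equal to 1). The sketch is illustrative:

```python
# Growth function of 1-D threshold rules f_t(x) = [x > t]: enumerate which
# label vectors on n distinct points are realizable by some threshold.

def growth(points):
    labelings = set()
    # One threshold below all points (all-ones labeling), plus one at each
    # point; these realize every distinct labeling of the class.
    thresholds = [min(points) - 1] + sorted(points)
    for t in thresholds:
        labelings.add(tuple(int(x > t) for x in points))
    return len(labelings)

for n in range(1, 6):
    assert growth(list(range(n))) == n + 1   # m(n) = n + 1, polynomial
```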

