After simple transformations of (9) we obtain the expression

W_h(k) = \begin{pmatrix} W_{h-1}(k) - R_{h-1}^{-1}(k)\,\varphi_{h-1}(k)\,w_h(k) \\ w_h(k) \end{pmatrix} \qquad (10)

that enables us to exclude a function from (1) and obtain the corrected estimates of the remaining parameters of the ANN. For this operation, we use only the information accumulated in the matrix R_h(k) and vector F_h(k). Using the same technique as above, we can obtain a procedure that adds a new function to the existing basis. Direct application of the Frobenius formula [12] leads to the algorithm

W_{h+1}(k) = R_{h+1}^{-1}(k)F_{h+1}(k) = \begin{pmatrix} R_h(k) & \varphi_h(k) \\ \varphi_h^T(k) & r_{h+1,h+1}(k) \end{pmatrix}^{-1} \begin{pmatrix} F_h(k) \\ f_{h+1}(k) \end{pmatrix} = \begin{pmatrix} W_h(k) + \dfrac{R_h^{-1}(k)\varphi_h(k)\left(\varphi_h^T(k)W_h(k) - f_{h+1}(k)\right)}{r_{h+1,h+1}(k) - \varphi_h^T(k)R_h^{-1}(k)\varphi_h(k)} \\[2mm] \dfrac{f_{h+1}(k) - \varphi_h^T(k)W_h(k)}{r_{h+1,h+1}(k) - \varphi_h^T(k)R_h^{-1}(k)\varphi_h(k)} \end{pmatrix} \qquad (11)

where \varphi_h(k) = (r_{1,h+1}(k), \ldots, r_{h,h+1}(k))^T = (r_{h+1,1}(k), \ldots, r_{h+1,h}(k))^T.

Thus, with the help of equation (11) we can add a new function (neuron) to the model (1), and with formula (10) we can exclude an existing function without retraining the remaining weights. In order to perform these operations in real time, it is necessary to accumulate information about a larger number of basis functions than are currently in use. For example, we can initially introduce a redundant number of basis functions H and accumulate information in the matrix R_H(k) and vector F_H(k) as new data arrive, while only h < H basis functions are used for the description of the unknown mapping. The complexity of the model can then be either reduced or increased as required. Analysis of equations (6), (10), and (11) shows that the efficiency of the proposed learning algorithm is directly related to the condition number of the matrix R_h(k). This matrix is nonsingular if the functions \{\varphi_i(\cdot)\}_{i=1}^{h} used in the expansion (1) are linearly independent. The best situation is when the function system \{\varphi_i(\cdot)\}_{i=1}^{h} is orthogonal.
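The order-recursive updates (10) and (11) can be sketched in code. The snippet below is a minimal illustration (function names are ours, and it assumes the weights solve the normal equations R_h(k)W_h(k) = F_h(k)): `grow` adds a basis function by the bordered (Frobenius) inversion, and `prune_last` excludes the last one without retraining.

```python
import numpy as np

def grow(R_inv, W, phi, r_new, f_new):
    """Add one basis function via the bordered (Frobenius) inverse, cf. (11).

    R_inv : (h, h) inverse of the correlation matrix R_h
    W     : (h,)  current weights, W_h = R_h^{-1} F_h
    phi   : (h,)  cross-correlations (r_{1,h+1}, ..., r_{h,h+1}) of the new function
    r_new : scalar autocorrelation r_{h+1,h+1}
    f_new : scalar cross-correlation f_{h+1} with the training signal
    """
    g = R_inv @ phi
    alpha = r_new - phi @ g                  # Schur complement of R_h
    w_new = (f_new - phi @ W) / alpha        # weight of the added function
    R_inv_new = np.block([
        [R_inv + np.outer(g, g) / alpha, -g[:, None] / alpha],
        [-g[None, :] / alpha, np.array([[1.0 / alpha]])],
    ])
    return R_inv_new, np.append(W - g * w_new, w_new)

def prune_last(R_inv, W):
    """Exclude the last basis function without retraining, cf. (10)."""
    b, d = R_inv[:-1, -1], R_inv[-1, -1]
    R_inv_red = R_inv[:-1, :-1] - np.outer(b, b) / d   # inverse of the reduced R
    W_red = W[:-1] - b * (W[-1] / d)                   # corrected remaining weights
    return R_inv_red, W_red
```

Growing agrees with a direct solve of the enlarged normal equations, and growing followed by pruning recovers the original solution, so the two operations can be interleaved freely as the model structure adapts.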
In this case, the matrix R_h(k) becomes diagonal, and the formulas (6), (10), and (11) are greatly simplified, because

\mathrm{diag}(a_1, \ldots, a_n)^{-1} = \mathrm{diag}(1/a_1, \ldots, 1/a_n), \qquad (12)

where \mathrm{diag}(a_1, \ldots, a_n) is an (n \times n) matrix with nonzero elements a_1, \ldots, a_n only on the main diagonal.

Simulation Results

We applied the proposed ontogenic network with orthogonal activation functions to online identification of a rat's (Rattus norvegicus, Wistar) brain activity during the sleeping phase. The signal was sampled at a frequency of 64 Hz. We took a fragment of the signal containing 3200 points (50 seconds of measurement) that was typical for the sleeping phase of the rat's life activity. Two neural networks of type (1) were trained in real time. Each network had 10 inputs – the delayed signal values y(k), y(k-1), ..., y(k-9) – and was trained to output the one-step-ahead value of the process, y(k+1). The first network used the synaptic adaptation algorithm (6), while the second one also involved the structure adaptation technique (10), (11). Initially, both ANNs had 5 activation functions per input; the one with synaptic adaptation only retained all 50 tunable parameters during its work, while the ANN with the structure adaptation mechanism kept only 25 fired functions (the most significant ones, chosen in real time). For comparison, we also trained a multilayer perceptron (further referred to as MLP) with the same structure of inputs and training signal, having 5 units in the first and 4 in the second hidden layer (74 tunable parameters in total). As the MLP is not capable of real-time data processing, all samples were used as the training set, and the test criteria were calculated on the same data points. The MLP was trained for 250 epochs with the Levenberg-Marquardt algorithm; our experiments showed that this is enough to achieve precision comparable to the proposed ontogenic neural network with orthogonal activation functions. The results of identification can be found in Table 1, and Fig. 1 shows the results of identification using the proposed neural network.

We used several measures of identification quality. First, we analyzed the normalized root mean squared error (NRMSE), which is closely related to the learning criterion. Two other criteria were used: "Wegstrecke" [19], which characterizes the quality of the model for prediction/identification (+1 means a perfect model), and "Trefferquote" [20], the percentage of correctly predicted direction changes.

Figure 1. Identification of a rat's brain activity during the sleeping phase using the proposed neural network with orthogonal activation functions: brain activity signal (solid line), network output (dashed line), and identification error (dash-dot line).

We can see that utilizing the structure adaptation technique leads to somewhat worse results. This is the trade-off for having fewer tunable parameters and the possibility of processing nonstationary signals.

Table 1 – Identification results for different architectures

Description                                                    | NRMSE  | Trefferquote | Wegstrecke
OrthoNN, real-time processing                                  | 0.1834 | 82.3851      | 0.…
OrthoNN, real-time processing, variable number of nodes        | 0.2187 | 77.6553      | 0.…
MLP, offline learning (250 epochs), error on the training set  | 0.1685 | 83.9533      | 0.…

Conclusion

A new computationally efficient neural network with orthogonal activation functions was proposed. It has a simple and compact architecture that is not affected by the curse of dimensionality, and it provides high precision of nonlinear dynamic system identification. An apparent advantage is the much easier implementation and lower computational load compared to conventional neural network architectures. The approach presented in the paper can be used for nonlinear system modeling, control, and time series prediction. An interesting direction for further work is the use of the network with orthogonal activation functions as part of a hybrid multilayer architecture.
Another possible application of the proposed ontogenic neural network is its use as a basis for diagnostic systems.

XIIth International Conference "Knowledge – Dialogue – Solution"

References

1. Handbook of Neural Computation. IOP Publishing and Oxford University Press, 1997.
2. Nelles O. Nonlinear System Identification. Berlin, Springer, 2001.
3. Poggio T. and Girosi F. A Theory of Networks for Approximation and Learning. A.I. Memo No. 1140, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 1989.
4. Platt J. A resource-allocating network for function interpolation. Neural Computation, 3, 1991, p. 213-225.
5. Nag A. and Ghosh J. Flexible resource allocating network for noisy data. In: Proc. SPIE Conf. on Applications and Science of Computational Intelligence, SPIE Proc. Vol. 3390, Orlando, FL, April 1998, p. 551-559.
6. Yingwei L., Sundararajan N. and Saratchandran P. Performance evaluation of a sequential minimal radial basis function (RBF) neural network learning algorithm. IEEE Trans. on Neural Networks, 9, 1998, p. 308-318.
7. Fahlman S. E. and Lebiere C. The cascade-correlation learning architecture. Technical Report CMU-CS-90-100, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1990.
8. Cun Y. L., Denker J. S., Solla S. A. Optimal Brain Damage. Advances in Neural Information Processing Systems, 2, 1990, p. 598-605.
9. Hassibi B. and Stork D. G. Second-order derivatives for network pruning: Optimal brain surgeon. In: Advances in Neural Information Processing Systems, Hanson et al. (Eds.), 1993, p. 164-171.
10. Prechelt L. Connection pruning with static and adaptive pruning schedules. Neurocomputing, 16, 1997, p. 49-61.
11. Takagi T. and Sugeno M. Fuzzy identification of systems and its application to modeling and control. IEEE Trans. on Systems, Man and Cybernetics, 15, 1985, p. 116-132.
12. Gantmacher F. R. The Theory of Matrices. Chelsea Publ. Comp., New York.
13. Narendra K. S. and Parthasarathy K. Identification and control of dynamical systems using neural networks. IEEE Trans. on Neural Networks, 1, 1990, p. 4-26.
14. Scott I. and Mulgrew B. Orthonormal function neural network for nonlinear system modeling. In: Proceedings of the International Conference on Neural Networks (ICNN-96), June 1996.
15. Patra J. C. and Kot A. C. Nonlinear dynamic system identification using Chebyshev functional link artificial neural network. IEEE Trans. on Systems, Man and Cybernetics – Part B, 32, 2002, p. 505-511.
16. Bodyanskiy Ye. V., Kolodyazhniy V. V., and Slipchenko O. M. Forecasting neural network with orthogonal activation functions. In: Proc. of 1st Int. Conf. "Intelligent Decision-Making Systems and Information Technologies", Chernivtsi, Ukraine, 2004, p. 57. (in Russian)
17. Bateman H. and Erdelyi A. Higher Transcendental Functions. Vol. 2. McGraw-Hill, 1953.
18. Liying M. and Khorasani K. Constructive feedforward neural network using Hermite polynomial activation functions. IEEE Trans. on Neural Networks, 16, No. 4, 2005, p. 821-833.
19. Baumann M. Nutzung neuronaler Netze zur Prognose von Aktienkursen. Report Nr. 2/96, TU Ilmenau, 1996. 113 S.
20. Fueser K. Neuronale Netze in der Finanzwirtschaft. Wiesbaden: Gabler, 1995. 437 S.

Authors' Information

Yevgeniy Bodyanskiy – Dr. Sc., Prof., Head of Control Systems Research Laboratory, Kharkiv National University of Radio Electronics, Lenin Av., 14, Kharkiv, 61166, Ukraine, e-mail: bodya@kture.kharkov.ua
Irina Pliss – Ph.D., Senior Research Scientist, Control Systems Research Laboratory, Kharkiv National University of Radio Electronics, Lenin Av., 14, Kharkiv, 61166, Ukraine, e-mail: pliss@kture.kharkov.ua
Oleksandr Slipchenko – Ph.D., Senior Research Scientist, Control Systems Research Laboratory, Kharkiv National University of Radio Electronics, Lenin Av., 14, Kharkiv, 61166, Ukraine, e-mail: slipchenko@kture.kharkov.ua

Neural and Growing Networks

DISTRIBUTED REPRESENTATIONS IN CLASSIFICATION TASKS

Ivan S. Misuno, Dmitri A.
Rachkovskij, Sergey V. Slipchenko

Abstract: Binary distributed representations of vector data (numerical, textual, visual) are investigated in classification tasks. A comparative analysis of results for various methods and tasks using artificial and real-world data is given.

Keywords: distributed representations, binary representations, coarse coding, classifiers, perceptron, SVM, RSC

ACM Classification Keywords: C.1.3 Other Architecture Styles – Neural nets; I.2.6 Learning – Connectionism and neural nets, Induction, Parameter learning

Introduction

Classification tasks consist in assigning input data samples to one or more classes from a predefined set [1]. In the inductive approach, classification is performed on the basis of a training set that contains data samples with predefined class labels. Usually, input data samples are represented as numeric vectors. The vector elements are real numbers (e.g., some measurements of object characteristics or functions of them) or binary values (indicators of certain features in the input data). This vector information often does not contain the information relevant to classification explicitly, so some kind of transformation is necessary. We have created methods for transforming input information of various kinds (numerical [2], textual [3], visual [4]) into binary distributed representations. These representations can then be classified by linear classifiers, such as SVM [5] or the more computationally efficient perceptron-like classifiers [4, 6], which handle multiple classes naturally. The objective of this research is to investigate the efficiency of the proposed methods for distributed information representation and classification using real and artificial data of different modalities.
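To make the classification side concrete, a minimal perceptron-like multi-class classifier over binary codes might look as follows. This is a generic sketch, not the exact error-correcting rules of [4, 6]; the class name and update scheme are illustrative only.

```python
import numpy as np

class BinaryPerceptron:
    """Perceptron-like multi-class classifier for binary {0,1} code vectors.

    One weight row per class; prediction is the argmax of the class scores,
    so multiple classes are handled naturally without one-vs-rest wrappers.
    """

    def __init__(self, n_features, n_classes):
        self.W = np.zeros((n_classes, n_features))

    def predict(self, x):
        return int(np.argmax(self.W @ x))

    def fit(self, X, y, epochs=10):
        for _ in range(epochs):
            for x, label in zip(X, y):
                pred = self.predict(x)
                if pred != label:
                    self.W[label] += x   # reinforce the correct class
                    self.W[pred] -= x    # punish the wrongly winning class
```

Because inputs are sparse binary vectors, each update touches only the active features, which is what makes such classifiers computationally cheap compared to kernel methods.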
Numeric Vector Data Classification

For an experimental study of the above-mentioned methods on numeric data, the following well-known test problems were selected: Leonard-Kramer (LK), XOR, and Double Spiral [6]; datasets generated by DataGen [6]; and sample data from the Elena database [7]. The dimensionality A of the data vectors varied from 2 to 36, the number of classes C varied from 2 to 11, and the number of samples in the training and test sets varied from 75 to 3218. All selected problems have essentially nonlinear class boundaries. Therefore, a nonlinear transformation of the input numeric vectors was used, namely the RSC and Prager [2] methods of encoding. These methods extract binary features – indicators of the presence of the input A-dimensional vector in s-dimensional (s

To investigate the impact of the code parameters on the classification quality, the following experimental scheme was used. Input vectors were converted to RSC and Prager codes. These codes were used as input data for training and testing linear classifiers. The number (or percentage) of test errors was chosen as the classification quality criterion. We used SVM [5] and modifications of perceptron-like classifiers [4] as linear classifiers for the obtained distributed representations. In addition, classification experiments with (nonlinear) kernel SVM using Prager, RSC [2], and standard (Gaussian and polynomial) kernels were conducted.
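The general idea of such subspace indicator features can be sketched as follows. Note that this is only an illustration of coarse coding in the spirit of the encoders above: each binary feature fires when the input's projection onto a randomly chosen s-dimensional subspace falls into a random hyperrectangle. The actual RSC and Prager encoders of [2] differ in how the receptive fields are constructed; all names here are ours.

```python
import numpy as np

def make_encoder(n_features, A, s, lo=0.0, hi=1.0, seed=0):
    """Build a coarse-coding encoder: A-dim input -> n_features binary code.

    Each feature i owns s randomly chosen input dimensions and a random
    hyperrectangle in that subspace; the feature is 1 iff the input falls inside.
    """
    rng = np.random.default_rng(seed)
    dims = [rng.choice(A, size=s, replace=False) for _ in range(n_features)]
    bounds = []
    for _ in range(n_features):
        a = rng.uniform(lo, hi, size=s)
        b = rng.uniform(lo, hi, size=s)
        bounds.append((np.minimum(a, b), np.maximum(a, b)))

    def encode(x):
        x = np.asarray(x, dtype=float)
        code = np.empty(n_features, dtype=np.uint8)
        for i, (d, (low, high)) in enumerate(zip(dims, bounds)):
            code[i] = np.all((x[d] >= low) & (x[d] <= high))
        return code

    return encode

encode = make_encoder(n_features=256, A=2, s=2)
# nearby inputs share many active features, distant inputs share few,
# which is what lets a linear classifier separate nonlinear class boundaries
```

The overlap of active features decays with distance between inputs, so the binary codes carry a similarity structure that a subsequent linear classifier or kernel machine can exploit.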
