Proceedings of the XVI International Conference on Neurocybernetics

FACIAL IMAGE FEATURES FOR HEAD POSE ESTIMATION

S. Anishchenko1,2, A. Labantcev1, I. Shepelev
1 A.B. Kogan Research Institute for Neurocybernetics, Southern Federal University, Rostov-on-Don, Russia, email@example.com
2 School of Engineering and Information Sciences, Middlesex University, London, NW4 4BT, UK
In computer vision, head pose estimation is the process of extracting information about the head pose from an image of the face. The most common approach is to extract features from a facial image and predict the pose using machine learning tools. In this research, five different features were evaluated and compared for head pose prediction with a multilayer perceptron.
Introduction

Head pose estimation is a common task in many applications, such as human-machine interaction, biometrics, medical image processing and operator fatigue estimation. Many approaches have been proposed to estimate head pose from facial images [1]. In general, they fall into two major groups: feature-based and model-based.
Algorithms of the first group normally consist of two steps: facial feature extraction and head pose prediction using machine learning tools. It is important to use features that are invariant to various parameters (such as illuminance level) and that yield higher accuracy. This research is devoted to the evaluation and comparison of five different facial image features for the task of head pose estimation. A multilayer perceptron was used to construct the mapping between the image features and the head pose. The head pose was characterized by three angles: roll, yaw and pitch.

Video database

A publicly available video database with ground truth indicating the head pose on each frame [4] was used in this research. Only the video sequences of one person (n=6) were considered (the videos are named jal[sequence number] in the database, see Fig. 1). The facial feature points were marked up manually (Fig. 3).

Fig. 1. An example of frames with the same labels in the ground truth. Top: the first frame of jal3; bottom: the first frame of jal8. The head pose clearly differs between the frames, while the ground truth labels indicate the same pose.

To ensure that the ground truth is correct, the video and labels were preprocessed in the following manner. The frames for which the labels indicate the same head pose were compared. This revealed that the ground truth had to be corrected, because it indicates the same pose on frames in which the poses are actually different (Fig. 1).

To compute the correction coefficients, the most similar facial frames were detected in the sequences. For example, in the sequences named jal3 and jal8 in the database, frames number 2 and 103, respectively, were detected as the most similar (Fig. 2). The ground truth for those frames was (-0.593; -1.384; -1.011) and (12.041; 3.098; 1.384), respectively. Thus, the labels of sequence jal8 should be corrected by adding (-12.634; -4.482; -2.395) to the roll, yaw and pitch, respectively. The similarity between frames was computed by comparing the angles between lines connecting facial landmarks.

4th International Symposium "Neuroinformatics and Neurocomputers"
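The correction for jal8 is the difference between the ground-truth labels of the two frames matched as showing the same pose. A minimal sketch of that arithmetic, using the values quoted for jal3 (frame 2) and jal8 (frame 103); the function and variable names are illustrative assumptions and do not come from the original work:

```python
# Ground-truth (roll, yaw, pitch) labels of the two frames that were matched
# as showing the same head pose (values quoted in the text).
JAL3_FRAME_2 = (-0.593, -1.384, -1.011)
JAL8_FRAME_103 = (12.041, 3.098, 1.384)


def label_offset(reference, target):
    """Offset to add to the target labels so that they agree with the reference."""
    return tuple(round(r - t, 3) for r, t in zip(reference, target))


def correct_labels(labels, offset):
    """Shift every (roll, yaw, pitch) label of a sequence by the same offset."""
    return [tuple(round(a + o, 3) for a, o in zip(frame, offset)) for frame in labels]


offset = label_offset(JAL3_FRAME_2, JAL8_FRAME_103)
print(offset)  # (-12.634, -4.482, -2.395)

# After correction, frame 103 of jal8 carries the same label as frame 2 of jal3.
print(correct_labels([JAL8_FRAME_103], offset))
```

Applying the same offset to every frame of jal8 assumes that the labels of that sequence are shifted by a constant, which is how the correction is described here.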
Fig. 2. The top and bottom photos are the most similar frames from video sequences jal3 (frame number 2) and jal8 (frame number 103), respectively. The head is clearly in the same pose in both frames, while the ground truth indicates different poses.

To produce the training and test sets for this research, the most similar frames were detected in each clip. The ground truth was then corrected to indicate the same head pose on those frames. After that procedure, the head pose in the video sequences varied within the ranges shown in Table 1.

Table 1. Ranges of angles in the ground truth.
Angle    Interval
Roll     [-18.9; 23.2]
Yaw      [-16.3; 31.1]
Pitch    [-27.5; 17.1]

Facial features

Five sets of features for face pose prediction were evaluated. The first one is the set of angles (n=8) between lines connecting facial landmarks (eye corners, nose tip, nose base; see Fig. 2). All other features are based on the Histogram of Oriented Gradients (HOG), but computed in different ways.

Fig. 3. Facial landmarks (top) and their scheme showing the landmarks (1, 2, 3) and the angles used as the feature description of the face pose (bottom).

The second type of feature is the HOG computed inside the circumscribed rectangle of the facial landmarks. The gradient direction was quantized with a step of 22.5°; thus, the dimension of the resulting feature vector was 16. Further, the rectangle was divided by a 4x4 grid and the HOG was computed in each cell. The histograms were then combined into one MultiHOG (MHOG) vector; accordingly, the overall dimension of this feature vector was 16*4=64.

The next two feature vectors were obtained in the same way as HOG and MHOG, but the region of interest was detected by another method. In particular, the colour segmentation algorithm described earlier in [3] was used. These features are called CHOG and CMHOG.

The described feature vectors were extracted from each frame (n=732) and then, together with the ground truth, were used for training and testing an artificial neural network. The precision of pose prediction was analyzed to evaluate the features.

Neural network model

The multilayer perceptron (MLP) with
one hidden layer [2] was used to predict the head pose. The number of inputs of the neural network varied according to the dimension of the feature space being tested in the computational experiments. Since each head angle was predicted by a separate neural network, the output layer consisted of a single neuron. The number of neurons in the hidden layer was fixed at 24 for all computational experiments. The backpropagation algorithm was used for training. The stopping criterion was as follows: the network was trained until the error percentage $e$ of the head angle prediction for all training examples was less than or equal to 10:

$e \le e_{stop}$,

where

$e = \frac{|y_d - y|}{y_{max} - y_{min}} \cdot 100\%$,   (1)

$e_{stop} = 10$,

$y$ is the actual output of the network corresponding to the head angle being predicted, $y_d$ is the desired output, i.e. the correct value, $y_{max}$ is the maximum value of the angle and $y_{min}$ is the minimum one.

Computational experiments results

In the computational experiments, five groups of features were tested for predicting the three angles of head rotation. The 10-fold cross-validation technique was used, and the results are presented in Table 2. The columns of the table show the averaged values of the training and test accuracy, which were computed using Eq. 1.

The feature space dimension is specified in Table 2 and directly defines the number of neural network inputs. The first feature set was an exception. Since the MLP could not reach the desired prediction accuracy on the training set, i.e. the stopping criterion could not be satisfied, the dimension of the vector was changed as follows. Taking into account that the training cases are statistically dependent, the training set for this feature was represented as a time series; that is, when the current feature vector is processed, the previous ones are also taken into account to achieve more reliable prediction. The minimum number of previous vectors (i.e. the delay line) that allows the desired training accuracy to be reached was found. Since the delay line was equal to nine, the number of neural network inputs was 8*9=72.

Table 2. Cross-validation results.
Spherical coordinates   Training accuracy, %   Test accuracy, %   Test accuracy, degree
Feature: angles between lines connecting facial landmarks (dimension 8; dimension for time series representation (delay line = 9): 72)
roll                    96.3                   94.6               2.
yaw                     94.5                   92.9               3.
pitch                   96.1                   94.5               2.
Feature: HOG (dimension 16)
roll                    96.4                   94.9               2.
yaw                     95.6                   92.1               3.
pitch                   96.5                   95.1               2.
Feature: MHOG (dimension 64)
roll                    96.5                   94.7               2.
yaw                     95.8                   94.3               2.
pitch                   96.6                   95.3               2.
Feature: CHOG (dimension 16)
roll                    96.7                   95.9               1.
yaw                     96                     93.7               2.
pitch                   96.4                   95.6               1.
Feature: CMHOG (dimension 64)
roll                    96.8                   96.2               1.
yaw                     96.7                   94.3               2.
pitch                   96.8                   95.9               1.

Conclusion

In this research, a set of five feature vectors for head pose prediction was evaluated with an MLP using the cross-validation technique. The first type of feature was the set of angles between lines connecting facial landmarks. To reach the desired prediction accuracy, it was represented as a time series. All other features were based on HOG.

The NN was trained until the maximum error in the training set reached 10%. Then the averaged error was analyzed (Table 2).

The results show that the yaw angle is the most difficult to predict. Based on the test accuracy for this angle, we conclude that the MHOG and CMHOG features outperform the others. Comparing the prediction accuracy for all three angles, it can be concluded that CMHOG is the most suitable feature for head pose prediction.

The performance of all HOG-based features was better than that of the facial landmark-based one because the image resolution is small and the distance between some facial landmarks is less than 5 px; therefore, small changes in head pose are not reflected distinctly in the landmark positions in the images.

Acknowledgments: The work is supported by the Russian Foundation for Basic Research, grant 11-01-00750a.

References
1. Murphy-Chutorian E., Trivedi M. M. Head pose estimation in computer vision: A survey. // Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, issue 4, April 2009.
2. Bishop C.M. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
3. Anishenko S., Shaposhnikov D., Comley R., Gao X. A colour based approach for face segmentation from video images under low luminance levels. // In Proc. of the 11th IASTED International Conference on Computer Graphics and Imaging (CGIM 2010), Feb. 17-19, 2010, Innsbruck, Austria. - pp. 184-189.
4. Cascia E. L., Sclaroff S., Athitsos V. Fast, reliable head tracking under varying illumination: An approach based on registration of texture-mapped 3d models. // Pattern Analysis and Machine Intelligence, 22(4), 2000.

ELECTROPHYSIOLOGICAL PROPERTIES OF VISUAL CORTEX NEURONS RECORDED EXTRACELLULAR IN CAT'S BRAIN IN RESPONSE TO VISUAL STIMULI

E.I. Belova1, I.A. Ischenko1, R.A. Tikidji-Hamburyan
1 A.B. Kogan Research Institute of Neurocybernetics, Southern Federal University, Rostov-on-Don, Russia
2 Neuroscience Center, Louisiana State University, New Orleans, LA, USA
firstname.lastname@example.org, email@example.com

Knowledge about electrophysiological properties is of great importance for the identification of neurons and for understanding their relationships in neocortical circuits. In intracellular recording studies, four electrophysiological classes of neurons are identified on the basis of intrinsic membrane properties and the discharge pattern in response to current pulses. Nevertheless, multichannel extracellular recordings in behavioral experiments are useful for the study and modeling of neuronal networks. In our research we used the technique of extracellular recording to identify neurons with distinct action potential waveform widths and firing patterns in response to visual stimuli with respect to the electrophysiological classes they [...]

The identification of different classes of cortical neurons is of great importance for developing a cellular- and network-level understanding of the functioning of the cerebral cortex. The identification is possible on the basis of neuron characteristics such as the laminar location of the cell body, the morphology of the neurons, the neurotransmitters they contain, particular receptors, receptive field characteristics and electrophysiological properties. The electrophysiological properties of a neuron determine its firing pattern in response to visual stimuli and cell-to-cell interaction, thus playing a crucial role in the behavior of local circuit networks in general. The pioneering work in the definition of electrophysiological classes of neocortical cells was done in brain slices in vitro [1, 2]. Then a large number of reports appeared with intracellular recording in vivo, which allows a more precise and detailed classification of neocortical cells on the basis of their discharge pattern and intrinsic membrane properties [3, 4]. In response to intracellular injection of depolarizing current pulses, four distinct types of firing patterns were observed among [...] chattering cells). However, extracellular recording in response to natural visual stimuli in behavioral experiments is useful for studying and modeling neuronal networks. Therefore, the goal of the present study is to create a database of cat visual cortex neurons recorded extracellularly in response to natural visual stimuli for subsequent modeling of excitatory-inhibitory neuronal networks. We have attempted to identify and classify neurons with respect to the known electrophysiological classes.

Methods

All experiments were performed on anesthetized adult female cats. Initial surgery was performed under anesthesia (ventrankvil 1% 0.4 ml + rometar 4% 0.2 ml + zoletil 5% 0.2 ml) using aseptic procedures to place a steel frame on the cat's head for holding it fixed during subsequent recording. The animals were given 10 days to recover from the effects of surgery before electrophysiological recording began.

Recordings were performed from anesthetized subjects using procedures approved by the Southern Federal University Ethical Committee. Microelectrodes for recording were constructed from polyamide-insulated platinum-iridium filaments, 50 µm in width and sharpened to a fine tip. The microelectrodes had impedances of 2-5 MΩ. Recordings were obtained, during the subsequent recording session, through 4 mm holes drilled through the bone over the region of V1 (AP: 3; L: +8). Spike data were acquired using a Plexon (Dallas, TX) data acquisition system, filtered at 154 Hz to 8.8 kHz, and sampled continuously at 40 kHz.
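Because the study aims to distinguish neurons by the width of the action potential waveform, the 40 kHz sampling rate sets the time resolution of any width measurement at 25 µs per sample. The sketch below assumes the width is taken from the trough to the following peak, a common convention that the text does not confirm; the function name and the toy waveform are hypothetical:

```python
SAMPLING_RATE_HZ = 40_000  # sampling rate given in the text


def trough_to_peak_ms(waveform, fs=SAMPLING_RATE_HZ):
    """Time in milliseconds from the waveform minimum to the following maximum."""
    # Index of the most negative sample (the trough).
    trough = min(range(len(waveform)), key=waveform.__getitem__)
    # Index of the largest sample at or after the trough (the peak).
    tail = waveform[trough:]
    peak = trough + max(range(len(tail)), key=tail.__getitem__)
    return (peak - trough) * 1000.0 / fs


# Toy extracellular spike in arbitrary units, one value per 25 microsecond sample.
toy_spike = [0, -1, -5, -9, -6, -2, 1, 3, 4, 3, 2, 1, 0]
print(trough_to_peak_ms(toy_spike))  # 0.125
```

In the toy example the trough and the peak are five samples apart, so the width is 5 x 25 µs = 0.125 ms.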