
[2] G.Coles, T.Coles, V.A.Lovitskii, Natural Interface Language, Proc. of the VIII-th International Conference on Knowledge-Dialogue-Solution: KDS-99, Kacivelli (Ukraine), 104 -109, 1999.

[3] D.Burns, R.Fallon, P.Lewis, V.Lovitskii, S.Owen, Verbal Dialogue versus Written Dialogue*, Proc. of the XI-th International Joint Conference on Knowledge-Dialogue-Solution: KDS-2005, Varna (Bulgaria), 336-244, 2005.

[4] T.Coles, V.A.Lovitskii, Text Searching and Mining, J. of Artificial Intelligence, National Academy of Sciences of Ukraine, Vol. 3, 488-496, 2000.

[5] L.Huang, T.Ulrich, M.Hemmje, E.Neuhold, Adaptively Constructing the Query Interface for Meta Search Engines, Proc. of the Intelligent User Interface Conf., 2001.

[6] Harris R.J. Monaco G.E., Psychology of Pragmatic Implication: Information Processing between the Lines, Journal Exp. Psychol. General, 107, 1978, pp.1-22.

[7] Kitano H., Challenges of Massive Parallelism, Proc. Of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), Vol. 1, 1993, pp.813-834.

322 Intelligent Systems

Authors' Information

Guy Francis, 2 Ergo Ltd, St. Mary's Chambers, Haslingden Road, Rawtenstall, Lancashire, BB4 6QX, UK, e-mail: guy.francis@2ergo.com

Mark Liashman, 2 Ergo Ltd, St. Mary's Chambers, Haslingden Road, Rawtenstall, Lancashire, BB4 6QX, UK, e-mail: mark.liashman@2ergo.com

Vladimir Lovitskii, 2 Ergo Ltd, St. Mary's Chambers, Haslingden Road, Rawtenstall, Lancashire, BB4 6QX, UK, e-mail: vladimir@2ergo.com

Michael Thrasher, University of Plymouth, Plymouth, Devon, PL4 6DX, UK, e-mail: mthrasher@plymouth.ac.uk

David Traynor, 2 Ergo Ltd, St. Mary's Chambers, Haslingden Road, Rawtenstall, Lancashire, BB4 6QX, UK, e-mail: david.traynor@2ergo.com

DISTINCTIVE FEATURES OF MOBILE MESSAGES PROCESSING

Ken Braithwaite, Mark Lishman, Vladimir Lovitskii, David Traynor

Abstract: The world's mobile market pushed past 2 billion lines in 2005. Success in these competitive markets requires operational excellence together with product and service innovation to improve mobile performance. Mobile users very often prefer to send a mobile instant message or text message rather than talk on a mobile. Well-developed written speech analysis fails not only on verbal speech but also on mobile text messages. The main purpose of our paper is, firstly, to highlight the problems of mobile text message processing and, secondly, to show possible ways of solving these problems.

Keywords: mobile text messages, text message analysis, natural language processing

Introduction

1. The reasons why it is very difficult to use the classical linguistic approach for verbal speech analysis have been considered in [1]. In this paper the problems of Mobile Short Message (MSM) analysis are discussed. An MSM is a plain text message of 160 characters or fewer provided by the mobile SMS (short message service). The year 2005 saw an explosion in the volume of MSM being sent to mobile phones. Mobile users choose to send an MSM rather than make a mobile call because [2]:

They don't have time to chat on the phone (74%).

To avoid disturbing other patrons on public transportation or at a sporting event or restaurant (53%).

To get work done and send quick notes when on the road travelling for business (32%).

Less disturbing than phone calls (72.5%).

One can reach the other party around the clock (30.4%).

However, mobile operators need to understand that subscribers give greater priority to the convenience of using the service over the technology and capabilities it offers. Therefore, more effort must be placed on creating user-friendly client interfaces that integrate effectively with the handset features.

2. A wide variety of information services can be provided by SMS, including weather reports, traffic information, inventory management, itinerary confirmation, sales order processing, asset tracking, automatic vehicle location, entertainment information (e.g., cinema, theatre, concerts), financial information (e.g., stock quotes, exchange rates, banking, brokerage services), and directory assistance. SMS can support both push (i.e. mobile-terminated (MT)) SM and pull (i.e. mobile-originated (MO)) SM to allow not only delivery under specific conditions but also delivery on demand, as a response to a request.

XII-th International Conference "Knowledge - Dialogue - Solution"

3. An important distinctive feature of MSM is that the majority of them are bilingual (i.e., they use both English words and mobile slang from Tegic's T9 dictionary [3]).

4. We will consider an MSM as inseparably linked with its Inbound Number (INo), which is represented either by a short code (typically a 5-digit number accessible to subscribers of any mobile operator) or by a long code (a usual mobile number that works across all operators).

5. The information services described above are provided by Content Providers, who must rent an INo. An INo can be dedicated to a single service or shared among multiple services. In the case of multiple services, they are distinguished by a keyword that the user must provide as the first word of the MSM.

6. The standard 12-key keypad is found on many mobile phones today (see Figure 1, which shows the mobile imitator). Alphabetic letters are mapped to keys 2 through 9. However, this arrangement poses problems for text entry. As three or four letters share the same key, some form of disambiguation is required to determine which letter is intended by the user. Two main methods are currently used on mobile phones for text entry: the multi-tap method and the predictive text entry method. In the multi-tap method, a user taps the key that contains the letter repeatedly until the desired letter appears. The number of taps required depends on the position of the letter on the key. In the predictive text input method (e.g., Tegic's T9 [3]), the user presses the key that corresponds to each letter of a word once. The system uses a dictionary of words to determine which of the possible words the key sequence matches.

When an MSM is received on a particular INo and the INo is dedicated, the MSM is forwarded to the client renting it. If the INo is shared, the MSM needs to be examined to identify the client and the individual service.
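The two text entry methods can be illustrated with a minimal sketch. It assumes the standard letter layout of keys 2 through 9; the function names and the tiny word list are hypothetical and are not taken from T9 itself.

```python
# Standard 12-key letter layout (keys 2 through 9).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def multitap_presses(word):
    """Multi-tap: return (key, number_of_taps) for every letter of the word."""
    presses = []
    for ch in word.lower():
        key = LETTER_TO_KEY[ch]
        # The tap count is the 1-based position of the letter on its key.
        presses.append((key, KEYPAD[key].index(ch) + 1))
    return presses


def key_sequence(word):
    """Predictive entry: each letter costs exactly one press of its key."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def predictive_candidates(keys, dictionary):
    """Return all dictionary words whose key sequence equals `keys`."""
    return sorted(w for w in dictionary if key_sequence(w) == keys)


if __name__ == "__main__":
    words = {"bell", "cell", "come", "cone"}  # hypothetical dictionary
    print(multitap_presses("cab"))              # [('2', 3), ('2', 1), ('2', 2)]
    print(key_sequence("bell"))                 # 2355
    print(predictive_candidates("2355", words))  # ['bell', 'cell']
```

The last call shows why disambiguation is needed: one key sequence can match several dictionary words.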

7. First we will describe the types of MSM and the problems encountered when examining an MSM. An MSM might be represented by:

Letter or digit. For example, a number of promotions are quizzes or competitions, and some are interactive, i.e., they involve multiple messages and responses. If the original message to the customer is a question, such as "How many legs has my dog got?", then the customer could reply 1, 2, 3, or 4. Some promotions offer multiple-choice answers, e.g., a, b, or c.

Single word or number (e.g. credit card number).

Sequence of words or numbers.

Combination of words and numbers in MSM.

The main purpose of this paper is to investigate bad INo-MSM pairs and find ways to restore them.

Let us call an INo-MSM pair bad if:

INo does not exist;

The type of the MSM or the keyword of the MSM was not recognised. Very often the first whitespace-delimited word represents the keyword (KW) and allows the identification of the client;

The pair INo-MSM does not exist because $(\widetilde{INo} \,\&\, MSM) \lor (INo \,\&\, \widetilde{MSM})$, where $\widetilde{INo}$ and $\widetilde{MSM}$ stand for wrong INo and wrong MSM respectively. Let us call INo and MSM wrong if they exist separately but the link between the INo and the KW of the MSM does not. The reason for a wrong MSM is understandable. For example, with multi-tap a user taps the 2-key once to get a, twice to get b and three times to get c. A mistap therefore produces cell instead of the desired word bell or, on the 6-key, cone instead of come.

Figure 1. Standard 12-key keypad

A special type of MSM (the so-called stop MSM) requires synonyms for recognition, e.g., cancel, remove, etc.

Finally, we would like to underline the most difficult and dangerous problem, which arises when the pair INo-MSM exists but

$((INo_T \ne INo_D) \,\&\, (KW_T = KW_D)) \lor ((INo_T = INo_D) \,\&\, (KW_T \ne KW_D)) \lor ((INo_T \ne INo_D) \,\&\, (KW_T \ne KW_D))$,

where the letters D and T denote what the user desired to type and what was actually typed.

This problem arises because both INo and KW are ambiguous, i.e., one INo might link to several KW and many different INo might use the same KW.
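The multi-tap slips mentioned above (cell typed instead of bell, cone instead of come) replace a letter with another letter on the same key. The sketch below enumerates such single-slip variants of a typed word and keeps those that are known keywords. It is only an illustration under that single-slip assumption, not the restoration algorithm of this paper, and the keyword set is hypothetical.

```python
# Letter groups of keys 2 through 9 on the standard keypad.
KEY_GROUPS = ["abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"]
SAME_KEY = {ch: group for group in KEY_GROUPS for ch in group}


def mistap_variants(word):
    """Words obtained by changing exactly one letter to another letter on the same key."""
    word = word.lower()
    variants = set()
    for i, ch in enumerate(word):
        for alt in SAME_KEY.get(ch, ""):
            if alt != ch:
                variants.add(word[:i] + alt + word[i + 1:])
    return variants


def same_key_candidates(typed, known_keywords):
    """Known keywords that differ from `typed` by one same-key slip."""
    return sorted(mistap_variants(typed) & set(known_keywords))


if __name__ == "__main__":
    keywords = {"bell", "come", "red"}  # hypothetical keyword set
    print(same_key_candidates("cell", keywords))  # ['bell']
    print(same_key_candidates("cone", keywords))  # ['come']
```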

Let us investigate these problems and discuss the results of KW, INo and bad MSM analysis. Our investigation is grounded in the analysis of real data. As a result of this discussion, an algorithm to deduce the correct KW from a bad MSM will be described, and the results of using this algorithm will be shown.

Keywords Analysis

The results of KW analysis and KW ambiguity are shown in Figure 2, namely:

- Total (valid + invalid) KW distribution among letters and mobile keys (2-9). A KW is invalid if it is currently not used on the INo, although the same KW might be valid for another INo. For example, KW red is valid for INo 81025 and 80039, and invalid for 89095;

- The list of KW for a selected letter or INo is displayed by clicking the corresponding letter or digit;

- For any KW (selected by clicking when the list of KW is displayed, or simply typed in) the corresponding list of INo is displayed;

- List of the next (= expected) symbols is displayed for the entered symbol (letter or digit);

- List of ambiguity for both valid and invalid KW is displayed.

Figure 2. Keywords analysis and KW ambiguity

INo ambiguity is shown in Figure 3.

Figure 3. INo ambiguity

To provide such analysis, a Knowledge Base (KB) has been created and used for KW, INo and bad MSM analysis, and for KW and INo restoration. The main features of the KB have been discussed in detail in [4]. Here we note that, in our case, by KB organisation we understand the regular distribution of data (INo and KW) in memory, which assures the storage of the various links between them. At any time the KB deals only with relatively small fragments of the external world, so corresponding structures are needed to integrate these fragments, separated in time, into an integral picture. The structures obtained as a result of integration should contain more information than was used for their creation. The organisation of the KB should make allowance for such features as:

- associability;

- ability to reflect similar features for different objects and different features for similar objects (where objects are represented by KW and INo);

- heterarchical organisation of information [5]. The heterarchical approach means that the full association of INo and KW represents a very complicated net of nodes and unidirectional links between them. A predetermined hierarchy of "super-" and "subclasses" is absent; every node (INo or KW) is a "patriarch" in its own hierarchy if some search process starts from it.
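As a rough illustration of such a net of INo and KW nodes, the sketch below stores the links in both directions, so that a search can start from either kind of node. The class, its method names and the validity flag are assumptions made for this example; the KW blue is hypothetical, while the data for KW red follows the example given for Figure 2.

```python
class KeywordBase:
    """Toy KB: INo and KW nodes joined by links that carry a validity flag."""

    def __init__(self):
        self._kw_by_ino = {}  # INo -> {KW: is_valid}
        self._ino_by_kw = {}  # KW  -> {INo: is_valid}

    def link(self, ino, kw, valid=True):
        """Record one link in each direction (two unidirectional links)."""
        kw = kw.lower()
        self._kw_by_ino.setdefault(ino, {})[kw] = valid
        self._ino_by_kw.setdefault(kw, {})[ino] = valid

    def keywords_for(self, ino, valid_only=True):
        """Search starting from an INo node."""
        links = self._kw_by_ino.get(ino, {})
        return sorted(kw for kw, ok in links.items() if ok or not valid_only)

    def inos_for(self, kw, valid_only=True):
        """Search starting from a KW node."""
        links = self._ino_by_kw.get(kw.lower(), {})
        return sorted(ino for ino, ok in links.items() if ok or not valid_only)


if __name__ == "__main__":
    kb = KeywordBase()
    kb.link("81025", "red")
    kb.link("81025", "blue")                # hypothetical second KW on the same INo
    kb.link("80039", "red")
    kb.link("89095", "red", valid=False)    # red is invalid for 89095
    print(kb.keywords_for("81025"))             # ['blue', 'red']
    print(kb.inos_for("red"))                   # ['80039', '81025']
    print(kb.inos_for("red", valid_only=False))  # ['80039', '81025', '89095']
```

The first two queries show both kinds of ambiguity: one INo linked to several KW, and one KW used by several INo.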

Bad Messages Analysis

The main purpose of Bad Messages (BdM) analysis is to classify BdM and identify the types of BdM that might be restored. Several hundred thousand BdM have been detected, and the results are as follows:

Wrong KW among valid and invalid KW - 42.12%;

Wrong KW among valid KW - 20.11%;

Wrong KW among invalid KW - 22.01%;

Wrong INo - 39.53%;

Stop MSM - 8.78%;

Empty MSM - 6.47%;

Wrong alphabet (e.g. Russian) - 2.65%;

Mobile slang (from T9 dictionary) - 0.37%;

Rude MSM - 0.08%.

Remark: Wrong INo means a literally wrong INo, e.g. 22120000, or an unknown INo. So although wrong INo account for 39.53% of BdM, it would not be effective to spend more effort trying to decrease this percentage. In the next section of the paper some ideas for KW and right INo restoration will be discussed.

Algorithm of KW and/or INo Restoration

1 INo recognition. There are four possible types of INo: (i) valid; (ii) invalid; (iii) unknown, when either the length of the INo differs from that of a short or long INo, or the INo does not exist in the KB; and (iv) wrong. Remark: Checking whether the INo exists in the KB would be sufficient to find out if the INo is known or not, but this operation requires more time than simply checking the length of the INo. Initial analysis of the INo does not allow the identification of a wrong INo (type iv); this is only possible once the KW of the MSM has been recognised.

2 Initial MSM validation. An MSM is classified as valid only if it contains symbols from the Latin alphabet and/or digits. Hereafter, only valid MSM will be considered.

3 Separators elimination from MSM.

4 Fillers elimination from MSM. For example, in the MSM "I'd like to stop sending messages", "I'd like to" is a filler and will be deleted.

5 Slang elimination from MSM using T9 dictionary.

6 Stop MSM recognition. Remark: In the current version of the algorithm the MSM "s t o p" will not be recognised as a stop MSM.
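Steps 2 to 6 can be sketched as a small preprocessing pipeline. The filler, slang and stop word lists below are hypothetical placeholders (only "I'd like to", cancel and remove come from the text), and testing only the first remaining word for a stop synonym is an assumption of this sketch.

```python
import re

FILLERS = [("id", "like", "to"), ("please",)]   # "please" is a hypothetical filler
SLANG = {"pls", "thx"}                          # stand-in for T9 slang entries
STOP_SYNONYMS = {"stop", "cancel", "remove"}


def is_valid_msm(msm):
    """Step 2: every letter or digit must be a Latin letter or an ASCII digit."""
    symbols = [ch for ch in msm if ch.isalnum()]
    return bool(symbols) and all(ch.isascii() for ch in symbols)


def strip_separators(msm):
    """Step 3: drop apostrophes, turn every other separator into one space."""
    return re.sub(r"[^A-Za-z0-9]+", " ", msm.replace("'", "")).strip().lower()


def drop_fillers(tokens):
    """Step 4: delete known filler phrases wherever they occur."""
    out, i = [], 0
    while i < len(tokens):
        for filler in FILLERS:
            if tuple(tokens[i:i + len(filler)]) == filler:
                i += len(filler)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def preprocess(msm):
    """Steps 2 to 5: return the cleaned word list, or None for an invalid MSM."""
    if not is_valid_msm(msm):
        return None
    tokens = drop_fillers(strip_separators(msm).split())
    return [t for t in tokens if t not in SLANG]


def is_stop_msm(tokens):
    """Step 6 (assumption: only the first remaining word is checked)."""
    return bool(tokens) and tokens[0] in STOP_SYNONYMS


if __name__ == "__main__":
    words = preprocess("I'd like to stop sending messages")
    print(words, is_stop_msm(words))         # ['stop', 'sending', 'messages'] True
    print(is_stop_msm(preprocess("s t o p")))  # False
```

As in the remark to step 6, the spaced-out message "s t o p" is not recognised by this sketch either, because each letter becomes a separate word.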

7 Extracting the set of KW related to INo from the KB, i.e. $\{KW_{INo}\}$, where $\{KW_{INo}\} \subseteq \{KW_{KB}\}$. $\{KW_{KB}\}$ represents all KW existing in the KB.

8 Extracting KW from MSM, i.e. $KW_M$. Remark: In the current version of the algorithm only the first word of the MSM is considered as $KW_M$.

9 Extracting the set of INo related to $KW_M$ from the KB, i.e. $\{INo_{KW_M}\}$, where $\{INo_{KW_M}\} \subseteq \{INo_{KB}\}$.

10 Pair INo-MSM is accepted if

$((INo \in \{INo_{KW_M}\} \land KW_M \in \{KW_{INo}\}) \Rightarrow \text{IS-Correct}(MSM)) \rightarrow \text{return}(KW_M)$,

where the predicate IS-Correct(MSM) is true when the MSM is correct and false (i.e. $\neg\,\text{IS-Correct}(MSM)$) otherwise. The symbol $\Rightarrow$ stands for the word "then" and the symbol $\rightarrow$ means "leads to". The returned $KW_M$ is used for further analysis.

11 Pair INo-MSM represents a BdM if

$(INo \notin \{INo_{KW_M}\} \land KW_M \in \{KW_{INo}\}) \oplus (INo \in \{INo_{KW_M}\} \land KW_M \notin \{KW_{INo}\}) \oplus (INo \notin \{INo_{KW_M}\} \land KW_M \notin \{KW_{INo}\})$,

where the symbol $\oplus$ means exclusive or.
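Steps 7 to 11 amount to two set lookups and two membership tests. The sketch below keeps the INo-to-KW and KW-to-INo maps separate, on the assumption that the unidirectional links of the KB may exist in one direction only. The map contents are hypothetical, and the IS-Correct predicate is passed in as a parameter because its definition is not given here.

```python
# Hypothetical KB fragments, stored as two independent one-way maps.
KW_BY_INO = {"81025": {"red", "bell"}, "80039": {"red"}}
INO_BY_KW = {"red": {"81025", "80039"}, "bell": {"81025"}, "come": {"80039"}}


def check_pair(ino, tokens, is_correct=lambda tokens: True):
    """Return (status, KW_M, (ino_linked, kw_linked)) for an INo-MSM pair."""
    if not tokens:
        return ("bad", None, (False, False))
    kw_m = tokens[0]                       # step 8: first word of the MSM
    kw_ino = KW_BY_INO.get(ino, set())     # step 7: KW related to this INo
    ino_kwm = INO_BY_KW.get(kw_m, set())   # step 9: INo related to KW_M
    ino_linked = ino in ino_kwm
    kw_linked = kw_m in kw_ino
    if ino_linked and kw_linked and is_correct(tokens):
        return ("accepted", kw_m, (True, True))        # step 10
    return ("bad", kw_m, (ino_linked, kw_linked))      # step 11


if __name__ == "__main__":
    print(check_pair("81025", ["red", "5"]))  # ('accepted', 'red', (True, True))
    print(check_pair("80039", ["come"]))      # ('bad', 'come', (True, False))
    print(check_pair("80039", ["cell"]))      # ('bad', 'cell', (False, False))
```

The pair of flags returned for a bad message records which of the two membership tests failed, which is the information a later restoration step would need.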

12 After the reason for the BdM has been recognised, an attempt to restore the BdM is undertaken. To explain this step, let us assume that the reason for the BdM is:

$INo \notin \{INo_{KW_M}\} \land KW_M \notin \{KW_{INo}\}$.

From this it follows that:

$INo \notin \{INo_{KW_M}\} \land KW_M \notin \{KW_{INo}\} \Rightarrow (KW_M \in \{KW_{KB}\} \oplus KW_M \notin \{KW_{KB}\})$.

If $KW_M \in \{KW_{KB}\}$ then attempts to correct the INo should be undertaken. The next step describes the more complicated case of KW correction, when $KW_M \notin \{KW_{KB}\}$.

13 KW correction. There are two different approaches to restoring $KW_M$:

(1) The first approach searches for $KW_i \in \{KW_{KB}\}$ under several conditions:

the difference in length of the words $KW_i$ and $KW_M$ must be less than or equal to 1;
