
It is simply impossible to require users to know the exact values stored in the DB (e.g. the name of a constituency in AD Election) in order to ask correctly what is a very simple question: Who won the election in Suffolk Central & Ipswich North in 2001? For example, if the user types in and instead of the symbol &, IDAS will not find the constituency in the DB. But IDAS is an intelligent system, and as a result of NLE analysis IDAS understands that the user has possibly mentioned two different constituencies, Suffolk Central and Ipswich North, but both of them incorrectly, because there also exist the Suffolk Coastal, Suffolk South, Suffolk West and Ipswich constituencies. The clarification dialogue generated by IDAS irritates the user:

IDAS: Do you mean Suffolk Coastal, Suffolk South, or Suffolk West constituency?

User: No, I mean Suffolk Central.

IDAS: Suffolk Central constituency does not exist but there is Suffolk Central & Ipswich North constituency.

User: It's exactly what I meant.

IDAS: Thank you.
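One way to avoid such a dialogue for the & versus and case is to normalize both the stored names and the user's text before comparing them. The sketch below is only an illustration and is not the IDAS implementation: the constituency list is a small sample taken from the text above, and the function names, the use of Python's difflib, and the similarity cutoff are all assumptions.

```python
import difflib
import re

# Sample constituency names from the text; in practice they would be read from the DB.
CONSTITUENCIES = [
    "Suffolk Central & Ipswich North",
    "Suffolk Coastal",
    "Suffolk South",
    "Suffolk West",
    "Ipswich",
]


def normalize(name: str) -> str:
    """Lower-case the name, treat '&' and 'and' as the same token, collapse spaces."""
    text = name.lower().replace("&", " and ")
    return re.sub(r"\s+", " ", text).strip()


def find_constituency(user_text: str, cutoff: float = 0.8) -> list:
    """Return stored names that match the user's text, best match first.

    An exact match on the normalized form wins; otherwise close spellings
    are offered (the cutoff value is an arbitrary assumption).
    """
    by_key = {normalize(c): c for c in CONSTITUENCIES}
    key = normalize(user_text)
    if key in by_key:
        return [by_key[key]]
    close = difflib.get_close_matches(key, list(by_key), n=3, cutoff=cutoff)
    return [by_key[k] for k in close]
```

With this normalization, Suffolk Central and Ipswich North resolves directly to the stored name, so a clarification dialogue is needed only when no normalized or close match exists.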

Very often NLE is ungrammatical.

Direct observation of user NLEs shows that all users are lazy, i.e. they want to achieve the desired result with minimum effort. They do not want to type in a long NLE such as Identify the parts supplied by each vendor and the cost and sales value of all these items at present on order. This is natural human behaviour, in accordance with the principle of simplicity, or Occam's razor. (Occam's (or Ockham's) razor is a principle attributed to the 14th century logician and Franciscan friar William of Occam; Ockham was the village in the English county of Surrey where he was born.) The principle states that Everything should be made as simple as possible, but not simpler (the final wording is of unknown origin, although it is often attributed to Einstein, himself a master of the quotable one-liner). Finding a balance between simplicity and sophistication at the input side has been discussed in [5].

Thus, firstly, NLE does not necessarily mean that the enquiry is in plain English; secondly, IDAS should provide different levels of simplicity for NLE. The first step in this direction is NLET.

A Natural Language Enquiry Template (NLET) combines a list of values to be selected when required with a generalization of users' NLEs. Examples of some Frequently Asked Questions (FAQ) in AD Election are shown below:

What was the result in

How many votes did win in

Which party won the election in

Who won an election in

The initial set of FAQ was created by an expert in AD Election but, as a result of subsequent activity, new NLEs have been collected by IDAS, analysed, generalized and then added to the FAQ.

When the user selects an appropriate NLET with some descriptor in angular brackets, IDAS immediately displays the list of corresponding values. As soon as the user finds the desired value, simply by starting to type it, and presses the button, the result will be displayed (see Figure 1).
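The interaction just described (a template with a descriptor in angular brackets, plus a value list that narrows as the user types) can be sketched as follows. This is a hedged illustration only: the descriptor name constituency, the Template class and both helper functions are assumptions introduced here, and the value list is a sample.

```python
from dataclasses import dataclass


@dataclass
class Template:
    text: str        # template text containing one descriptor in angular brackets
    descriptor: str  # name of that descriptor


# Sample values per descriptor; in a real system they would come from the DB.
VALUES = {
    "constituency": [
        "Ipswich",
        "Suffolk Central & Ipswich North",
        "Suffolk Coastal",
        "Suffolk South",
        "Suffolk West",
    ],
}


def candidates(template: Template, typed: str) -> list:
    """Values of the template's descriptor that start with what the user typed."""
    prefix = typed.casefold()
    return [v for v in VALUES[template.descriptor] if v.casefold().startswith(prefix)]


def fill(template: Template, value: str) -> str:
    """Substitute the chosen value for the descriptor placeholder."""
    return template.text.replace(f"<{template.descriptor}>", value)


# The descriptor name in this template is hypothetical.
who_won = Template("Who won an election in <constituency>", "constituency")
```

Because the user can only pick values that exist in the list, the enquiry that reaches the system always contains an exact DB value.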

At first glance, the NLET is an ideal way to communicate with AD, but in reality there are some problems which need to be solved to make communication easy. To highlight such problems it is enough to consider quite a simple NLET: Who won an election in . Without knowing who is who and the meaning of won election, IDAS cannot answer this question. To explain it to IDAS, Production Rules (PR) need to be involved.

Many researchers are investigating how to reduce the difficulty of moving an NLI from one AD to another. The problems in doing this include what information is needed and how that information needs to be represented.

From our point of view, Preconditioned PR (PPR) is a quite powerful approach to solve this problem. The subset of PPR in format: is shown below.

XII-th International Conference "Knowledge - Dialogue - Solution"

1. AD:Election who candidate;

2. AD:Election [candidate]: [SQL]:;

3. AD:Athletics [runner]: [SQL]:

4. AD:Athletics [shooter]: [SQL]:

5. AD:Election & DB:MS Access votes [Field]:;

6. AD:Election & DB:MS Access candidate [Field]:;

7. AD:Election & DB:Oracle [party]: [SQL]:;

8. AD:Election & DB:MS Access [party]: [SQL]:,

where - denotes exclusive OR. A precondition consists of class_1:value_1 {& class_i:value_i}. An antecedent might be represented by: (i) a single word (e.g. who, won, August, seven, etc.), (ii) a sequence of words (e.g. as soon as, create KB, How are you doing, etc.), or (iii) a pair - [context]:. Context allows one to avoid word ambiguity and thereby to distinguish between Candidate won an election and Party won an election.

The presentation of a Consequent is similar to the Antecedent structure except for (iii): for a Consequent, the pair represents [descriptor]:.
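The structure of a PPR (a precondition of class:value pairs, an antecedent with an optional context, and a consequent) can be sketched as a small data structure. This is an assumed representation, not the format used inside IDAS. Only the first rule below is taken directly from the list above; the consequents of the other rules are placeholders invented for illustration, since the actual SQL and field names are not given in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PPR:
    precondition: Dict[str, str]   # class -> value, e.g. {"AD": "Election"}
    antecedent: str                # single word or sequence of words
    consequent: str
    context: Optional[str] = None  # disambiguating context, e.g. "candidate"


RULES: List[PPR] = [
    PPR({"AD": "Election"}, "who", "candidate"),
    # The consequents below are hypothetical placeholders.
    PPR({"AD": "Election"}, "won", "[SQL]:<election-winner query>", context="candidate"),
    PPR({"AD": "Athletics"}, "won", "[SQL]:<race-winner query>", context="runner"),
    PPR({"AD": "Election", "DB": "MS Access"}, "votes", "[Field]:<votes field>"),
]


def applicable(rule: PPR, environment: Dict[str, str]) -> bool:
    """A rule applies only if every class:value pair of its precondition holds."""
    return all(environment.get(cls) == value for cls, value in rule.precondition.items())


def apply_rules(word: str, environment: Dict[str, str],
                context: Optional[str] = None,
                rules: List[PPR] = RULES) -> Optional[str]:
    """Return the consequent of the first applicable rule matching word and context."""
    for rule in rules:
        if (rule.antecedent == word.lower()
                and rule.context == context
                and applicable(rule, environment)):
            return rule.consequent
    return None
```

The precondition keeps rules for different ADs and DBs apart, so the same word won can have one meaning in AD Election and another in AD Athletics without conflict.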

Figure 1. Natural Language Enquiry Template

For AD Election the subset (1, 2, 5..8) of PPR is used. PPR 3 and 4 in fact show another meaning of the same word won, but for a different AD. The last two PPR show the simplest way to cover the difference in SQL for different DBs. The result of parsing the considered NLET using the selected PPR is shown in Figure 2.

Thus, NLET allows the user to be lazy but requires great effort to create the proper set of PPR, as part of the KB, to describe the more meaningful words better. But even using NLE and NLET we cannot say that all meaningful words have been described, even for a quite restricted AD. As a result, some users will be disappointed by the IDAS reply.

ED is a step in the direction towards simplifying KB and increasing the reliability of IDAS.

Before moving to ED it would be sensible once more to address some NLE and NLET problems. The cognitive process of understanding is itself not understood. First, we must ask: What does it mean to understand an NLE? The usual answer to that question is to model its meaning. But this answer just generates another question: What does meaning mean? The meaning of an NLE depends not only on the things it describes, explicitly and implicitly, but also on both aspects of its causality: What caused it to be said and What result is intended by saying it. In other words, the meaning of an NLE depends not only on the sentence itself, but also on Who is asking the question and How the question is phrased.

318 Intelligent Systems

Figure 2. Natural Language Enquiry Parsing

From the linguistic point of view, the process of understanding is possible under, as a minimum, the following three conditions [6]:

IDAS must comprehend and understand separate words but lexical ambiguity sometimes makes such understanding difficult. A classic example of lexical ambiguity is the sentence: Time flies like an arrow. Each of the first three words could be the main verb of the sentence, and time could be a noun or an adjective, flies could be a noun, and like could be a preposition. Thus, the sentence could have various interpretations other than the accepted proverbial one. It could, for example, be interpreted as a command to an experimenter to perform temporal measurements on flies in the same way they are done on arrows. Or it could be a declaration that a certain species of fly has affection for a certain arrow.

IDAS must understand the structure of the whole sentence but sometimes that is not a simple matter. If we have an ambiguous phrase such as: John saw the woman in the park with a telescope, then we usually understand one meaning and ignore the alternative interpretations.

An empirical study revealed that only 0.53% of the possible sentences considered to be grammatical are actually produced [7, p.823]. Note that the capacity to cope with ungrammatical NLE is one of the important requirements of NLE processing.

For an artificial system like IDAS, the power of natural language to describe the same events in different ways is a great problem. For example, the primitive event Delete a cursor from the screen might be described as:

eliminate a cursor, get rid of a cursor, remove a cursor from the screen, erase a cursor, make a cursor hidden, set the cursor size to 0, take away a cursor from the screen, etc.

Therefore the ED might release IDAS from such problems.

Enquiry Descriptors are especially useful when AD is not simple (e.g. AD Mobile Messages in Figure 3). Another important reason for using ED is that modern technology has completely changed the way that people use the telephone to conduct a dialogue with information held on computers. Well-developed written speech analysis does not work with verbal speech [3]. For example, the first step of the Speech Recogniser in parsing the NLE I'm looking for address of insurance company in Bolton will be filler deletion, i.e. removal of I'm looking for. Finally, the initial NLE will be represented as a set of descriptors, which represent the NL description of meaningful fields of AD.
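The filler-deletion step mentioned above can be illustrated with a minimal sketch. The list of filler phrases and the function name are assumptions made for this example; a real Speech Recogniser would use a much richer inventory.

```python
import re

# Hypothetical filler phrases; only "I'm looking for" comes from the text.
FILLERS = ["i'm looking for", "i am looking for", "could you tell me", "please"]


def strip_fillers(enquiry: str) -> str:
    """Repeatedly remove known filler phrases from the start of an enquiry."""
    text = enquiry.strip()
    changed = True
    while changed:
        changed = False
        for filler in FILLERS:
            # Match the filler at the start, then any following spaces or commas.
            pattern = r"^\s*" + re.escape(filler) + r"\b[\s,]*"
            new_text = re.sub(pattern, "", text, flags=re.IGNORECASE)
            if new_text != text:
                text, changed = new_text, True
    return text
```

Only leading fillers are removed in this sketch; the remaining words are what would then be mapped to descriptors.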

Figure 3. AD Mobile Messages and example of Enquiry Descriptors

The definition of meaningful fields depends on the AD objectives. For the considered AD Mobile Messages it is a list of descriptors: {company, account, network, etc.}. There is a one-to-one correspondence between descriptors and meaningful fields. The procedure for creating an ED is very simple (see Figure 3):

Select the desired descriptors. As a result of the selection, the corresponding . (Descriptor) will be displayed;

Select field, value for which needs to be assigned, enter value in square brackets and press . For descriptor Date value [February] was defined;

If some mathematical function needs to be involved, press the corresponding button. To summarize all delivered messages, button has been clicked for the selected descriptor Delivered;

Click button SQL to convert ED to SQL-query.
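The procedure above (descriptors, optional values in square brackets, an optional mathematical function, then conversion to SQL) can be sketched roughly as follows. The mapping from descriptors to fields is hypothetical: the table STATS and its field names are invented for illustration and do not come from AD Mobile Messages. The sketch also assumes that all fields belong to one table, so no table joining is attempted.

```python
# Hypothetical one-to-one mapping from descriptors to table.field names.
FIELDS = {
    "Network": "STATS.NETWORK",
    "Date": "STATS.MONTH_NAME",
    "Delivered": "STATS.DELIVERED",
}
FUNCTIONS = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


def ed_to_sql(descriptors):
    """Convert (descriptor, value, function) triples into (sql, params).

    value is the content of the square brackets (or None);
    function is an aggregate name such as "SUM" (or None).
    """
    select, where, group_by, params, tables = [], [], [], [], []
    for name, value, function in descriptors:
        field = FIELDS[name]
        table = field.split(".")[0]
        if table not in tables:
            tables.append(table)
        if function is not None:
            if function not in FUNCTIONS:
                raise ValueError(f"unsupported function: {function}")
            select.append(f"{function}({field})")
        else:
            select.append(field)
            group_by.append(field)
        if value is not None:
            where.append(f"{field} = ?")  # value is passed as a query parameter
            params.append(value)
    sql = f"SELECT {', '.join(select)} FROM {', '.join(tables)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    # Plain fields must be grouped whenever an aggregate is also selected.
    if group_by and len(group_by) < len(select):
        sql += " GROUP BY " + ", ".join(group_by)
    return sql, params
```

For example, the triples ("Network", None, None), ("Date", "February", None) and ("Delivered", None, "SUM") produce a query that sums delivered messages per network for February.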

If the objectives of using AD change, then the set of descriptors needs to be extended, which requires the effort of the KB administrator. But this is the simplest way of extracting data from AD using IE.

Immediate Enquiry is useful for users who are familiar with the AD structure and know the meaning of tables and their fields. To create an IE, firstly select a table and secondly select the desired field (see Figure 3). The pair table.field will be displayed. Now the user can add a descriptor and follow the same procedure as for ED.

Natural Users Enquiry to SQL Query Conversion

The steps of NLE to SQL-query conversion are well defined [1]: (NLE NLET) → ED → IE → SQL-query. The final step is quite complicated because of the need to access data from many different tables within an AD and to join those tables together in a report. This is extremely important because non-technical users do not know how to join tables to get a more comprehensive view of their data. Quite often a very simple question in English can turn into a very complicated SQL-query, e.g. conversion of the NLE Display all messages amount for all networks in the last month gives the SQL-query shown in Figure 4.

Figure 4. Result of NLE to SQL-query conversion

Even the simplest ED, like White thick sliced bread, cannot be directly converted to an SQL-query because the AD's data might contain any combination of wrong and correct words and, therefore, four PPR (white wht, thick thk, sliced slcd sld, and bread brd) are required [3]. Theoretically, for the considered example there are 16 possible combinations of data, namely: (1) White thick sliced bread, (2) White thick sliced brd, ..., (16) wht thk (slcd sld) brd. The result of such a conversion is shown in Figure 5.
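The count of 16 combinations, and one possible way of covering all of them in a single query condition, can be sketched as follows. This is an illustration under stated assumptions rather than the conversion IDAS performs: the VARIANTS table restates the four PPR from the text, the function names are invented, and the use of LIKE patterns joined by OR is one assumed way to match any spelling.

```python
from itertools import product

# Full word -> abbreviated spellings, restating the four PPR from the text.
VARIANTS = {
    "white": ["wht"],
    "thick": ["thk"],
    "sliced": ["slcd", "sld"],
    "bread": ["brd"],
}


def spelling_patterns(ed: str) -> list:
    """Enumerate full/abbreviated combinations of the words in an ED.

    Following the text, the abbreviations of one word count as a single
    alternative, written in parentheses when there is more than one.
    """
    choices = []
    for word in ed.lower().split():
        abbreviations = VARIANTS.get(word, [])
        options = [word]
        if len(abbreviations) == 1:
            options.append(abbreviations[0])
        elif abbreviations:
            options.append("(" + " ".join(abbreviations) + ")")
        choices.append(options)
    return [" ".join(combo) for combo in product(*choices)]


def where_clause(ed: str, field: str):
    """Build a parameterized condition accepting any spelling of every word."""
    groups, params = [], []
    for word in ed.lower().split():
        forms = [word] + VARIANTS.get(word, [])
        groups.append("(" + " OR ".join(f"{field} LIKE ?" for _ in forms) + ")")
        params.extend(f"%{form}%" for form in forms)
    return " AND ".join(groups), params
```

With two alternatives for each of the four words, spelling_patterns returns 2 x 2 x 2 x 2 = 16 patterns, while where_clause needs only one OR-group per word instead of 16 separate conditions.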

Figure 5. ED to SQL-query conversion using PPR

The idea of joining tables in SQL is that individual rows in one table are attached to some corresponding rows in another table. The criteria for joining rows are decided by a highly skilled SQL user. IDAS provides automatic Tables Coupling (TC). The main problem of TC is to select the right table links from a huge number of possible links. The result of converting the ED from Figure 3 to an SQL-query is shown in Figure 6. The TC decision tree shown there makes it easy to see the number of possible TC Solutions (TCS). It is important to underline that the output produced by an SQL-query with different TCS might be different. In such situations a critical question arises: What is the criterion for selecting the TCS which provides the right output? IDAS activities are based on the hypothesis that the right output might be produced by the SQL-query with the best TCS, where the definition of the best TCS is obvious: let us call a TCS the best if for each pair of tables the shortest link was used. This definition follows the principle of simplicity described earlier. Red lines in Figure 6 indicate the TCS. The TC decision tree was created using the breadth-first method. A unipath heuristic rule was used for selection of the best TCS. Two different types of fields are used as foreign keys to provide the TCS:

Primary keys, e.g. ACC.ACC_ID = RB_DAILY_STATS.ACC_ID, i.e. primary key ACC_ID from table ACC has been placed into table RB_DAILY_STATS as a foreign key.

Value fields. Sometimes, for different reasons, the DB has data redundancy, i.e. in different tables there are fields with the same data (data duplication). The names of such fields are not necessarily the same. In that case, at the stage of KB creation, such field names should be described as synonyms. In the considered example, fields SVC_INBOUND_NUMBER from table SVC_NUM_V and SHORT_CODE from table SHORT_CODE are synonyms. Figure 6 has a double red line which shows the links between them.
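The shortest-link idea behind the best TCS can be illustrated with a breadth-first search over a graph whose nodes are tables and whose edges are join conditions. This is a simplified sketch, not the IDAS algorithm: it finds one shortest chain of links between two tables and does not implement the unipath heuristic. Two of the links below come from the text (the primary-key link and the synonym link); the other two, including the fields SVC_ID and SVC_NUM_V.ACC_ID, are hypothetical and exist only to give the search something to choose between.

```python
from collections import deque

LINKS = [
    # (table, table, join condition)
    ("ACC", "RB_DAILY_STATS", "ACC.ACC_ID = RB_DAILY_STATS.ACC_ID"),  # primary key link
    ("RB_DAILY_STATS", "SVC_NUM_V", "RB_DAILY_STATS.SVC_ID = SVC_NUM_V.SVC_ID"),  # hypothetical
    ("SVC_NUM_V", "SHORT_CODE",
     "SVC_NUM_V.SVC_INBOUND_NUMBER = SHORT_CODE.SHORT_CODE"),  # synonym (value field) link
    ("ACC", "SVC_NUM_V", "ACC.ACC_ID = SVC_NUM_V.ACC_ID"),  # hypothetical
]


def shortest_link(start: str, goal: str, links=LINKS):
    """Return the join conditions along a shortest chain of links, or None."""
    graph = {}
    for a, b, condition in links:
        graph.setdefault(a, []).append((b, condition))
        graph.setdefault(b, []).append((a, condition))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for neighbour, condition in graph.get(table, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [condition]))
    return None
```

Because breadth-first search visits tables in order of distance, the first chain that reaches the goal table uses the fewest links, which matches the definition of the best TCS for a single pair of tables.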

Figure 6. TC Decision Tree and TC Solution

Conclusion

IDAS effectively allows us to place information directly into the hands of business users, eliminating the need for technical support specialists to continually address ad hoc requests from end users. To do this properly, all four types of enquiries should be provided. IDAS shields the user from the complexity of the underlying technology and itself acts as an intelligent user assistant.

Bibliography

[1] V.A. Lovitskii and K. Wittamore, "DANIL: Databases Access using a Natural Interface Language", Proc. of the International Joint Conference on Knowledge-Dialogue-Solution: KDS-97, Yalta (Ukraine), 282-288, 1997.
