Campus Montegancedo. 28660 Boadilla del Monte (Madrid). Spain; e-mail: ctorres@fi.upm.es Elena Castineira – Dept. Applied Mathematic. Computer Science School of University Politcnica of Madrid.

Campus Montegancedo. 28660 Boadilla del Monte (Madrid). Spain; e-mail: ecastineira@fi.upm.es Susana Cubillo – Dept. Applied Mathematic. Computer Science School of University Politcnica of Madrid.

Campus Montegancedo. 28660 Boadilla del Monte (Madrid). Spain; e-mail: scubillo@fi.upm.es RECURSIVE MATRICES FOR AN INFORMATION RETRIEVAL PROBLEM Adriana Toni, Juan Castellanos, Jose Joaquin Erviti Abstract Let v..v be variables storing values from an arbitrary commutative semigroup S. We are interested in 1 n the study and design of data structures for implementing the following operations. The operation update(i,x) increments the value of v by x, and the operation retrieve(i,j) returns v +…+v Our interest centers upon i i j.

improving the average complexity of the operations. We define matrices representing a solution for the problem inside a matricial model of computation. We achieve a constant average complexity for the set of update and retrieve operations.

Keywords: data structures, models of computation, analysis of algorithms and problem complexity Introduction Let v..v be variables storing values from an arbitrary commutative semigroup S. We desire to execute the 1 n following operations on these variables:

a) retrieve(i,j) returns vi +…+vj 1 i j n, b) update(i,x): vi := vi + x 1 i n, x S This problem is known as the range query problem of size n.

We can organize the variables as an array V of length n, and implement the operations as above. In this case, the complexity of executing an update operation is constant meanwhile the worst case complexity of a retrieve is linear on n.

Our interest centers upon improving the average complexity of the operations assuming that each one of them is selected with the same probability.

We can use different data structures involving a different number of variables storing values in the semigroup, and provide the corresponding algorithms to implement the update and retrieve operations, and still be solving the same computational problem.

A matricial model for the study of the range query problem has been defined, relative to which computational complexity is assesed (see [6]) Information Models The model comprises all programs verifying:

a) A set of variables Z={z z..z } is maintained.

1, 2 m b) Retrieve(i,j) is performed by adding up a subset of these variables.

c) Update(j,x) is performed by incrementing a subset of these variables by amounts which depend linearly on x.

The model defined in [6] consists of triples where Z = {z1…zm} is a set of variables storing values on an arbitrary semigroup S, R=(ri,j) is a zero-one matrix of n(n +1) dimension m and U=(ui,j) is a zero-one matrix of dimension m n. Each row of R describes the subset of variables of Z which have to be added to execute one of the retrieve operations, and the i-th column of U describes the subset of such variables which have to incremented to execute an update(i,x). So, a pair of R and U matrices describes a solution for the range query problem of size n (m, the number of required program variables, may change although if has to be greater or equal n). Associated with a triple , the programs implementing the operations are defined as follows.

Definition Given a triplet within the matrix model for the range query problem of size n, with Z = {z …z }, then the 1 m update and retrieve operations must be implemented through the following programs:

• update(j,x): for l:=1 to m do [z z + u x] 1 l l,j m i-• Retrieve(i,j): output rk,l zl, where k = (n - s) + ( j - i +1) l=1 s=The following proposition establishes a condition on R, U that entails reworking the programs defined above.

Proposition n(n +1) Let H be the matrix of dimensions n defined by:

1 l j l + (i - wl-1 -1) H = ij otherwise where k -w = - l), k=0…n k (n l= i {(w +1)…w}, l=1…n l-1 l Then the programs given in Definition 1 represent a solution for the range query problem of size n if and only if R U = H.

Remark Note that if Tn is the triangular matrix of dimensions n n consisting of 0s above the main diagonal and 1s along and below the main diagonal, 1 i j T=(t ) with t = ij i,j=1..n ij 0 i j then the following statement holds for all k = 0…(n-1), n-k H[w +1..wk ],[k +1..n] = T k +H[w +1..wk ],[1..k ] = k +Fourth International Conference I.TECH 2006 i-where wi = (n - s) for i = 0…n, and H is the box of matrix H composed of the rows in the interval [i…j],[r…s] s=[i…j] and the columns in the interval [r…s].

Hence, for any problem size n, the corresponding matrix Hn can be expressed as a function of the matrix T of different dimensions as:

n T n- T H = n-(n-1) T where 0 … … n-i n-i T = T i=1..(n-1) 0 … … Example The matrix H for the range query problem of sizes n=2 and n=4, denoted H2 and H4, respectively, is shown below.

Lines have been added to highlight the logical division into boxes 1 0 0 1 1 0 1 1 1 1 1 1 1 0 1 0 2 H = 1 1, H = 0 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 In the following we define the complexity associated with the operations within the matrix model.

Definition Given a triplet that solves the range query problem of size n within the matrix model, with Z = (z …z ), 1 m R = (ri, j ),U = (ui, j )i=1...m, j=1...n, let the complexity associated with the Retrieve (i, j) operation n(n+1) i=1..., j=1...m be defined as:

i-| {rkl / rkl 0 (1 l m)}| where k = ( j - i +1) + - s) (n s=and the complexity associated with the Update (j, x) operation be defined as:

|{ulj / ulj 0 (1 l m)}| Information Models Let m be the number of columns of R and of rows of U. The average complexity of Update operations is given by m n uij i=1 j=p = n and the average complexity of the Retrieve operations by n(n+1) / m rij i=1 j=t = n(n +1) It is known for any data structure solving the range query problem of size n that + t = (logn) Range Query Problem-Solving Matrices In the following we will define pairs of matrices of 0s and 1s whose product is the matrix H —that is, matrices that represent solutions to the range query problem – in an attempt to minimize the total number of 1s present in both matrices and, therefore, the average complexity of the operations. A recursive definition will be given later (see Definition 13). In particular, let the matrices R = (r, j), U = (u, j) be defined, whose product for any dimension n of i i the form 2k with kN+ is Hn.

Remember that the average complexity is calculated by dividing the total number of 1s by the number of different operations. Hence, if we are dealing with the problem of size n and let n(n+1) / 2 m m n (n) = + rij uij i=1 j=1 i=1 j=where m is the number of variables z …z used to implement the solution (m may vary, although it necessarily 1 m (n) has to be greater than or equal to n), then the average complexity is given by (n is the number n(n +1) n + n(n +1) of different Update(j, x) operations as a function of the first argument and is the number of different possible arguments for a Retrieve(i, j) operation).

We will prove that our matrices hold for 3 3 (n) = n2 - nlog2 n + n - 2log2 n - 2 2 and this implies an average complexity that has a constant order of complexity.

Remember that a superindex is used to specify the size of the problem corresponding to the matrix H. Hence Hn n(n +1) denotes the matrix for the range query problem of size n, although the size of Hn is n.

Fourth International Conference I.TECH 2006 Concepts and Notation Let Iim denote the matrix resulting from permuting the ith and jth rows in the identity matrix of dimensions m m,, j denoted Im. For any matrix M of dimensions m n, Iim M returns the matrix M in which rows i, j have, j switched position.

m m Generally, if I = Iim Iim... Iim, the effect of the multiplication I M is to switch the position of, j1 j2, jk 1 2 k and j, then with rows i -2 k - and j … until finally rows i, rows i and j of M, then do the same thing with rows i -1 k -1 k k k k j have been switched.

For any n = 2k, any row rearrangement of matrix Hn associated with the range query problem of size n can be n +achieved by multiplying Hn by a certain identity transform I, where Permutations and + Permutations(k) = { f :{1...k} {1...k}/ f one to one-onto, k N }. This is due to a known algebraic result, stating that any permutation that is a member of Permutations(k) can be expressed as a composition of a certain number of permutations of that set whereby all the elements of {1…k} save two are held fixed.

The matrix Hn should ultimately be rearranged as the matrix Sn, which is defined below.

Definition 6 For all n of the form 2l with lN+, let n H n n M12 T n n 2 M T Sn = n n 2 M T n n 0 H m where for any mN+, k=1…m, the matrices M are square matrices of dimensions m defined as k 1 k j m (M )i, j = 0 otherwise k Hereinafter let IH S denote the matrix I that leads to the transformation of Hn into Sn.

n n Hence, IH S x Hn = Sn.

n n Let us look at a couple of simple examples from which the specific expression of the matrix IH S can be n n easily deduced.

1 2 Example 7 If n=2, then H = 2 1 1 = S2 and hence I H S = I 0 Information Models Let us also look at the example of the transformation of H4 into S4, illustrating the operations that need to be performed and the expression corresponding to the matrix IH S.

4 Here 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 1 1 1 0 0 1 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 0 1 1 1 4 3,5 4, H = I I = S 0 1 1 0 0 1 1 0 0 1 1 0 1 1 1 0 1 1 1 0 1 1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 1 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 Hence, IH S = I = I4,5 I3,4 Where :{1..10} {1..10}, (3) = 4, (4) = 5, (5) = Remark How does this type of transformations affect the matrix approach to the range query problem We know that it involves studying pairs of integer matrices Rn, Un such that Rn Un = Hn. But if Rn Un = Hn, then IH S x Rn n n Un = Sn. Hence the problem can be reformulated equivalently as entailing the study of matrix pairs whose product is the matrix Sn. In this case the algorithm that implements the Retrieve operations given in Definition 1 has to be modified, and the definition of the programs associated with a triplet is now as follows:

Definition Given a triplet , with Z = (z …z ), R, U matrices of dimensions n m and m n, respectively, let us 1 m define the following algorithms to implement the Update and Retrieve operations:

1. Update(j,x): for l: 1 to m do zl zl + ul, j x m 2. Retrieve(I,j): output rk,l zl where k is given by:

l=i-2 n n a) k = - s + ( j - i +1), 1 i j s=2 i-2 n b) k = (n - s)+ ( j - i +1), < i j n s=n n + n n 2 2 n n c) k = (i -1) + j - +, i, j 2 2 2 2 The intuitive idea is that the row number (k) of the matrix Rn associated with a Retrieve(i,j) operation is different now, as some rows have switched position.

The change of approach has no bearing on the complexity study of the operations, as, remember, the effect of multiplying any matrix by a certain I does not alter the number of non-null matrix elements, but only switches the position of certain rows.

Fourth International Conference I.TECH 2006 Recursive Definition of Our Problem-Solving Matrices In this section we will give a recursive definition of our matrix pairs Rn, Un as a function of the problem size n.

The matrices hold for Rn Un = Hn.

As mentioned already, the definition is valid for values of the form n = 2k.

Let us refer to blocks of consecutive rows of the matrix Rn, which we consider to be divided into n horizontal blocks, the first formed by the first n rows, the second by the next (n-1) rows, the third by the (n-2) rows… up to the (n-1)th block, which is composed of two rows and the nth block which consists of just the last row. Let Rin denote the ith block of Rn and Rin ( j), the jth row of this block such that the matrix Rn is given by R1n n R Rn = n Rn n(n +1) (note that if the dimensions of Rn are m, then the dimensions of each block Rin are (n-i+1) m).

Rn, Un pairs are constructed by applying a function called Refinement. This function can be viewed as a two-stage process: the first stage involves executing a sequence of extension steps, and the second rearranging the rows of Rn by multiplying by a given identity transform matrix I.

The following Definition 11 and Lemma 11 are needed to define the extension step concept.

Definition Given a matrix M of 0s and 1s, two columns i, j are said to be disjoint if the set of rows {k/m, =1=m } is empty.

k i k,j Similarly, two rows i, j of M are said to be disjoint if the set of columns {k/m, =1=m } is empty.

i k j,k Lemma n(n +1) Let A, B be two matrices of 0s and 1s such that A B = Sn, of dimensions m and m n, respectively. Assume that there are two columns i, j of A that are not disjoint and let {l …l } be the set of rows of A 1 q for which, k=1…q holds. Then the rows i, j of B are disjoint.

al,i = 1 = al, j k k Definition n(n +1) Let A, B be two matrices of 0s and 1s such that A B = Sn, of dimensions m and m n, respectively.

Assume that there is a set of columns C = {c …c}, l 2, for which there is a non-empty maximal set —including 1 l all the rows that meet the following condition— of rows F = {f,…, f } such that a = 1 cj C, 1 q fi,c j fi F We define the extension step associated with the sets C, F as the execution of the following actions on A and B:

Information Models n(n +1) 1. Insert a new column z in A such that ai, z0 = 1 i F i = 1..

2. Add a new row z in B such that bz, j = {bk, j / k C} j = 1..m 3. Columns c …c of A are modified as follows af,ck := 0 ck C, fi F 1 l i It can be easily demonstrated that if an extension step is applied to a matrix pair whose product is the matrix Sn, the product of the resulting matrices is the very same matrix Sn.

We are now able to give a recursive definition of our matrix pairs.

Материалы этого сайта размещены для ознакомления, все права принадлежат их авторам.
Если Вы не согласны с тем, что Ваш материал размещён на этом сайте, пожалуйста, напишите нам, мы в течении 1-2 рабочих дней удалим его.