Consultor Eletrônico



Kbase 14110: Client/Server Performance of Multiple Table (Joined) Queries
Autor   Progress Software Corporation - Progress
Acesso   Público
Publicação   5/10/1998
Client/Server Performance of Multiple Table (Joined) Queries

SUBJECT: Multi-table (joined) Query Performance
DATE : June 1995
AUTHOR : Tom Nordhougen


The manner in which any query (OPEN QUERY, FOR EACH, SELECT)
involving multiple "joined" tables is resolved in a
client/server environment is as follows (NOTE: this
functionality may be changed in a future release, but as of
version 8 this behavior still holds true):

Take for example the following query:

OPEN QUERY FOR EACH customer WHERE customer.state = "CA",
EACH order WHERE order.custnum = customer.custnum
AND order.orderdate = 01/01/90,
EACH orderline WHERE orderline.ordernum = order.ordernum
AND orderline.itemname = "PLUTONIUM"
BY customer.custnum.

When this query is executed, it is broken up into three
separate queries (internally - this is transparent to the
user and the programmer). The join is NOT done on the
server; each query is sent to the server separately,
starting with the first table in the statement.

Let's assume that in the above example, there are 5,000
customer records whose state field is equal to "CA", and
let's further assume that there is an index on this field.
Let's also assume that there is a one-to-one correspondence
between "customer" and "order" in this case, so that there
would be one order record for each customer, and again that
there is an index defined for the order table with both the
"custnum" and "orderdate" field. For the orderline table,
let's assume that there is an index defined for only the
"ordernum" field, and that there are 1,000 records having an
"ordernum" field that matches a record in the order table,
and that only 2 of those orderline records contain the
value "PLUTONIUM" in the "itemname" field.

So, the result of this query is that only six records are
returned. If the query join was performed on the server, we
would only see those six records sent over the network, and
the sorting in this case would be performed on the client
after the records were returned. But this is not the case.
Here is what would actually happen, along with a strategy
for improving the response time:

1. The client would break the query up into three queries,
and send the first query to the server to evaluate.

2. The server would return the first record from the query.
The client would then send the second query request
(using field(s) from the first returned record) to the
server. The server would return the first record from the
second query.

3. The client would then send the third query request to
the server, and the server would either return the third
record, or it would send a message indicating that no
record could be found to meet the selection criteria, in
which case the client would throw away both records from the
first two tables and repeat step 1.

As you can see, the way the query is structured with
respect to the amount of records meeting each section of the
query, we will be sending 5,000 records over the network
from the first query, and another 5,000 records from the
second query, and finally only two records from the third,
having to throw away 9,996 of those records! If the query
were restructured (in this case, inverted) we could cause
fewer records to be sent over the network, and we could
greatly improve the response time (assuming that there are
only a few orderline records having an itemname of
"PLUTONIUM"):

OPEN QUERY FOR EACH orderline.itemname WHERE
orderline.itemname = "PLUTONIUM",
EACH order WHERE order.ordernum = orderline.ordernum
AND order.orderdate = 01/01/90,
EACH customer WHERE customer.custnum = order.custnum
AND customer.state = "CA"
BY customer.custnum.

We must also assume that, in this case, we have the
appropriate indexes set up to ensure that we limit the
amount of records we have to search through, but the point
is that it is not always simply utilizing indexes that
makes a joined query fast and efficient, it is also the
number of records returned for each table. It is best to
put as the first table in the query the one that will
return the fewest records; this will dictate how many
searches we will have to do on the other tables.


Progress Software Technical Support Note # 14110