Note: these pages are no longer maintainedNever the less, much of the information is still relevant.Beware, however, that some of the command syntax is from older versions, and thus may no longer work as expected. Also: external links, from external sources, inside these pages may no longer function. |
Recipe #2: |
|
2011 January 28 |
Previous Slide | Table of Contents | Next Slide |
You already know the basic foundations about simple SQL queries.
Any previous example encountered since now simply queried a single table: anyway SQL has no imposed limits, so you can query an arbitrary number of tables at the same time. But in order to do this you must understand how to correctly handle a JOIN. |
SELECT * FROM counties, regions; |
county_id | county_name | car_plate_code | region_id | region_id | region_name |
1 | TORINO | TO | 1 | 1 | PIEMONTE |
1 | TORINO | TO | 1 | 2 | VALLE D'AOSTA |
1 | TORINO | TO | 1 | 3 | LOMBARDIA |
1 | TORINO | TO | 1 | 4 | TRENTINO-ALTO ADIGE |
1 | TORINO | TO | 1 | 5 | VENETO |
... | ... | ... | ... | ... | ... |
Every time SQL queries two different tables at the same time, the Cartesian Product of both
datasets is calculated.
i.e. each row coming from the first dataset is JOINed with any possible row coming from the second dataset. This one is a blind combinatorial process, so it very difficultly can produce useful results. And this process can easily generate a really huge result-set: this must absolutely be avoided, because:
|
SELECT * FROM counties, regions WHERE counties.region_id = regions.region_id; |
SELECT c.county_id AS county_id, c.county_name AS county_name, c.car_plate_code AS car_plate_code, r.region_id AS region_id, r.region_name AS region_name FROM counties AS c, regions AS r WHERE c.region_id = r.region_id; |
county_id | county_name | car_plate_code | region_id | region_name |
1 | TORINO | TO | 1 | PIEMONTE |
2 | VERCELLI | VC | 1 | PIEMONTE |
3 | NOVARA | NO | 1 | PIEMONTE |
4 | CUNEO | CN | 1 | PIEMONTE |
5 | ASTI | AT | 1 | PIEMONTE |
6 | ALESSANDRIA | AL | 1 | PIEMONTE |
... | ... | ... | ... | ... |
SELECT lc.lc_id AS lc_id, lc.lc_name AS lc_name, lc.population AS population, c.county_id AS county_id, c.county_name AS county_name, c.car_plate_code AS car_plate_code, r.region_id AS region_id, r.region_name AS region_name FROM local_councils AS lc, counties AS c, regions AS r WHERE lc.county_id = c.county_id AND c.region_id = r.region_id; |
lc_id | lc_name | population | county_id | county_name | car_plate_code | region_id | region_name |
1001 | AGLIE' | 2574 | 1 | TORINO | TO | 1 | PIEMONTE |
1002 | AIRASCA | 3554 | 1 | TORINO | TO | 1 | PIEMONTE |
1003 | ALA DI STURA | 479 | 1 | TORINO | TO | 1 | PIEMONTE |
... | ... | ... | ... | ... | ... | ... | ... |
Performance considerations Executing complex queries involving many different tables may easily run in a very slow and sluggish mode.This will most easily noticed when such tables contain a huge number of rows. Explaining all this isn't at all difficult: in order to calculate the Cartesian Product the SQL engine has to access many and many times each table involved in the query. The basic behavior is the one to perform a full table scan each time: and obviously scanning a long table many and many times requires a long time. So the main key-point in order optimize your queries is the one to avoid using full table scans as much as possible. All this is fully supported, and it's easy to be implemented. Each time the SQL-planner (an internal component of the SQL-engine) detects that an appropriate INDEX is available, there is no need at all to perform full table scans, because each single row can now be immediately accessed using this Index. And this one will obviously be a much faster process. Any column (or group of columns) frequently used in JOIN clauses is a good candidate for a corresponding INDEX. Anyway, creating an Index implies several negative consequences:
i.e a compromise between contrasting requirements, under various conditions and in different users-cases. In other words there is no absolute rule: you must find your optimal case-by-case solution performing several practical tests, until you get the optimal solution fulfilling your requirements. |
Previous Slide | Table of Contents | Next Slide |
Author: Alessandro Furieri a.furieri@lqt.it | |
This work is licensed under the Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license. | |
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. |