Understanding Joins
JOINS
Joins in data model should be
used wisely otherwise there may be possibility of things going disastrously wrong.
However, it is advisable to keep minimum branches while designing schema.
QlikView uses for types of Joins,
theoretically it is same as SQL Joins, but there is slight difference in syntax.
SQL uses INNER, LEFT, RIGHT and FULL whereas QlikView uses INNER, OUTER, LEFT,
JOIN. It’s better to use proper Join wherever necessary.
The easiest way to flatten the
data model is to join two tables together into one. For a table, where there is
a key field and a description, it is very easy to add the description into the
main table using a LEFT join (refer to the following section for an example.)
Data is joined using all the matching fields in the Load statement, so care
must be taken to ensure that all fields needed are included in the Load
statement and that any field that is not needed as a link is aliased. A further
consideration is that if there are any duplicate records in the joining table,
this will create duplicates in the receiving table too. One way to monitor this
is to add the following two lines of code to the script, both before and after
the join statement. This way, the log file will show if the main table has an
increased number of rows when it shouldn’t:
let varNoRows=NoOfRows('Product');
TRACE $(varNoRows);
TRACE $(varNoRows);
Here, 'Product' is replaced with the name of the table being
joined. Care must also be taken to avoid a cartesian join as the resulting
table could stop the script from running or cause it to take many hours to
complete. If this happens, try refreshing using the debug option with the rows
limited to 10 or 100. This way, you will be able to see if there is a large
increase in rows, especially if you added the preceding two lines of script.
Cartesian Join
A cartesian join is a join of every row of one table to
every row of another table. This normally happens when no matching join fields
are specified. For example, if table A with 250 rows is joined with table B
with 2000 rows, a cartesian join will return 500,000 rows.
Here is an example:
Product:
LOAD [Product Code],
[Supplier Product Code],
[Supplier No],
[Manufacturer Product Code],
[Manufacturer No],
[Short Desciption],
Kit,
Price
FROM
[QlikView Unlocked Data Framework.xlsx]
(ooxml, embedded labels, table is Product);
LOAD [Product Code],
[Supplier Product Code],
[Supplier No],
[Manufacturer Product Code],
[Manufacturer No],
[Short Desciption],
Kit,
Price
FROM
[QlikView Unlocked Data Framework.xlsx]
(ooxml, embedded labels, table is Product);
LET varNoRows=NoOfRows('Product');
TRACE $(varNoRows);
Left Join(Product)
LOAD [Manufacturer
No],
[Manufacturer Name]
FROM
[QlikView Unlocked Data Framework.xlsx]
(ooxml, embedded labels, table is Manufacturer);
[Manufacturer Name]
FROM
[QlikView Unlocked Data Framework.xlsx]
(ooxml, embedded labels, table is Manufacturer);
DROP Field [Manufacturer
No];
LET varNoRows=NoOfRows('Product');
TRACE $(varNoRows);
In this example script, there is
a Product table which has a Manufacturer No field but no Manufacturer Name as
this is held in a separate table. To make the data model more efficient, we can
add Manufacturer Name to our Product table using a Left join. Only records that
are in the Manufacturer table with a matching Manufacturer No to that of the
Product table will be added. If there are any Manufacturer No fields that are
not in the Manufacturer table, Manufacturer Name will contain a null. As in the
example, once the join is complete, we can remove the Manufacturer No field
(unless we want to report on this as well) as most of the time, the linking
field is just a pointer. If you are planning a Left join with one or two fields
with a single joining key, you may want to consider using a mapping table
rather than a join. This can be more efficient and offer you the option of
setting the field value if there is no join field. The preceding example could
be re-scripted to the following:
ManufName:
Mapping
LOAD [Manufacturer No],
[Manufacturer Name]
FROM
[QlikView Unlocked Data Framework.xlsx]
(ooxml, embedded labels, table is Manufacturer);
Product:
LOAD [Product Code],
[Supplier Product Code],
[Supplier No],
[Manufacturer Product Code],
ApplyMap('ManufName',[Manufacturer No],'Missing')
as [Manufacturer Name],
[Short Description],
Kit,
Price
FROM
[QlikView Unlocked Data Framework.xlsx]
(ooxml, embedded labels, table is Product);
Mapping
LOAD [Manufacturer No],
[Manufacturer Name]
FROM
[QlikView Unlocked Data Framework.xlsx]
(ooxml, embedded labels, table is Manufacturer);
Product:
LOAD [Product Code],
[Supplier Product Code],
[Supplier No],
[Manufacturer Product Code],
ApplyMap('ManufName',[Manufacturer No],'Missing')
as [Manufacturer Name],
[Short Description],
Kit,
Price
FROM
[QlikView Unlocked Data Framework.xlsx]
(ooxml, embedded labels, table is Product);
Here, we will first create a
Mapping table to hold what was the Left join in our previous example. Then, we
will load the main data from the Product table. We will use the ApplyMap()
function to get the Manufacturer Name field and supply a piece of text when
there is nothing to map. As we do not need to store the Manufacturer No field,
we do not need the Drop field statement either. Depending on your server and
data sources, this method can be more efficient than a Left join, so it is
worth considering.
The following table gives a
summary of join types:
Join Type
|
Usage
|
Left
|
Left Join (Table_A) * Resident Table_B Fields
from Table_B will be added to Table_A where there is a matching record in
Table_A and Table_B.
|
Right
|
Right Join (Table_A) * Resident Table_B Table_A
will now only contain records that exist in Table_B with the fields added
from Table_B.
|
Inner
|
Inner Join (Table_A) * Resident Table_B Table_A
will only contain records that are present in Table_B
|
Outer
|
Outer Join (Table_A) * Resident Table_B Table_A
will contain all records in Table_A or Table_B.
|
When things go wrong with a join,
they tend to go very wrong!
If you have a cartesian join when
you are not expecting it, your system could grind to a halt. The easiest way to
resolve this, should it happen, is to refresh in Debug mode, with the Limited
Load option ticked and set to 10. Then, look in the Table View screen and let
the mouse hover over a table to ensure that the number of rows is correct.
I like your post very much. It is very much useful for my research. I hope you to share more info about this. Keep posting Qlikview Online Training
ReplyDelete