Understanding Joins

JOINS

Joins in data model should be used wisely otherwise there may be possibility of things going disastrously wrong. However, it is advisable to keep minimum branches while designing schema.

QlikView uses for types of Joins, theoretically it is same as SQL Joins, but there is slight difference in syntax. SQL uses INNER, LEFT, RIGHT and FULL whereas QlikView uses INNER, OUTER, LEFT, JOIN. It’s better to use proper Join wherever necessary.

The easiest way to flatten the data model is to join two tables together into one. For a table, where there is a key field and a description, it is very easy to add the description into the main table using a LEFT join (refer to the following section for an example.) Data is joined using all the matching fields in the Load statement, so care must be taken to ensure that all fields needed are included in the Load statement and that any field that is not needed as a link is aliased. A further consideration is that if there are any duplicate records in the joining table, this will create duplicates in the receiving table too. One way to monitor this is to add the following two lines of code to the script, both before and after the join statement. This way, the log file will show if the main table has an increased number of rows when it shouldn’t:

let varNoRows=NoOfRows('Product');
TRACE $(varNoRows);
 

Here, 'Product' is replaced with the name of the table being joined. Care must also be taken to avoid a cartesian join as the resulting table could stop the script from running or cause it to take many hours to complete. If this happens, try refreshing using the debug option with the rows limited to 10 or 100. This way, you will be able to see if there is a large increase in rows, especially if you added the preceding two lines of script.

Cartesian Join

A cartesian join is a join of every row of one table to every row of another table. This normally happens when no matching join fields are specified. For example, if table A with 250 rows is joined with table B with 2000 rows, a cartesian join will return 500,000 rows. 

Here is an example:

Product:
LOAD [Product Code],
[Supplier Product Code],
[Supplier No],
[Manufacturer Product Code],
[Manufacturer No],
[Short Desciption],
Kit,
Price
FROM
[QlikView Unlocked Data Framework.xlsx]
(
ooxml, embedded labels, table is Product);

LET varNoRows=NoOfRows('Product');
TRACE $(varNoRows);

Left Join(Product)

LOAD [Manufacturer No],
[Manufacturer Name]
FROM
[QlikView Unlocked Data Framework.xlsx]
(
ooxml, embedded labels, table is Manufacturer);

DROP Field [Manufacturer No];

LET varNoRows=NoOfRows('Product');

TRACE $(varNoRows);


In this example script, there is a Product table which has a Manufacturer No field but no Manufacturer Name as this is held in a separate table. To make the data model more efficient, we can add Manufacturer Name to our Product table using a Left join. Only records that are in the Manufacturer table with a matching Manufacturer No to that of the Product table will be added. If there are any Manufacturer No fields that are not in the Manufacturer table, Manufacturer Name will contain a null. As in the example, once the join is complete, we can remove the Manufacturer No field (unless we want to report on this as well) as most of the time, the linking field is just a pointer. If you are planning a Left join with one or two fields with a single joining key, you may want to consider using a mapping table rather than a join. This can be more efficient and offer you the option of setting the field value if there is no join field. The preceding example could be re-scripted to the following:

ManufName:
Mapping
LOAD [Manufacturer No],
[Manufacturer Name]
FROM
[QlikView Unlocked Data Framework.xlsx]
(
ooxml, embedded labels, table is Manufacturer);

Product:
LOAD [Product Code],
[Supplier Product Code],
[Supplier No],
[Manufacturer Product Code],
ApplyMap('ManufName',[Manufacturer No],'Missing')
as [Manufacturer Name],
[Short Description],
Kit,
Price
FROM
[QlikView Unlocked Data Framework.xlsx]
(
ooxml, embedded labels, table is
Product);
Here, we will first create a Mapping table to hold what was the Left join in our previous example. Then, we will load the main data from the Product table. We will use the ApplyMap() function to get the Manufacturer Name field and supply a piece of text when there is nothing to map. As we do not need to store the Manufacturer No field, we do not need the Drop field statement either. Depending on your server and data sources, this method can be more efficient than a Left join, so it is worth considering.


The following table gives a summary of join types:

Join Type
Usage
Left
Left Join (Table_A) * Resident Table_B Fields from Table_B will be added to Table_A where there is a matching record in Table_A and Table_B.
Right
Right Join (Table_A) * Resident Table_B Table_A will now only contain records that exist in Table_B with the fields added from Table_B.
Inner
Inner Join (Table_A) * Resident Table_B Table_A will only contain records that are present in Table_B
Outer
Outer Join (Table_A) * Resident Table_B Table_A will contain all records in Table_A or Table_B.

When things go wrong with a join, they tend to go very wrong!


If you have a cartesian join when you are not expecting it, your system could grind to a halt. The easiest way to resolve this, should it happen, is to refresh in Debug mode, with the Limited Load option ticked and set to 10. Then, look in the Table View screen and let the mouse hover over a table to ensure that the number of rows is correct. 


Comments

  1. I like your post very much. It is very much useful for my research. I hope you to share more info about this. Keep posting Qlikview Online Training

    ReplyDelete

Post a Comment

Popular posts from this blog

Qlik Sense Data Reduction (Dynamic)

What is Synthetic Key?