If you have spent some time working with kdb+, you have no doubt had to use some sort of join in a query. Joins are essential to data analysis. The kdb+ query language is called qsql which, as you probably guessed from the name, is based on the popular query language SQL. I don’t have much experience with SQL so I can’t tell you how joins work there, but I can discuss joins in qsql.
There are different types of joins offered in qsql. Some of them are very similar, so make sure to pay attention to the details. I will try my best to provide good examples to make the differences clear. The available joins are: left join, plus join, inner join, equi-join, union join, asof join and window join.
For all my examples below, I will be using these two tables:
q)t1:([]sym:`IBM`MSFT`AAPL;price:3?100)
q)t2:([]sym:`AAPL`IBM;price:2?100;size:2?500)
q)t1
sym  price
----------
IBM  12
MSFT 10
AAPL 1
q)t2
sym  price size
---------------
AAPL 90    346
IBM  43    73
Left Join (lj)
This is probably the most popular join out there and very simple to understand too. The syntax is:
q)t1 lj 1!t2
Note that t2 must be keyed or else you will get a `mismatch error.
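Here is a minimal sketch of what "keyed" means, using fixed values in place of the random ones above so the result is reproducible. The name kt is only for illustration. 1!t2 makes the first column the key, which is the same as naming the key column with xkey:

```q
/ t2 with fixed values (an assumption; the article generates them randomly)
t2:([]sym:`AAPL`IBM;price:90 43;size:346 73)

kt:1!t2            / key t2 on its first column (sym)
keys kt            / the key columns of kt: sym
kt~`sym xkey t2    / 1b: same keyed table as naming the key column explicitly
keys t2            / empty list: t2 itself is still unkeyed
```

Keying does not modify t2 in place, which is why the joins in this post write 1!t2 inline.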
Left join uses t2 as a lookup dictionary for each row in t1. So, the final table will have all the rows of t1 but with updated values from t2.
q)t1 lj 1!t2
sym  price size
---------------
IBM  43    73
MSFT 10
AAPL 90    346
As you can see, our resultant table has updated values (including a new column, size) from t2 for each sym in t1. If it can’t find a match for a sym, it keeps the t1 values and leaves the new column null, like it did for `MSFT.
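To make the lookup behaviour easy to check, here is a small sketch with the printed values hard-coded instead of generated randomly (that substitution, and the name r, are my assumptions):

```q
/ same tables as above, with fixed values
t1:([]sym:`IBM`MSFT`AAPL;price:12 10 1)
t2:([]sym:`AAPL`IBM;price:90 43;size:346 73)

r:t1 lj 1!t2
r`price            / 43 10 90  (MSFT has no match, so it keeps its t1 price)
r`size             / 73 0N 346 (0N is the long null for the unmatched MSFT)
count[r]=count t1  / 1b: lj never drops rows from t1
```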
Inner Join (ij)
Inner join is very similar to left join, but it only returns rows for syms that are available in the lookup table (t2). So, in our case, we should get back the same table as we did with lj, minus the `MSFT row since it’s not in t2.
Just like lj, t2 needs to be keyed for ij as well.
q)t1 ij 1!t2
sym  price size
---------------
IBM  43    73
AAPL 90    346
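A quick sketch of which rows survive an inner join, again with fixed values standing in for the random ones (an assumption on my part):

```q
t1:([]sym:`IBM`MSFT`AAPL;price:12 10 1)
t2:([]sym:`AAPL`IBM;price:90 43;size:346 73)

t1[`sym] in t2`sym / 101b: MSFT is the only sym missing from t2
r:t1 ij 1!t2
r`sym              / `IBM`AAPL (t1 order is kept, MSFT is dropped)
r`price            / 43 90
r`size             / 73 346
```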
Plus Join (pj)
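A plus join does the same job as an lj, except that instead of overwriting the values with those from the lookup table, it ADDS them. Syms missing from the lookup table are treated as zero, which is why `MSFT ends up with a size of 0.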
Just like lj, t2 needs to be keyed for pj as well.
q)t1 pj 1!t2
sym  price size
---------------
IBM  55    73
MSFT 10    0
AAPL 91    346
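The arithmetic behind that output can be checked with a small sketch that uses the same numbers as fixed values (the fixed values and the name r are assumptions for illustration):

```q
t1:([]sym:`IBM`MSFT`AAPL;price:12 10 1)
t2:([]sym:`AAPL`IBM;price:90 43;size:346 73)

r:t1 pj 1!t2
r`price            / 55 10 91 (12+43, 10+0, 1+90)
r`size             / 73 0 346 (size is new to t1, so it is added to 0)
```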
Equi-Join (ej)
Equi-join is the same as inner join but it lets you specify the columns you want to join on in the syntax, and the second table does not need to be keyed. For example, ej[`sym;t1;t2] gives the same result here as t1 ij 1!t2:
q)ej[`sym;t1;t2]
sym  price size
---------------
IBM  43    73
AAPL 90    346
The first argument of ej is where you can specify the columns you want to apply the join on.
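As a sketch of joining on more than one column, the tables a and b below are hypothetical (they are not from this post) and add an ex column so that sym alone is not enough to identify a row:

```q
/ hypothetical tables with two candidate join columns
a:([]sym:`IBM`IBM`AAPL;ex:`N`O`N;price:12 13 1)
b:([]sym:`IBM`AAPL;ex:`N`N;size:73 346)

r:ej[`sym`ex;a;b]  / match on both sym and ex; b is not keyed
count r            / 2: the (IBM;O) row of a has no partner in b
r`price            / 12 1
r`size             / 73 346
```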
Union Join (uj)
Union join is a very generic join. If both tables, t1 and t2, are unkeyed, it will simply append records from t2 to t1, filling columns that are missing on either side with nulls. If both tables are keyed, it will update the values of t1 from t2 (just like left join) and append any t2 rows whose keys are not in t1. If only one table is keyed, you will get an error.
q)t1 uj t2
sym  price size
---------------
IBM  12
MSFT 10
AAPL 1
AAPL 90    346
IBM  43    73
q)(1!t1)uj 1!t2
sym | price size
----| ----------
IBM | 43    73
MSFT| 10
AAPL| 90    346
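The two behaviours can be compared side by side in a short sketch, with fixed values in place of the random ones (an assumption; the names u and k are only for illustration):

```q
t1:([]sym:`IBM`MSFT`AAPL;price:12 10 1)
t2:([]sym:`AAPL`IBM;price:90 43;size:346 73)

/ unkeyed: rows of t2 are appended, t1 rows get a null size
u:t1 uj t2
count u            / 5
u`size             / 0N 0N 0N 346 73

/ keyed: rows are matched on sym and updated in place
k:(1!t1) uj 1!t2
count k            / 3
(0!k)`price        / 43 10 90
```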
For the asof join and window join, we will use these tables as examples:
q)t1:([]time:3?.z.t;sym:`AAPL`IBM`MSFT;price:3?100)
q)t2:([]time:3?.z.t;sym:`AAPL`MSFT`AAPL;price:3?100)
q)t1
time         sym  price
-----------------------
00:45:40.134 AAPL 63
05:12:49.761 IBM  93
04:54:11.685 MSFT 54
q)t2
time         sym  price
-----------------------
05:47:49.777 AAPL 88
02:50:04.026 MSFT 77
00:34:28.887 AAPL 30
Asof Join (aj)
Asof join is primarily meant to join tables along a time column. In a nutshell, for each row of one table it finds the last value in the other table as of that row’s time (i.e. give me the price of `AAPL as of 3pm).
The syntax is similar to ej syntax. The common columns specified in aj must be of the same type and the last column specified in the syntax (`time in this case) must be present in both tables.
Let’s anticipate what will happen if we do an aj on t1 and t2. The join takes each time value from t1 and, as of that time, looks up the last value for that row’s sym in t2. So, let’s look at `AAPL. Our join will take the time value of 00:45:40.134, look for all `AAPL updates at or before that time and get us the last one. In this case, the last update occurs at 00:34:28.887 with a price of 30. For `IBM, we don’t have a matching entry in t2, so there are no updates to consider and the t1 price is kept. For `MSFT, the last update as of 04:54:11.685 occurs at 02:50:04.026 with a price of 77. Let’s do the join and see the result.
q)aj[`sym`time;t1;t2]
time         sym  price
-----------------------
00:45:40.134 AAPL 30
05:12:49.761 IBM  93
04:54:11.685 MSFT 77
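For a reproducible version of the same idea, here is a sketch with hand-picked times (second precision, my choice) instead of random ones. It also sorts t2 by sym and time first, because aj expects the lookup table to be ordered by time within each sym:

```q
/ fixed, hand-picked values (assumptions, not the article's random data)
t1:([]time:09:30:00 09:31:00 09:32:00;sym:`AAPL`IBM`MSFT;price:63 93 54)
t2:([]time:09:29:00 09:31:30 09:25:00;sym:`AAPL`MSFT`AAPL;price:30 77 88)
t2:`sym`time xasc t2   / order the lookup table by time within sym

r:aj[`sym`time;t1;t2]
r`price                / 30 93 77
                       / AAPL: last update at or before 09:30:00 is 09:29:00 (30)
                       / IBM:  no rows in t2, so the t1 price 93 is kept
                       / MSFT: last update at or before 09:32:00 is 09:31:30 (77)
r[`time]~t1`time       / 1b: the result keeps the times from t1
```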
Boom! Nailed it!
Window Join (wj)
Window join is a generalization of aj that can handle aggregations as well. Instead of just getting the last value, it applies an aggregation (you specify which) to the rows that fall into the time window.
Window join can be very helpful in analytics. You can use it for doing transaction cost analysis by joining trade and quote data to see whether the trade you just made was above or below market price.
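Syntax: wj[w;c;t;a] where w is a pair of lists holding the start and end times of the windows, c is the list of columns to join on, t is the table, and a is a list made of the table to aggregate over followed by (function;column) pairs.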
Sample call (taken from kx reference page):
wj[w;`sym`time;trade;(quote;(max;`ask);(min;`bid))]
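The sample call leaves w, trade and quote undefined, so here is a self-contained sketch in the same style. The data is made up for illustration, and the window of two seconds before to one second after each trade is an arbitrary choice:

```q
/ illustrative data: three trades and nine one-second quotes for one sym
trade:([]sym:3#`ibm;time:10:01:01 10:01:04 10:01:08;price:100 101 105)
quote:([]sym:9#`ibm;time:10:01:01+til 9;
  ask:101 103 103 104 104 107 108 107 108;
  bid:98 99 102 103 103 104 106 106 107)
quote:update `p#sym from quote   / wj expects quote sorted by sym,time with `p# on sym

/ one window per trade: from 2 seconds before to 1 second after
w:-2 1+\:trade`time

r:wj[w;`sym`time;trade;(quote;(max;`ask);(min;`bid))]
r`ask              / 103 104 108 (highest ask inside each window)
r`bid              / 98 99 104   (lowest bid inside each window)
```

Comparing price against ask and bid in r is the kind of check the transaction cost analysis mentioned above would start from.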
Make sure you familiarize yourself with all these joins because most of them are commonly used in data analytics. If you have experience with SQL, it shouldn’t be too difficult as there is a lot of overlap.