Below example would show how to findout gaps and fill the data from previous day values.
Lets consider two tables, Stocks (Dimension) and Stock_PRC (Fact Table)
create table stocks(
id number,
ticker varchar2(10)
,company_name varchar2(100)
,descr varchar2(200)
);
create table stocks_prc(
prc_id number
,stocks_id number
,price number
,business_dt date
);
stocks_id has a forign key reference to id column of stocks table.
Let fill some data.
create sequence stockssq;
insert into stocks values (stockssq.nextval,'INFY','Infosys India Ltd', 'regdInfosys India Ltd.Regd');
insert into stocks values (stockssq.nextval,'WIPR','Wipro services Ltd', 'Wipro India Services Ltd.Regd');
insert into stocks values (stockssq.nextval,'TCS','Tata Consultancy', 'Tata Consultancy India ltd');
commit;
create sequence stocks_prcsq
So now Stockid 2 will be infosys,3 TCS and 4 Wipro
insert into stocks_prc (PRC_ID,STOCKS_ID,PRICE,BUSINESS_DT)
select stocks_prcsq.nextval,2,0.34*level,to_date('1-Jan-2013') + level from
dual connect by level < 10;
insert into stocks_prc (PRC_ID,STOCKS_ID,PRICE,BUSINESS_DT)
select stocks_prcsq.nextval,3,0.11*level,to_date('1-Jan-2013') + level from
dual connect by level < 10;
insert into stocks_prc (PRC_ID,STOCKS_ID,PRICE,BUSINESS_DT)
select stocks_prcsq.nextval,4,0.313*level,to_date('1-Jan-2013') + level from
dual connect by level < 10;
commit;
Now this data is dense, which means we have data for all days and no missing data.
Lets delete some days of data to make it sparse for demonstration purpose:
delete from stocks_prc where stocks_id=3 and business_dt ='7-Jan-2013';
Now for Infosys stock we dont have data for 5 Jan and for TCS , 7th Jan data is missing.
Lets try to get the missing data as a first step.
SELECT PRC_ID,STOCKS_ID,PRICE,ALL_DATE
FROM business_dt_tab ,
(SELECT * FROM stocks_prc where stocks_id=2
)
WHERE all_date = business_dt(+)
order by ALL_DATE;
PRC_ID
|
STOCKS_ID
|
PRICE
|
ALL_DATE
|
2
|
2
|
0.34
|
02-01-13 12:00
|
3
|
2
|
0.68
|
03-01-13 12:00
|
4
|
2
|
1.02
|
04-01-13 12:00
|
NULL
|
NULL
|
NULL
|
05-01-13 12:00
|
6
|
2
|
1.7
|
06-01-13 12:00
|
7
|
2
|
2.04
|
07-01-13 12:00
|
8
|
2
|
2.38
|
08-01-13 12:00
|
9
|
2
|
2.72
|
09-01-13 12:00
|
But we dont have Stocks_ID info, without which data is not meaningful.
Hence we need to add the partitioned outer join.
Such a join extends the conventional outer join syntax by applying the outer join to each logical partition defined in a query.
Oracle logically partitions the rows in your query based on the expression you specify in the PARTITION BY clause.
The result of a partitioned outer join is a UNION of the outer joins of each of the partitions in the logically partitioned table
with the table on the other side of the join.
SELECT STOCKS_ID,PRICE,ALL_DATE
FROM stocks_prc PARTITION BY(stocks_id) right outer join
business_dt_tab
on (all_date = business_dt)
order by ALL_DATE;
From this, we will get the Stocks_id also, thats the speciality of Parition by clause here.
STOCKS_ID
|
PRICE
|
ALL_DATE
|
2
|
0.34
|
02-01-13 12:00
|
2
|
0.68
|
03-01-13 12:00
|
2
|
1.02
|
04-01-13 12:00
|
2
|
NULL
|
05-01-13 12:00
|
2
|
1.7
|
06-01-13 12:00
|
2
|
2.04
|
07-01-13 12:00
|
Now lets try to fill previous days data using the LAG analytic function.
SELECT STOCKS_ID,
nvl(PRICE,lag(price) over (partition by stocks_id order by all_date)) price
,ALL_DATE
FROM stocks_prc PARTITION BY(stocks_id) right outer join
business_dt_tab
on (all_date = business_dt);
STOCKS_ID
|
PRICE
|
ALL_DATE
|
2
|
0.34
|
02-01-13 12:00
|
2
|
0.68
|
03-01-13 12:00
|
2
|
1.02
|
04-01-13 12:00
|
2
|
1.02
|
05-01-13 12:00
|
2
|
1.7
|
06-01-13 12:00
|
2
|
2.04
|
07-01-13 12:00
|
2
|
2.38
|
08-01-13 12:00
|
References:
very impressive
ReplyDeletewhat happens when there is more that one null row. your data was null on 5/1. But what if it was null on 6/1 as well. I don't think your approach will work. because lag(null) will return null on the 2nd row.
ReplyDeleteBetter approach is to use "LAST_VALUE" function rather than LAG.
ReplyDelete