Google Search

Saturday, November 30, 2013

Oracle SQL Analytic functions : NTILE and WIDTH_BUCKET

Lets try to learn about Analytic functions NTILE and WIDTH_BUCKET

NTILE Anlytic Function:

NTILE is an analytic function.
It divides ordered data set into number of buckets and assigns appropriate bukcet number to each row.
It can be used to divide rows into equal sets and assign a number to each row.

Lets take all_objects view, and group them into balanced sets and assign bucket number to each.

select object_id, ntile(8) over (order by object_id) group_id
from all_objects where rownum<20;
Output will be as below:

OBJECT_ID group_id
2 1
3 1
4 1
5 2
6 2
7 2
8 3
9 3
10 3
11 4

As we can see,oracle has grouped rows equally and assigned numbers for each row (column group_id)

WIDTH_BUCKET Function:
This is similar to NTILE function, but here we can actually supply the range of values (start and end values),
It takes the raanges and splits it into N groups.

   
select object_id,
width_bucket( object_id, 1, 20, 3 ) width_bucket,
ntile(8) over (order by object_id) ntile
from all_objects where rownum<20;

OBJECT_ID WIDTH_BUCKET    NTILE
2 1 1
3 1 1
4 1 1
5 1 2
6 1 2
7 1 2
8 2 3
9 2 3
10 2 3
11 2 4
12 2 4
13 2 5
14 3 5
15 3 6
16 3 6
As we can see from above output. NTILE can be used to get even distribution and WIDTH_BUCKET can be used for user provided range of values.

Some useful links:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:47910227585839
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:30966075177614

http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions214.htm

http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions101.htm


Monday, September 30, 2013

Sql query to fill gaps (partitioned outer join)

One of the common scenario in any application would be existing of sparse data and need to fill the missing data.
Below example would show how to findout gaps and fill the data from previous day values.

Lets consider two tables, Stocks (Dimension) and Stock_PRC (Fact Table)

create table stocks(
id number,
ticker varchar2(10)
,company_name varchar2(100)
,descr varchar2(200)
);

create table stocks_prc(
prc_id number
,stocks_id number
,price number
,business_dt date
);

stocks_id has a forign key reference to id column of stocks table.
Let fill some data.

create sequence stockssq;



insert into stocks values (stockssq.nextval,'INFY','Infosys India Ltd', 'regdInfosys India Ltd.Regd');
insert into stocks values (stockssq.nextval,'WIPR','Wipro services Ltd', 'Wipro India Services Ltd.Regd');
insert into stocks values (stockssq.nextval,'TCS','Tata Consultancy', 'Tata Consultancy India ltd');
commit;

create sequence stocks_prcsq
So now Stockid 2 will be infosys,3 TCS and 4 Wipro


insert into stocks_prc (PRC_ID,STOCKS_ID,PRICE,BUSINESS_DT)
select stocks_prcsq.nextval,2,0.34*level,to_date('1-Jan-2013') + level from
dual connect by level < 10;

insert into stocks_prc (PRC_ID,STOCKS_ID,PRICE,BUSINESS_DT)
select stocks_prcsq.nextval,3,0.11*level,to_date('1-Jan-2013') + level from
dual connect by level < 10;

insert into stocks_prc (PRC_ID,STOCKS_ID,PRICE,BUSINESS_DT)
select stocks_prcsq.nextval,4,0.313*level,to_date('1-Jan-2013') + level from
dual connect by level < 10;

commit;
Now this data is dense, which means we have data for all days and no missing data.


Lets delete some days of data to make it sparse for demonstration purpose:
delete from stocks_prc where stocks_id=3 and business_dt ='7-Jan-2013';

Now for Infosys stock we dont have data for 5 Jan and for TCS , 7th Jan data is missing.
Lets try to get the missing data as a first step.

SELECT PRC_ID,STOCKS_ID,PRICE,ALL_DATE
FROM business_dt_tab ,
  (SELECT * FROM stocks_prc where stocks_id=2
  )
WHERE all_date = business_dt(+)
order by ALL_DATE;

PRC_ID
STOCKS_ID
PRICE
ALL_DATE
2
2
0.34
02-01-13 12:00
3
2
0.68
03-01-13 12:00
4
2
1.02
04-01-13 12:00
NULL
NULL
NULL
05-01-13 12:00
6
2
1.7
06-01-13 12:00
7
2
2.04
07-01-13 12:00
8
2
2.38
08-01-13 12:00
9
2
2.72
09-01-13 12:00


But we dont have Stocks_ID info, without which data is not meaningful.
Hence we need to add the partitioned outer join.
Such a join extends the conventional outer join syntax by applying the outer join to each logical partition defined in a query.
Oracle logically partitions the rows in your query based on the expression you specify in the PARTITION BY clause. 
The result of a partitioned outer join is a UNION of the outer joins of each of the partitions in the logically partitioned table 
with the table on the other side of the join.


SELECT STOCKS_ID,PRICE,ALL_DATE
FROM stocks_prc PARTITION BY(stocks_id) right outer join 
  business_dt_tab 
on (all_date = business_dt)
order by ALL_DATE;

From this, we will get the Stocks_id also, thats the speciality of Parition by clause here.

STOCKS_ID
PRICE
ALL_DATE
2
0.34
02-01-13 12:00
2
0.68
03-01-13 12:00
2
1.02
04-01-13 12:00
2
 NULL
05-01-13 12:00
2
1.7
06-01-13 12:00
2
2.04
07-01-13 12:00

Now lets try to fill previous days data using the LAG analytic function.

SELECT STOCKS_ID,
nvl(PRICE,lag(price) over (partition by stocks_id order by all_date)) price
,ALL_DATE
FROM stocks_prc PARTITION BY(stocks_id) right outer join 
  business_dt_tab 
on (all_date = business_dt);

STOCKS_ID
PRICE
ALL_DATE
2
0.34
02-01-13 12:00
2
0.68
03-01-13 12:00
2
1.02
04-01-13 12:00
2
1.02
05-01-13 12:00
2
1.7
06-01-13 12:00
2
2.04
07-01-13 12:00
2
2.38
08-01-13 12:00

References:



Monday, August 26, 2013

Oracle 11G SQL Recursive Query , Fibonacci Series

One of the new features of Oracle 11G is Recursive Subquery Factoring (With Clause).

It is of the below format:

with query_name(col_alias1,col_alias2)
as
(
/*Below subquery_factoring_clause, Must contain 2 Query blocks : Anchor Member and Recursive Member */
select  col_alias1 , col_alias2 from dual ===> Anchor Member
==> You must combine the recursive member with the anchor member using the UNION ALL set operator
UNION ALL
select  col_alias1 , col_alias2 from query_name ===> Recursive Member
) select * from query_name

Few Rules:
1. You must combine the recursive member with the anchor member using the UNION ALL set operator
2. The anchor member must appear before the recursive member, and it cannot reference query_name.
3. The number of column aliases following WITH query_name and the number of columns in the SELECT lists of the anchor and recursive query blocks must be the same.

Example- Fibonacci Series:

with tdata(idx,fb,prev)
as
(select 1 idx, 0 fb ,cast(null as number) prev from dual
union all
select f1.idx+1 idx,(nvl(f1.prev,1) + f1.fb) fb,f1.fb prev from tdata f1
where f1.idx < 10
)
select fb from tdata order by fb;


References:

http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_10002.htm

http://www.plsqlchallenge.com/pls/apex/f?p=10000:659:6978114797747::NO:659:P659_COMP_EVENT_ID,P659_QUIZ_ID,P659_QUESTION_ID:175266,,8032&cs=16C7A6293B4B41E088AF6F90A13F58D4B

Wednesday, July 31, 2013

Oracle collect Stats : Global Table Stats vs Partitioned Stats DBMS_STATS

Let us see the differences between Global Table stats and Partitioned Table stats.
and when they are used by Optimizer.

We can collect stats for a particular partition or for complete table.
Depending on the query the optimizer uses either one of the statistics.

Lets see some scenario.

1. When Partition level statistics are used?
When we query a partition table which would hit a particular partition.
Or In other words Partition level statistics are used when explain plan of the query would have explicit Start and Stop keys AND Start Key = Stop Keys.
Only in this scenario Partition level stats are used.

explain plan for
select * from PART_TAB
where hire_date = to_date('1-Jan-2013','dd-Mon-yyyy');

|   0 | SELECT STATEMENT       |          |     1 |    11 |     3   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|          |     1 |    11 |     3   (0)| 00:00:01 |     2 |     2 |
|*  2 |   TABLE ACCESS FULL    | PART_TAB |     1 |    11 |     3   (0)| 00:00:01 |     2 |     2 |


2. When Table level statistics are used?

When we query a partition which span over a range of partition (more than one partition), then
table level stats are used.
In explain plan if you see the partition start and stop keys which are not same OR if its says KEY then table partitioned are used.

Example:
select * from PART_TAB
where hire_date >= to_date('1-Jan-2013','dd-Mon-yyyy');

|   0 | SELECT STATEMENT         |          |     9 |    99 |     6   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|          |     9 |    99 |     6   (0)| 00:00:01 |     2 |     4 |
|   2 |   TABLE ACCESS FULL      | PART_TAB |     9 |    99 |     6   (0)| 00:00:01 |     2 |     4 |

In the above scenario table level stats are used.
More Info below:


For large tables we can make use of Incremental Statistics.
More information below:






Thursday, May 16, 2013

Row source generation

Below are different ways of row source generation, can used to create test data or also some cases in joins.

Lets consider that we need to generate 10 rows starting from 101-105 and 201 to 205


1. select case
          when level <= 5 then 100 + level
                          else 195 + level
       end id
  from dual
connect by level <= 10
 order by id

2.

select level id
  from dual
 where level between 101 and 105
    or level between 201 and 205
connect by level <= 205
 order by id

3.

select 100 * x.x + y.y id
  from (
   select level x
     from dual
   connect by level <= 2
       ) x
 cross join (
   select level y
     from dual
   connect by level <= 5
       ) y
 order by id

4.

select case
          when rownum <= 5 then 100 + rownum
                           else 195 + rownum
       end id
  from dual
connect by rownum <= 10
 order by id

5.

select to_number(column_value) id
  from xmltable('101,102,103,104,105,201,202,203,204,205')
 order by id

6.

select to_number(column_value) id
  from xmltable('101 to 105, 201 to 205')
 order by id

7.

select to_number(column_value) id
  from xmltable('for $i in (100, 200), $j in 1 to 5 return $i + $j')
 order by id

8.

with rowsource(id) as (
   select 1 id
     from dual
    union all
   select id + 1 id
     from rowsource
    where id < 205
)
select id
  from rowsource
 where id between 101 and 105
    or id between 201 and 205
 order by id

9.

with rowsource(id) as (
   select 101 id
     from dual
    union all
   select case id
             when 105 then 201
             else id + 1
          end id
     from rowsource
    where id < 205
)
select id
  from rowsource
 order by id

Courtesy: (PL/SQL Challenge by Steven Feurestien)

http://www.plsqlchallenge.com/pls/apex/f?p=10000:659:5558871037305::NO:659:P659_COMP_EVENT_ID,P659_QUIZ_ID,P659_QUESTION_ID:140896,,7589&cs=1CADD4B82BF7B20DE4520099D400CACB9













Wednesday, March 27, 2013

SQL*Plus settings

Below are list of SQL*Plus commands that can be useful sometimes.
One major advantage of SQL*Plus is for testing, since this is not changed and is same from release to release, we can use as a tool to reproduce any testing.

Instead of executing below commands every time we login to sql*plus, we can add all below commands
into login.sql file (at product\11.2.0\dbhome_1\sqlplus\admin ) so that they are executed automatically everytime when a user starts SQL*Plus, or uses the SQL*Plus CONNECT command.

Below settings are picked up from Tom Kyte's (http://www.facebook.com/pages/Tom-Kyte-Ask-Tom-fan-page/189391211108914)  book "Effective Oracle by Design"

Just copy paste the below in login.sql /glogin.sql

REM turn off the terminal output - make it so SQLPLUS doesnot
REM print anything when we log in
set termout off

REM default your editor here. SQLPlus has many individual settings
REM This is one of the most important ones
define _editor=vi

REM serveroutput controls whether your DBMS_OUTPUT.PUT_LINE calls
REM go into the bit bucket (serveroutput off) or get displayed
REM on screen. The format wrapped  elements causes
REM sqlplus to preserve leading white space - very useful

set serveroutput on size 1000000 format wrapped

REM Here I set some default column widths for commonly queried columns

column object_name format a30
column segment_name a30
column file_name format a40
column name format a30
column file_name format a30
column what format a30 word_wrapped
column plan_plus_exp format a100

REM by default, a spool file is a fixed width file with lots of
REM trailing blanks. Trimspool removes these trailing balnks
REM making the spool file significantly smaller

set trimspool on

REM LONG controls how much of a LONG or CLOB sqlplus displays by default.
REM It defaults to 80 characters which in general is far too small.

set long 5000

REM This sets the default width at which sqlplus wraps the output.

set linesize 131

REM sqlplus prints columns headings every N lines of output this defaults to 14 lines.

set pagesize 9999

REM here is how we set my signature prompt in sqlplus to
REM username@database> Here we use NEW_VALUE concept to format a nice prompt string
REM that defaults to IDLE

define gname=idle
column global_name new_value gname 
select lower(user) || '@' ||
substr(global_name,1,decode(dot,0,length(global_name),dot-1)) global_name
from (select global_name,instr(global_name,'.') dot
from global_name);
set sqlprompt '&gname>  '

REM and lastly, we'll put termout back on sqlplus prints
REM to the screen

set termout on

Thursday, February 28, 2013

DBMS_APPLICATION_INFO

DBMS_APPLICATION_INFO is a Oracle supplied package which can be helpful in debugging, to track the progress of a long running procedure or to set any useful information in v$session view, which can be useful for Programmers, DBAs.

Consider a scenario where you are executing some statements (DML) in a loop. Lets assume that the loop iterations are more than 10 million. Now if something fails (say unique constraint error) in middle of loop, it becomes difficult to get to the exact record due to which it failed.

One way is to use DBMS_OUTPUT for each iteration, but that would fill up the stack.
Another way is use autonomous error logging procedure which would write to a separate error table. But this is not efficient way of doing since performance will slow down and also the unnecessarily we are filling up the error table.

In situations like these, Application_Info package comes handy.
This package has a procedure called SET_CLIENT_INFO, this will set values in column CLIENT_INFO of the table v$session. This value will be immediately visible to other sessions and we need not even commit.

Psuedo Code:

Session1:

for rec in (some select which has 10 million records)
loop
-- ...Some calculations
DBMS_APPLICATION_INFO.SET_CLIENT_INFO('Program is executing - ' ||  rec.col1 || '-' || rec.col2);
--- DMLs
end loop;

When the above program is running, we can query v$session to monitor the progress.

select client_info from v$session where sid := :x;

It can also be used to set the Bind variable values being used in a dynamic query.

The other procedures inside the package are below:
SET_MODULE/GET_MODULE : This allows you to set MODULE and ACTION columns in v$session.
      MODULE column should be name of the Process
       ACTION column should be set to name of the procedure you are executing in a Package.

SET_SESSION_LONGOPS : use this to monitor progress of any long running routine. Set the value for any routine which will take more than few seconds to.



Monday, January 28, 2013

Faster Inserts



Lets consider an application which has large number of inserts into a table having primary key.
Generally primary key values are generated by sequences.
When we have thousands of concurrent inserts (or parallel inserts) using sequences,  all the values goto the right hand side of the Unique Index
causing massive contention for right hand side block, hence slowing down the whole process.

Previously the solution for this was:
1. Reverse Key Indexes : Good for small primary key values but it is noticed that performance decreases for large numbers.
2. HASH Partitioning : Performance is not upto expectation in RAC clustered environment.

Best solution for the above problem was suggested by Tom Kyte (https://www.facebook.com/pages/Tom-Kyte-Ask-Tom-fan-page/189391211108914)
in Real World Performance India (Sangam12)
Which is to code the sequence to ensure there is no index contention.
We shall create a new sequence structure as Instance# || SessionsId || Sequence
The above will be inserted into Primary Key column instead of just sequence value.


Below is a demo:

create table TEST_INSERTS( COL1 number primary key, DESCR VARchar2(200) );
CREATE SEQUENCE SEQINS;

--First lets do the old way using only sequences

begin
for i in 1..10000000
loop
INSERT INTO TEST_INSERTS VALUES (SEQINS.nextval,'Using SEQINS only');
end loop;
commit;
end;
/

-- New Method for optimal scaling and response time characteristics
begin
for i in 1..10000000
loop
/*
iF YOUR DATABASE is RAC clustered then use as below
INSERT INTO TEST_INSERTS VALUES (sys_context('USER_ENV','INSTANCE')|| sys_context('USER_ENV','SID')|| SEQINS.nextval,'Using SEQINS only');
*/
INSERT INTO TEST_INSERTS VALUES (sys_context('USER_ENV','SID')|| SEQINS.nextval,'Using Composite SEQINS');
end loop;
commit;
end;
/

 More Information below

http://asktom.oracle.com/pls/apex/f?p=100:11:0::NO::P11_QUESTION_ID:6077843300346542181