I have a SELECT statement that runs really slowly and is holding back our nightly process.
The query is (please don't comment on the implicit join syntax; it is automatically generated by Informatica, which runs this code):
SELECT *
FROM   STG_DIM_CRM_CASES,
       V_CRM_CASE_ID_EXISTS_IN_DWH,
       stg_scd_customers_key
WHERE  STG_DIM_CRM_CASES.CRM_CASE_ID = V_CRM_CASE_ID_EXISTS_IN_DWH.CASE_ID(+)
AND    STG_DIM_CRM_CASES.account_number = stg_scd_customers_key.account_number(+)
AND    STG_DIM_CRM_CASES.Case_Create_Date BETWEEN stg_scd_customers_key.start_date(+)
                                              AND stg_scd_customers_key.end_date(+)
Tables info:

STG_DIM_CRM_CASES
Index - (Account_Number, Case_Create_Date). Size - 270k records.

stg_scd_customers_key
Index - (Account_Number, Start_Date, End_Date). Partitioned by End_Date. Size - 500 million records.

V_CRM_CASE_ID_EXISTS_IN_DWH (view) -
select t.case_id
from   crm_ps_rc_case t, dim_crm_cases x
where  t.case_id = x.crm_case_id;

dim_crm_cases -
Index - (crm_case_id). Size - 100 million records.

crm_ps_rc_case -
Size - 270k records.
The query without the join to stg_scd takes seconds, so that join looks like the part causing the performance issue. The view also runs in seconds, even though it is joined to a 100 million record table. Right now the query takes somewhere between 12 and 30 minutes, depending on how busy our sources are.
Here is the execution plan:
------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                             |  3278K|  1297M|  559K  (4) | 02:10:37 |       |       |        |      |            |
|   1 |  PX COORDINATOR                  |                             |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)            | :TQ10003                    |  3278K|  1297M|  559K  (4) | 02:10:37 |       |       |  Q1,03 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN OUTER               |                             |  3278K|  1297M|  559K  (4) | 02:10:37 |       |       |  Q1,03 | PCWP |            |
|   4 |     PX RECEIVE                   |                             | 29188 |    10M| 50662  (5) | 00:11:50 |       |       |  Q1,03 | PCWP |            |
|   5 |      PX SEND HASH                | :TQ10002                    | 29188 |    10M| 50662  (5) | 00:11:50 |       |       |  Q1,02 | P->P | HASH       |
|*  6 |       HASH JOIN RIGHT OUTER      |                             | 29188 |    10M| 50662  (5) | 00:11:50 |       |       |  Q1,02 | PCWP |            |
|   7 |        BUFFER SORT               |                             |       |       |            |          |       |       |  Q1,02 | PCWC |            |
|   8 |         PX RECEIVE               |                             | 29188 |   370K| 50575  (5) | 00:11:49 |       |       |  Q1,02 | PCWP |            |
|   9 |          PX SEND BROADCAST       | :TQ10000                    | 29188 |   370K| 50575  (5) | 00:11:49 |       |       |        | S->P | BROADCAST  |
|  10 |           VIEW                   | V_CRM_CASE_ID_EXISTS_IN_DWH | 29188 |   370K| 50575  (5) | 00:11:49 |       |       |        |      |            |
|* 11 |            HASH JOIN             |                             | 29188 |   399K| 50575  (5) | 00:11:49 |       |       |        |      |            |
|  12 |             TABLE ACCESS FULL    | CRM_PS_RC_CASE              | 29188 |   199K|   570  (1) | 00:00:08 |       |       |        |      |            |
|  13 |             INDEX FAST FULL SCAN | DIM_CRM_CASES$1PK           |  103M |  692M | 48894  (3) | 00:11:25 |       |       |        |      |            |
|  14 |        PX BLOCK ITERATOR         |                             | 29188 |    10M|    87  (2) | 00:00:02 |       |       |  Q1,02 | PCWC |            |
|  15 |         TABLE ACCESS FULL        | STG_DIM_CRM_CASES           | 29188 |    10M|    87  (2) | 00:00:02 |       |       |  Q1,02 | PCWP |            |
|  16 |     BUFFER SORT                  |                             |       |       |            |          |       |       |  Q1,03 | PCWC |            |
|  17 |      PX RECEIVE                  |                             |  515M |   14G |  507K  (3) | 01:58:28 |       |       |  Q1,03 | PCWP |            |
|  18 |       PX SEND HASH               | :TQ10001                    |  515M |   14G |  507K  (3) | 01:58:28 |       |       |        | S->P | HASH       |
|  19 |        PARTITION RANGE ALL       |                             |  515M |   14G |  507K  (3) | 01:58:28 |     1 |  2982 |        |      |            |
|  20 |         TABLE ACCESS FULL        | STG_SCD_CUSTOMERS_KEY       |  515M |   14G |  507K  (3) | 01:58:28 |     1 |  2982 |        |      |            |
------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("STG_DIM_CRM_CASES"."ACCOUNT_NUMBER"="STG_SCD_CUSTOMERS_KEY"."ACCOUNT_NUMBER"(+))
       filter("STG_DIM_CRM_CASES"."CASE_CREATE_DATE">="STG_SCD_CUSTOMERS_KEY"."START_DATE"(+) AND
              "STG_DIM_CRM_CASES"."CASE_CREATE_DATE"<="STG_SCD_CUSTOMERS_KEY"."END_DATE"(+))
   6 - access("STG_DIM_CRM_CASES"."CRM_CASE_ID"="V_CRM_CASE_ID_EXISTS_IN_DWH"."CASE_ID"(+))
  11 - access("T"."CASE_ID"="X"."CRM_CASE_ID")
Notes: Adding indexes may be an issue, depending on the index. This is not the only place these tables are used, so new indexes may interfere with other commands (mostly inserts) on these tables.
I've also tried adding a filter on stg_scd that excludes all dates smaller than the minimum date in Table_Cases, but that didn't help because it filtered out only about one year of records.
Thanks in advance.
6 Answers
Answer 1
The problem is in scanning all partitions:
|  18 |       PX SEND HASH               | :TQ10001                    |  515M |   14G |  507K  (3) | 01:58:28 |       |       |        | S->P | HASH       |
|  19 |        PARTITION RANGE ALL       |                             |  515M |   14G |  507K  (3) | 01:58:28 |     1 |  2982 |        |      |            |
|  20 |         TABLE ACCESS FULL        | STG_SCD_CUSTOMERS_KEY       |  515M |   14G |  507K  (3) | 01:58:28 |     1 |  2982 |        |      |            |
It happens because you are using a left join to this table. Can you select just one partition using a bind variable? What is the partition key? I don't see a parallel hint, but according to your plan it runs in parallel. Is a parallel degree set at the object level? Can you remove parallel and post the explain plan without it, please?
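For reference, the question says the table is partitioned by End_Date, so a bind-variable filter on the partition key would look roughly like the sketch below. This is only an illustration: the bind name :min_case_date and the exact predicate are assumptions, and with a bind variable any pruning should show up as KEY values in Pstart/Pstop at run time rather than PARTITION RANGE ALL.

SELECT *
FROM   stg_dim_crm_cases,
       v_crm_case_id_exists_in_dwh,
       stg_scd_customers_key
WHERE  stg_dim_crm_cases.crm_case_id = v_crm_case_id_exists_in_dwh.case_id(+)
AND    stg_dim_crm_cases.account_number = stg_scd_customers_key.account_number(+)
AND    stg_dim_crm_cases.case_create_date BETWEEN stg_scd_customers_key.start_date(+)
                                              AND stg_scd_customers_key.end_date(+)
-- Added line: a single-table predicate on the partition key of the outer-joined
-- table. The (+) keeps the outer-join semantics while letting the optimizer
-- prune END_DATE partitions instead of scanning all 2982 of them.
AND    stg_scd_customers_key.end_date(+) >= :min_case_date;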
Answer 2
I think the problem is the view, which I suspect is being fully executed, returning all of its rows, before your conditions are applied.
The overall effect of the view is to add a CASE_ID column that is not null if CRM_CASE_ID is found in the view and null otherwise. I've replaced the view with direct joins to its two underlying tables and a CASE expression. By replacing the convenience of the view with that logic, you join directly to each table in it and so avoid one level of join depth.
Try running this version of the query:
SELECT a.*,
       c.*,
       CASE WHEN t.case_id IS NOT NULL AND x.crm_case_id IS NOT NULL
            THEN t.case_id
       END AS case_id
FROM   STG_DIM_CRM_CASES a
LEFT JOIN crm_ps_rc_case t
       ON t.case_id = a.CRM_CASE_ID
LEFT JOIN dim_crm_cases x
       ON x.crm_case_id = a.CRM_CASE_ID
LEFT JOIN stg_scd_customers_key c
       ON a.account_number = c.account_number
      AND a.Case_Create_Date BETWEEN c.start_date AND c.end_date
If you replace a.* and c.* with only the exact columns you actually need, you'll get a speed-up simply because there's less data to return. If you also put indexes on the looked-up keys plus all the columns you actually select (a covering index), you will speed it up considerably, because index-only access can be used.
At a minimum, verify that there are indexes on all joined-to columns.
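As an illustration of the covering-index idea, a sketch is below. The index name, the trailing customer_key column, and the LOCAL clause are assumptions: adapt the column list to whatever your query really selects, and weigh the extra index against the insert workload mentioned in the question.

-- Hypothetical covering index on the big SCD table: the lookup keys first, then
-- the few columns the query actually reads, so the join can be resolved from
-- the index without visiting table rows. LOCAL keeps maintenance aligned with
-- the END_DATE partitioning; "customer_key" is a placeholder column.
CREATE INDEX stg_scd_cust_key_cov_ix
  ON stg_scd_customers_key (account_number, start_date, end_date, customer_key)
  LOCAL;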
Answer 3
I would consider creating a materialized view (with refresh fast on demand) for the join between Table_cases and stg_scd. I assume much of the work in that join is on rows that don't change from day to day.
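A minimal sketch of that idea, assuming "Table_cases" means STG_DIM_CRM_CASES and "stg_scd" means STG_SCD_CUSTOMERS_KEY (the MV and log names are illustrative). Note that Oracle's fast-refresh rules for join-only materialized views are strict (ROWID materialized view logs are required, and outer joins generally must be simple equijoins), so the BETWEEN condition here may only be refreshable in COMPLETE mode; the sketch therefore uses REFRESH FORCE, which attempts a fast refresh and falls back automatically.

-- Materialized view logs with ROWID are a prerequisite for fast refresh.
CREATE MATERIALIZED VIEW LOG ON stg_dim_crm_cases WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON stg_scd_customers_key WITH ROWID;

-- REFRESH FORCE tries a fast refresh and falls back to a complete refresh when
-- the query (an outer join with a BETWEEN condition) is not fast-refreshable.
CREATE MATERIALIZED VIEW mv_cases_customers_key
  REFRESH FORCE ON DEMAND
AS
SELECT c.rowid  AS case_rid,
       k.rowid  AS key_rid,
       c.crm_case_id,
       c.account_number,
       c.case_create_date,
       k.start_date,
       k.end_date
FROM   stg_dim_crm_cases c,
       stg_scd_customers_key k
WHERE  c.account_number = k.account_number(+)
AND    c.case_create_date BETWEEN k.start_date(+) AND k.end_date(+);

-- Refreshed from the nightly process, e.g.:
-- EXEC DBMS_MVIEW.REFRESH('MV_CASES_CUSTOMERS_KEY');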
Answer 4
THEORY: What I believe is happening is that the engine has to resolve the 100M+ records from the view join against the 500M-record table BEFORE it applies the limiting criteria. In effect it builds a cross join, and even if it can use indexes, that is a lot of records to generate and then parse. So even though you wrote it as an outer join, the engine isn't able to process it that way (I don't know why).
At a minimum that is 100M x 500M = 5 x 10^16 row combinations, which is a huge amount of data to generate and then parse/limit.
By eliminating the view, the engine may be better able to optimize and use the indexes, thus avoiding the need for that enormous intermediate join.
Areas where I would focus my time in troubleshooting:
- Eliminate the view just to remove it as a potential overhead issue.
- Recognize that no join condition between stg_scd_customers_key and V_CRM_CASE_ID_EXISTS_IN_DWH exists. This means the engine may be doing a cross join BEFORE the results of joining STG_DIM_CRM_CASES to stg_scd_customers_key have been resolved.
CONSIDER eliminating the view, or using an inline view
Eliminating the view:
SELECT *
FROM   STG_DIM_CRM_CASES,
       crm_ps_rc_case t,
       dim_crm_cases x,
       stg_scd_customers_key
WHERE  t.case_id = x.crm_case_id
AND    STG_DIM_CRM_CASES.CRM_CASE_ID = t.CASE_ID(+)
AND    STG_DIM_CRM_CASES.account_number = stg_scd_customers_key.account_number(+)
AND    STG_DIM_CRM_CASES.Case_Create_Date BETWEEN stg_scd_customers_key.start_date(+)
                                              AND stg_scd_customers_key.end_date(+)
using an inline view:
SELECT *
FROM   STG_DIM_CRM_CASES,
       (select t.case_id
        from   crm_ps_rc_case t, dim_crm_cases x
        where  t.case_id = x.crm_case_id) V_CRM_CASE_ID_EXISTS_IN_DWH,
       stg_scd_customers_key
WHERE  STG_DIM_CRM_CASES.CRM_CASE_ID = V_CRM_CASE_ID_EXISTS_IN_DWH.CASE_ID(+)
AND    STG_DIM_CRM_CASES.account_number = stg_scd_customers_key.account_number(+)
AND    STG_DIM_CRM_CASES.Case_Create_Date BETWEEN stg_scd_customers_key.start_date(+)
                                              AND stg_scd_customers_key.end_date(+)
As to why, see: http://www.dba-oracle.com/art_hints_views.htm
While the order of the WHERE clause SHOULDN'T matter, consider this: on the off chance the engine is executing in the order listed, limiting the 500M rows down first and only then adding the supplemental data from the view would logically be faster.
SELECT *
FROM   STG_DIM_CRM_CASES,
       stg_scd_customers_key,
       V_CRM_CASE_ID_EXISTS_IN_DWH
WHERE  STG_DIM_CRM_CASES.account_number = stg_scd_customers_key.account_number(+)
AND    STG_DIM_CRM_CASES.Case_Create_Date BETWEEN stg_scd_customers_key.start_date(+)
                                              AND stg_scd_customers_key.end_date(+)
AND    STG_DIM_CRM_CASES.CRM_CASE_ID = V_CRM_CASE_ID_EXISTS_IN_DWH.CASE_ID(+)
Answer 5
I propose using a UNION instead of the join, as it can be more efficient. You can read about that here: How can I substitute a left join in sql.
You can replace a join that looks like this:
SELECT DISTINCT T1.id, T1.value, T2.other_value
FROM   T1
LEFT OUTER JOIN T2 ON T1.id = T2.id;
with a SELECT ... UNION that looks like this:
SELECT T1.id, T1.value, T2.other_value
FROM   T1
INNER JOIN T2 ON T1.id = T2.id
UNION
SELECT T1.id, T1.value, NULL
FROM   T1
WHERE  NOT EXISTS (SELECT * FROM T2 WHERE T1.id = T2.id);
This example actually shows an outer join, but you can easily adapt it to your needs, and it should solve your efficiency problem.
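Applied to the question's stg_scd join, the pattern would look roughly like the sketch below. The selected columns (customer_key in particular) are placeholders, and the second branch needs one NULL per stg_scd column you keep; cast the NULLs if Oracle complains about datatypes.

-- The two branches are disjoint, so UNION ALL is safe and avoids a sort.
SELECT c.crm_case_id, c.account_number, c.case_create_date,
       k.customer_key, k.start_date, k.end_date
FROM   stg_dim_crm_cases c
JOIN   stg_scd_customers_key k
  ON   c.account_number = k.account_number
 AND   c.case_create_date BETWEEN k.start_date AND k.end_date
UNION ALL
SELECT c.crm_case_id, c.account_number, c.case_create_date,
       NULL, NULL, NULL
FROM   stg_dim_crm_cases c
WHERE  NOT EXISTS (
         SELECT 1
         FROM   stg_scd_customers_key k
         WHERE  c.account_number = k.account_number
         AND    c.case_create_date BETWEEN k.start_date AND k.end_date);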
Additionally, you can try adding a WITH (NOLOCK) hint to the FROM clause, which will not lock the whole table and could speed the query up a little.
Answer 6
Try making two SELECT queries:

- One without the join to stg_scd (put the result somewhere).
- One with just stg_scd (put the result somewhere).

Then join the two intermediate results and keep only the rows that match (see the sketch below).
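A sketch of that two-step idea using session-private staging tables. Every object name here is illustrative, the steps must run in the same session, and step 2 stages only the account slice of stg_scd that can possibly match (which I take to be the intent); a permanent staging table owned by the nightly process would work just as well.

-- Step 1: the fast part - cases joined to the view, staged.
CREATE GLOBAL TEMPORARY TABLE tmp_cases ON COMMIT PRESERVE ROWS AS
SELECT c.*, v.case_id AS dwh_case_id
FROM   stg_dim_crm_cases c, v_crm_case_id_exists_in_dwh v
WHERE  c.crm_case_id = v.case_id(+);

-- Step 2: only the slice of stg_scd that can possibly match, staged.
CREATE GLOBAL TEMPORARY TABLE tmp_scd ON COMMIT PRESERVE ROWS AS
SELECT k.*
FROM   stg_scd_customers_key k
WHERE  k.account_number IN (SELECT account_number FROM tmp_cases);

-- Step 3: match the two staged results.
SELECT t.*, s.*
FROM   tmp_cases t, tmp_scd s
WHERE  t.account_number = s.account_number(+)
AND    t.case_create_date BETWEEN s.start_date(+) AND s.end_date(+);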