adjust column high and low value on column with hybrid histograms

On table where time series data get loaded it’s not uncommon to get out-of-range conditions for the CBO optimizer.
This means that the statistics are not uptodate and until they are gathered again, the optimizer may think that there is no data within a daterange, since the queries range may be higher than the columns high value.
As consequence you may end up with suboptimal plan which could include catesian joins und unwanted nested loops.

An even wors condition is if you have few outliers in a date range and bind variables are used. if all your data is from the past 10 Month but you have a record from the year 0001, the optimizer may guess without histogram that your data es even distributed. if the last month are queries the guess cardinality would be by magnitute to low. this usually end as well with bad plans.

A common solution to this issues is to set the high and/or low value explicitly with DBMS_STATS.set_column_stats (Doc ID 1276174.1).

This works well if there are no histograms on the column. It may even work if you have histograms (even they may get destroyed). On time series date there is usually a high-balanced histogram. But on 12.1 or 12.2 this often changes to a hybrid histogram.

And using the old methode may end up with:
ERROR at line 1:
ORA-20001: Invalid or inconsistent input values
ORA-06512: at "SYS.DBMS_STATS", line 13570
ORA-06512: at line 23

This is mostly because the bkvals array is incorrect.

Even the details are documented in the description of DBMS_STATS it may be no so obvious how set the high value in such a case.
Here I share my example which workes with a hybird histogram.

v_distcnt NUMBER;
v_density NUMBER;
v_nullcnt NUMBER;
v_avgclen NUMBER;
v_histogram_type user_tab_columns.histogram%type;
DBMS_STATS.get_column_stats (ownname => USER,
tabname => 'MYTAB',
colname => 'DCOL',
distcnt => v_distcnt,
density => v_density,
nullcnt => v_nullcnt,
srec => srec,
avgclen => v_avgclen
-- init a datearray with the correct number of values (buckets in the histogram)
v_datevals := DBMS_STATS.DATEARRAY();
-- check if there is a hybrid histogram
select histogram INTO v_histogram_type from user_tab_columns where table_name='MYTAB' and column_name='DCOL';
IF v_histogram_type='HYBRID' then
-- copy the values from the histogram to the datearray (! this is an example for date/timestamp columns only, Julian calendar !)
for i in 1..srec.novals.count loop
v_datevals(i) := to_date(srec.novals(i),'J');
end loop;
-- reset first and last (low/high value) !! according your needs!
v_datevals(1) := to_date('01.01.2017','DD.MM.YYYY');
v_datevals(datevals.last) := trunc(sysdate) + 14;
null; -- .. do what you do in other cases, this is only an example, think before run ..
-- srec.bkvals := null;
-- if oyu wnat a freq or hybrid histogram ... else null
-- srec.bkvals := dbms_stats.numarray(srec.bkvals(1),srec.bkvals(srec.bkvals.last));

DBMS_STATS.prepare_column_values (srec, v_datevals);
DBMS_STATS.set_column_stats (ownname => USER,
tabname => 'MYTAB',
colname => 'DCOL',
distcnt => v_distcnt,
density => v_density,
nullcnt => v_nullcnt,
srec => srec,
avgclen => v_avgclen,
force => true

A hybrid histogram on skewed time series can be really wonderful and the out-of-range condition can be prevented.

Please leave comments how you handle the out-of-range issues.

Author: trieder

Born with CPP, rased with SQL, living with NVIM

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.