There are situations where approximate results are superior to exact results. Typically, this is the case when two conditions are met. First, the time and/or resources needed to produce exact results are much higher than for approximate results. Second, approximate results are good enough. For example, approximate results are superior for exploratory queries or when results are displayed visually in a way that doesn’t convey small differences.
Version 12.1.0.2 includes a single function related to approximate query processing: APPROX_COUNT_DISTINCT (I already wrote about it in The APPROX_COUNT_DISTINCT Function – A Test Case). However, version 12.2 introduces not only a number of functions and related functionalities (like the support in materialized views), but also the approximate aggregate transformation that allows an application to take advantage of approximate query processing without requiring code changes.
The purpose of approximate aggregate, which is a heuristic-based query transformation, is to allow applications, without modification, to take advantage of approximate query processing. In other words, to let the query optimizer transform functions returning exact results to functions returning approximate results. Specifically, it can carry out the following transformations:
- “COUNT(DISTINCT <expr>)” to “APPROX_COUNT_DISTINCT(<expr>)”
- “MEDIAN(<expr>)” to “APPROX_PERCENTILE(0.5) WITHIN GROUP (ORDER BY <expr>)”
- “PERCENTILE_CONT(<expr>) WITHIN GROUP (ORDER BY <expr>)” to “APPROX_PERCENTILE(<expr>) WITHIN GROUP (ORDER BY <expr>)”
- “PERCENTILE_DISC(<expr>) WITHIN GROUP (ORDER BY <expr>)” to “APPROX_PERCENTILE(<expr>) WITHIN GROUP (ORDER BY <expr>)”
A restriction is that approximate aggregate doesn’t take place for the analytic version of the COUNT, MEDIAN, PERCENTILE_CONT and PERCENTILE_DISC functions. In other words, the query optimizer isn’t able to take advantage of approximate aggregate if the OVER clause is specified.
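To make the mapping concrete, here is a minimal sketch that pairs each exact aggregate with a hand-written approximate counterpart, and ends with an analytic query that, per the restriction above, stays exact. It assumes a table t with numeric columns id, n and p, such as the test table created in the example below; the 0.9 percentile is an arbitrary value picked for illustration.

```sql
-- Exact aggregates (candidates for the transformation)
SELECT count(DISTINCT n) FROM t;
SELECT median(n) FROM t;
SELECT percentile_cont(0.9) WITHIN GROUP (ORDER BY n) FROM t;
SELECT percentile_disc(0.9) WITHIN GROUP (ORDER BY n) FROM t;

-- Hand-written approximate counterparts
SELECT approx_count_distinct(n) FROM t;
SELECT approx_percentile(0.5) WITHIN GROUP (ORDER BY n) FROM t;
SELECT approx_percentile(0.9) WITHIN GROUP (ORDER BY n) FROM t;

-- Analytic form (OVER clause): not eligible for the transformation
SELECT id, median(n) OVER (PARTITION BY p) FROM t;
```

Note that both PERCENTILE_CONT and PERCENTILE_DISC map to the same APPROX_PERCENTILE call, so the distinction between interpolated and discrete percentiles is lost once the approximate function is used.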
The following example illustrates the transformation (notice that the test query uses the MEDIAN function):
- Set up the test environment
    SQL> execute dbms_random.seed(0)

    SQL> CREATE TABLE t
         AS
         SELECT rownum AS id,
                trunc(dbms_random.normal*1000) AS n,
                mod(rownum, 10)+1 AS p
         FROM dual
         CONNECT BY level <= 1000;

    SQL> execute dbms_stats.gather_table_stats(user,'T')
- Get the execution plan of the test query when it doesn’t take advantage of the approximate aggregate transformation (AAT); notice the SORT GROUP BY row source operation
    SQL> EXPLAIN PLAN FOR SELECT median(n) FROM t;

    SQL> SELECT * FROM table(dbms_xplan.display(format=>'basic'));

    PLAN_TABLE_OUTPUT
    -------------------------------------------
    Plan hash value: 1476560607

    -------------------------------------------
    | Id  | Operation                  | Name |
    -------------------------------------------
    |   0 | SELECT STATEMENT           |      |
    |   1 |  SORT GROUP BY             |      |
    |   2 |   TABLE ACCESS STORAGE FULL| T    |
    -------------------------------------------
- Get the execution plan of the same query as before, but this time when it takes advantage of AAT, that is, with the transformation enabled through the initialization parameters described below (notice the SORT AGGREGATE APPROX row source operation instead of SORT GROUP BY)
    SQL> EXPLAIN PLAN FOR SELECT median(n) FROM t;

    SQL> SELECT * FROM table(dbms_xplan.display(format=>'basic'));

    PLAN_TABLE_OUTPUT
    -------------------------------------------
    Plan hash value: 2966233522

    -------------------------------------------
    | Id  | Operation                  | Name |
    -------------------------------------------
    |   0 | SELECT STATEMENT           |      |
    |   1 |  SORT AGGREGATE APPROX     |      |
    |   2 |   TABLE ACCESS STORAGE FULL| T    |
    -------------------------------------------
Since approximate aggregate changes the result set returned by queries, it is disabled by default. If you want to take advantage of it, you have to explicitly enable it. Three initialization parameters control approximate aggregate:
- APPROX_FOR_COUNT_DISTINCT controls whether the query optimizer transforms a COUNT(DISTINCT <expr>) function to an APPROX_COUNT_DISTINCT function. By default, the initialization parameter is set to FALSE. You can enable the query transformation at the system or session level by setting it to TRUE.
- APPROX_FOR_PERCENTILE controls whether the query optimizer applies transformations that result in the use of the APPROX_PERCENTILE function. By default, the initialization parameter is set to NONE. You can enable the query transformation at the system or session level by setting APPROX_FOR_PERCENTILE to PERCENTILE_CONT, PERCENTILE_DISC or ALL. If it’s set to PERCENTILE_CONT, the transformation takes place for the MEDIAN and PERCENTILE_CONT functions. If it’s set to PERCENTILE_DISC, the transformation takes place for the PERCENTILE_DISC function. If it’s set to ALL, the transformation takes place for all three functions.
- APPROX_FOR_AGGREGATION controls whether approximate aggregate is enabled. By default, it is set to FALSE. You can enable the query transformation by setting it to TRUE. Note that both APPROX_FOR_COUNT_DISTINCT and APPROX_FOR_PERCENTILE override the value of this initialization parameter.
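As a minimal sketch, the three parameters described above could be set at the session level as follows. The parameter names and values come from the list above; quoting the APPROX_FOR_PERCENTILE value is my own choice, and which statements you need depends on the transformations you want to enable.

```sql
-- Enable only the COUNT(DISTINCT ...) transformation
ALTER SESSION SET approx_for_count_distinct = TRUE;

-- Enable the transformation for MEDIAN, PERCENTILE_CONT and PERCENTILE_DISC
ALTER SESSION SET approx_for_percentile = 'ALL';

-- Alternatively, enable approximate aggregate as a whole
ALTER SESSION SET approx_for_aggregation = TRUE;

-- Go back to the default values (transformation disabled)
ALTER SESSION SET approx_for_count_distinct = FALSE;
ALTER SESSION SET approx_for_percentile = 'NONE';
ALTER SESSION SET approx_for_aggregation = FALSE;
```

Setting the parameters at the session level keeps the change local, which is a sensible way to compare exact and approximate results before considering a system-wide change.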
Finally, for the example seen before, let’s have a look at what an optimizer trace contains (generated on the Oracle Database Exadata Express Cloud Service):
- Query transformation disabled (or not possible)
    AAT: Considering Approximate Aggregate Transformation on query block SEL$1 (#0)
    *******************************************
    Approximate Aggregate Transformation (AAT)
    *******************************************
    AAT: no exact aggregates transformed
- Query transformation takes place
    AAT: Considering Approximate Aggregate Transformation on query block SEL$1 (#0)
    *******************************************
    Approximate Aggregate Transformation (AAT)
    *******************************************
    AAT: transformed final query
    ******* UNPARSED QUERY IS *******
    SELECT APPROX_PERCENTILE(0.500000) WITHIN GROUP ( ORDER BY "T"."N") "MEDIAN(N)" FROM "PDB_ADMIN"."T" "T"
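The text above doesn’t say how the optimizer trace was produced. One common way, sketched here as an assumption, is to enable event 10053 in the session, hard parse the test query, and then look up the trace file. The tracefile identifier 'aat' is an arbitrary label, and access to the trace file depends on the privileges and the environment you are working in.

```sql
-- Optional: tag the trace file name so that it is easier to find
ALTER SESSION SET tracefile_identifier = 'aat';

-- Switch on the optimizer trace (event 10053)
ALTER SESSION SET events '10053 trace name context forever, level 1';

-- Parse the test query; the trace is written while the query is optimized
EXPLAIN PLAN FOR SELECT median(n) FROM t;

-- Switch off the optimizer trace
ALTER SESSION SET events '10053 trace name context off';

-- Locate the trace file of the current session
SELECT value FROM v$diag_info WHERE name = 'Default Trace File';
```

Searching the resulting file for the string "AAT:" should lead to the section shown in the excerpts above.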