IslandSQL Episode 5: Select in Oracle Database 23c

Introduction

In the last episode, we extended the expressions in the IslandSQL grammar to complete the lock table statement. The grammar now fully covers expressions, conditions and the select statement. In this episode, we will focus on optimizer hints and new features in the Oracle Database 23c that can be used in the select statement.

The full source code is available on GitHub, the binaries are on Maven Central and this VS Code extension uses the IslandSQL library to find text in DML statements and report syntax errors.

Token Channels

ANTLR uses the concept of channels which is based on the idea of radio frequencies. The lexer is responsible for identifying tokens and putting them on the right channel.

For most lexers, these two channels are enough:

DEFAULT_CHANNEL – all tokens that are relevant to the parser
HIDDEN_CHANNEL – all other tokens

Here’s an example:

1) Tokens in source, on hidden channel and on default channel

select█/*+ full(emp) */█*█from█emp█where█empno█=█7788█;
      █/*+ full(emp) */█ █    █   █     █     █ █    █
select                  * from emp where empno = 7788 ;

The first line contains the complete statement where a space token is represented as █ . The syntax highlighting helps to identify the 19 tokens. In the second line, you find all 10 hidden tokens – comments and whitespace. The noise, so to speak. And in the third line are the visible 9 tokens on the default channel.

This is similar to a noise-cancelling system. The parser only gets the tokens that are necessary to do its job.

Identifying Hints

In this blog post, I explained how you can distinguish hints from ordinary comments and highlight them in SQL Developer. Solving this problem was a bit more complicated because SQL Developer’s parse tree does not contain hints. Because hints are just special comments.

However, in the IslandSQL grammar, we want to define hints as part of a query_block. In other words, we want to make them visible.

query_block in IslandSQL with highlighted hint

In the Lexer?

Identifying hints in the lexer and putting them on the DEFAULT_CHANNEL sounds like a good solution. However, we do not want to handle comment tokens that look like a hint in every position in the parser. This would be a nightmare. To avoid that we could add a semantic predicate to consider only hint-style comments following the select keyword. Of course, we need to ignore whitespace and ordinary comments. Furthermore, we have to ensure that the select keyword is the start of a query_block and not used in another context such as a grant statement.

At that point, it becomes obvious that the lexer would be doing the job of the parser.

Better in the Parser!

We better use the lexer only to identify hint tokens and put them on the HIDDEN_CHANNEL:

2) Excerpt IslandSqlLexer.g4 v0.5.0

ML_HINT: '/*+' .*? '*/' -> channel(HIDDEN);
ML_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
SL_HINT: '--+' ~[\r\n]* -> channel(HIDDEN);
SL_COMMENT: '--' ~[\r\n]* -> channel(HIDDEN);

And then we define a semantic predicate in the parser:

3) Excerpt IslandSqlParser.g4 v0.5.0

queryBlock:
    {unhideFirstHint();} K_SELECT hint?
    queryBlockSetOperator?
    selectList
    (intoClause | bulkCollectIntoClause)? // in PL/SQL only
    fromClause? // starting with Oracle Database 23c the from clause is optional
    whereClause?
    hierarchicalQueryClause?
    groupByClause?
    modelClause?
    windowClause?
;

That’s the call of the function unhideFirstHint();} on line 161. At that point, the parser is at the position of the token K_SELECT. Here’s the implementation in the base class of the generated parser:

4) Excerpt IslandSqlParserBase.java v0.5.0

    public void unhideFirstHint() {
        CommonTokenStream input = ((CommonTokenStream) this.getTokenStream());
        List<Token> tokens = input.getHiddenTokensToRight(input.index());
        if (tokens != null) {
            for (Token token : tokens) {
                if (token.getType() == IslandSqlLexer.ML_HINT || token.getType() == IslandSqlLexer.SL_HINT) {
                    ((CommonToken) token).setChannel(Token.DEFAULT_CHANNEL);
                    return; // stop after first hint style comment
                }
            }
        }
    }

We scan all hidden tokens to the right of the keyword select and set the first hint token to the DEFAULT_CHANNEL to make it visible to the parser.

Parse Tree

Let’s visualise the parse tree of the following query:

5a) Example with comments and hints

select -- A query_block can have only one comment
      /*  containing hints, and that comment must
          follow the SELECT keyword. */
      /*+ full(emp) */
      --+ index(emp)
      ename, sal    -- select_list
 from emp           -- from_clause
where empno = 7788; -- where_clause

We use ParseTreeUtil.dotParseTree to produce an output in DOT format and paste the result into the web UI of Edotor or any other Graphviz viewer to produce this result:

The leave nodes are sand-coloured rectangles. They represent the visible lexer tokens, the ones on the DEFAULT_CHANNEL. All other nodes are sky blue and elliptical. They represent a rule in the parser grammar.

I have changed the colour of the hint node to red so that you can spot it more easily. You see that it contains the /*+ full(emp) */ hint-style comment. All other comments are not visible in the parse tree. That’s what we wanted.

Here’s an alternative textual representation of the parse tree using ParseTreeUtil.printParseTree. It is better suited to represent larger parse trees. Furthermore, it contains also the symbol name of lexer tokens, for example K_SELECT or ML_HINT as you see in lines 7 and 9.

5b) Parse tree

file
  dmlStatement
    selectStatement
      select
        subquery:subqueryQueryBlock
          queryBlock
            K_SELECT:select
            hint
              ML_HINT:/*+ full(emp) */
            selectList
              selectItem
                expression:simpleExpressionName
                  sqlName
                    unquotedId
                      ID:ename
              COMMA:,
              selectItem
                expression:simpleExpressionName
                  sqlName
                    unquotedId
                      ID:sal
            fromClause
              K_FROM:from
              fromItem:tableReferenceFromItem
                tableReference
                  queryTableExpression
                    sqlName
                      unquotedId
                        ID:emp
            whereClause
              K_WHERE:where
              condition
                expression:simpleComparisionCondition
                  expression:simpleExpressionName
                    sqlName
                      unquotedId
                        ID:empno
                  simpleComparisionOperator:eq
                    EQUALS:=
                  expression:simpleExpressionNumberLiteral
                    NUMBER:7788
      sqlEnd
        SEMI:;
  <EOF>

New Features in the Oracle Database 23c

The Oracle Database 23c comes with a lot of new features. See the new features guide for a complete list.

In the next chapters, we look at a few examples that are relevant when querying data. In other words, at some of the new features that are applicable in the select statement.

Graph Table Operator

You can use the new graph_table operator to query property graphs in the Oracle Database. It’s a table function similar to xml_table or json_table. A powerful addition to the converged database.

Setup

The SQL Language Reference 23 provides some good examples including a setup script.

6) Setup example property graph

The example property graph looks like this:

Source: SQL Language Reference 23c

Query

7a) Query using graph_table

select a_name, b_name, c_name
  from graph_table (
          students_graph
          match
             (a is person)
                -[is friends]->    -- a is friend of b
             (b is person)
                -[is friends]->    -- b is friend of c
             (c is person)
                -[is friends]->    -- c is friend of a (cyclic path)
             (a)
          where
             a.name = 'Mary'       -- start of cyclic path with 3 nodes
          columns (
             a.name as a_name,
             b.name as b_name,
             c.name as c_name
          )
       ) g;

A_NAME     B_NAME     C_NAME    
---------- ---------- ----------
Mary       John       Bob

Edges and Directions

An edge has a source and a destination vertex. According to the model, Mary is a friend of John and this means that John is also a friend of Mary. When we change the direction of the edges in the query from -[is friends]-> to <-[is friends]- the query result changes to:

A_NAME     B_NAME     C_NAME    
---------- ---------- ----------
Mary       Bob        John

We’ve got now the clockwise result of the cyclic path starting with Mary (see the highlighted person vertices in the STUDENTS_GRAPH figure above).

Since there is only one type of edge between the vertices of the type persons we get the same result by using just <-[]- or even <-.

To ignore the direction of a friendship we can use <-[is friends]-> or -[is friends]- or <-[]-> or -[]- or <-> or just - to produce this result:

A_NAME     B_NAME     C_NAME    
---------- ---------- ----------
Mary       Bob        John      
Mary       John       Bob

IMO this arrow-like syntax is intuitive and makes a graph_table query relatively easy to read and write.

Table Value Constructor

Instead of reading rows from a table/view, you can produce rows on the fly using the new values_clause. This makes it possible to produce rows without writing a query_block for each row and using union all as a kind of row separator.

8a) Query using table value constructor

column english format a7
column german  format a7
with
   eng (digit, english) as (values
      (1, 'one'),
      (2, 'two')
   )
 select digit, english, german
   from eng e
natural full join (values
           (2, 'zwei'),
           (3, 'drei')
        ) as g (digit, german)
  order by digit;
/

     DIGIT ENGLISH GERMAN 
---------- ------- -------
         1 one            
         2 two     zwei   
         3         drei

JSON_ARRAY Constructor by Query

The function json_array has got a new JSON_ARRAY_query_content clause. This clause simplifies the creation of JSON documents, similar to SQL/XML. If you use the abbreviated syntax for json_array and json_object it feels like writing JSON documents with embedded SQL.

9a) Query using JSON_ARRAY_query_contant clause

column result format a90
select json [
          select json {
                    'ename': ename,
                    'sal': sal,
                    'comm': comm absent on null
                 }
            from emp
           where sal >= 3000
          returning json
       ] as result;

RESULT                                                                                    
------------------------------------------------------------------------------------------
[{"ename":"SCOTT","sal":3000},{"ename":"KING","sal":5000},{"ename":"FORD","sal":3000}]

SQL Boolean Data Type

Where can the new Boolean data type be used in the select statement? In conversion functions, for example.

10a) Query using Boolean data type

column dump_yes_value format a20
select cast('yes' as boolean) as yes_value,
       xmlcast(xmltype('<x>no</x>') as boolean) as no_value,
       validate_conversion('maybe' as boolean) as is_maybe_boolean,
       dump(cast('yes' as boolean)) as dump_yes_value;

YES_VALUE   NO_VALUE    IS_MAYBE_BOOLEAN DUMP_YES_VALUE
----------- ----------- ---------------- --------------------
TRUE        FALSE                      0 Typ=252 Len=1: 1

Boolean Expressions

The impact of Boolean expressions is huge. A condition becomes an expression that returns a Boolean expression. Consequently, conditions can be used wherever expressions are permitted.

11a) Query using Boolean expressions

with
   function f(p in boolean) return boolean is
   begin
      return p;
   end;
select (select count(*) from emp) = 14 and (select count(*) from dept) = 4 as is_complete,
       f(1>0) is true as is_true,
       cast(null as boolean) is not null as is_not_null;
/

IS_COMPLETE IS_TRUE     IS_NOT_NULL
----------- ----------- -----------
TRUE        TRUE        FALSE

JSON Schema

There is an extended is_JSON_condition that makes it possible to validate a JSON document against a JSON schema.

12a) Query using JSON schema

column j format a20
with
   t (j) as (values
      (json('["a", "b"]')),            -- JSON array
      (json('{"a": "a", "b": "b"}')),  -- JSON object without id property
      (json('{"id": 42}')),            -- JSON object with numeric id property
      (json('{"id": "42"}'))           -- JSON object with string id property
   )
select j,
       j is json validate '
          {
             "type": "object",
             "properties": {
                "id": { "type": "number" }
             }
          }' as is_valid
  from t;

J                    IS_VALID
-------------------- -----------
["a","b"]            FALSE
{"a":"a","b":"b"}    TRUE
{"id":42}            TRUE
{"id":"42"}          FALSE

What Else?

There are more new features in the Oracle Database 23c that you can use in the select statement, such as:

The data quality operators fuzzy_match and phonic_encode
Dangling predicates in case expressions and case statements within plsql_declarations
New functions related to the Boolean data type: boolean_and_agg, every, boolean_or_agg and to_boolean
New functions related to domains: domain_check, domain_check_type, domain_display, domain_name and domain_order
New JSON_passing_clause and type clause in json_query and json_value
New on null and on error clauses in json_scalar
New ordered clause in json_serialize
New type clause in json_table and json_transform
New operations sort, nested_path, case, copy, intersect, merge, minus, prepend and union in json_transform

We can also assume that more features will be added with future release updates. The AI vector search, for example, should be available with 23.4 later this year.

Outlook

The plan for IslandSQL is still the same as outlined in the previous episode. So we should cover the remaining DML statements (call, delete, explain plan, insert, merge and update) in the next episode.

Introduction

Token Channels

Identifying Hints

In the Lexer?

Better in the Parser!

Parse Tree

New Features in the Oracle Database 23c

Graph Table Operator

Setup

Query

Edges and Directions

Table Value Constructor

JSON_ARRAY Constructor by Query

SQL Boolean Data Type

Boolean Expressions

JSON Schema

What Else?

Outlook

Recommend

神马股份拟投资2.52亿元设泰国生产基地

OPPO Find X7全面评测标准版居然强过Pro？

向大师致敬，OPPO Find X7 Ultra大师模式组图赏析

Ford cuts production of F-150 Lightning pickup on weaker-than-expected electric...

Virtual reality helps NHS doctors spot signs of sepsis

搭载Wear OS的一加Watch 2将在下个月的MWC上亮相

DPD error caused chatbot to swear at customer

Speed Up Your Ruby on Rails Application with LiteCache

What is behind the TikTok thirst for Stanley water cups?

Markon Review Express Weekly #22

About Joyk