7

IslandSQL Episode 5: Select in Oracle Database 23c

 7 months ago
source link: https://www.salvis.com/blog/2024/01/12/islandsql-episode-5-select/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Introduction

In the last episode, we extended the expressions in the IslandSQL grammar to complete the lock table statement. The grammar now fully covers expressions, conditions and the select statement. In this episode, we will focus on optimizer hints and new features in the Oracle Database 23c that can be used in the select statement.

The full source code is available on GitHub, the binaries are on Maven Central and this VS Code extension uses the IslandSQL library to find text in DML statements and report syntax errors.

Token Channels

ANTLR uses the concept of channels which is based on the idea of radio frequencies. The lexer is responsible for identifying tokens and putting them on the right channel.

For most lexers, these two channels are enough:

  • DEFAULT_CHANNEL – all tokens that are relevant to the parser
  • HIDDEN_CHANNEL – all other tokens

Here’s an example:

1) Tokens in source, on hidden channel and on default channel
select█/*+ full(emp) */█*█from█emp█where█empno█=█7788█;
      █/*+ full(emp) */█ █    █   █     █     █ █    █
select                  * from emp where empno = 7788 ;

The first line contains the complete statement where a space token is represented as . The syntax highlighting helps to identify the 19 tokens. In the second line, you find all 10 hidden tokens – comments and whitespace. The noise, so to speak. And in the third line are the visible 9 tokens on the default channel.

This is similar to a noise-cancelling system. The parser only gets the tokens that are necessary to do its job.

Identifying Hints

In this blog post, I explained how you can distinguish hints from ordinary comments and highlight them in SQL Developer. Solving this problem was a bit more complicated because SQL Developer’s parse tree does not contain hints. Because hints are just special comments.

However, in the IslandSQL grammar, we want to define hints as part of a query_block. In other words, we want to make them visible.

query_block in IslandSQL with highlighted hint
In the Lexer?

Identifying hints in the lexer and putting them on the DEFAULT_CHANNEL sounds like a good solution. However, we do not want to handle comment tokens that look like a hint in every position in the parser. This would be a nightmare. To avoid that we could add a semantic predicate to consider only hint-style comments following the select keyword. Of course, we need to ignore whitespace and ordinary comments. Furthermore, we have to ensure that the select keyword is the start of a query_block and not used in another context such as a grant statement.

At that point, it becomes obvious that the lexer would be doing the job of the parser.

Better in the Parser!

We better use the lexer only to identify hint tokens and put them on the HIDDEN_CHANNEL:

2) Excerpt IslandSqlLexer.g4 v0.5.0
ML_HINT: '/*+' .*? '*/' -> channel(HIDDEN);
ML_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
SL_HINT: '--+' ~[\r\n]* -> channel(HIDDEN);
SL_COMMENT: '--' ~[\r\n]* -> channel(HIDDEN);

And then we define a semantic predicate in the parser:

3) Excerpt IslandSqlParser.g4 v0.5.0
queryBlock:
    {unhideFirstHint();} K_SELECT hint?
    queryBlockSetOperator?
    selectList
    (intoClause | bulkCollectIntoClause)? // in PL/SQL only
    fromClause? // starting with Oracle Database 23c the from clause is optional
    whereClause?
    hierarchicalQueryClause?
    groupByClause?
    modelClause?
    windowClause?
;

That’s the call of the function unhideFirstHint();} on line 161. At that point, the parser is at the position of the token K_SELECT. Here’s the implementation in the base class of the generated parser:

4) Excerpt IslandSqlParserBase.java v0.5.0
    public void unhideFirstHint() {
        CommonTokenStream input = ((CommonTokenStream) this.getTokenStream());
        List<Token> tokens = input.getHiddenTokensToRight(input.index());
        if (tokens != null) {
            for (Token token : tokens) {
                if (token.getType() == IslandSqlLexer.ML_HINT || token.getType() == IslandSqlLexer.SL_HINT) {
                    ((CommonToken) token).setChannel(Token.DEFAULT_CHANNEL);
                    return; // stop after first hint style comment
                }
            }
        }
    }

We scan all hidden tokens to the right of the keyword select and set the first hint token to the DEFAULT_CHANNEL to make it visible to the parser.

Parse Tree

Let’s visualise the parse tree of the following query:

5a) Example with comments and hints
select -- A query_block can have only one comment
      /*  containing hints, and that comment must
          follow the SELECT keyword. */
      /*+ full(emp) */
      --+ index(emp)
      ename, sal    -- select_list
 from emp           -- from_clause
where empno = 7788; -- where_clause

We use ParseTreeUtil.dotParseTree to produce an output in DOT format and paste the result into the web UI of Edotor or any other Graphviz viewer to produce this result:

parse tree with hint

The leave nodes are sand-coloured rectangles. They represent the visible lexer tokens, the ones on the DEFAULT_CHANNEL. All other nodes are sky blue and elliptical. They represent a rule in the parser grammar.

I have changed the colour of the hint node to red so that you can spot it more easily. You see that it contains the /*+ full(emp) */ hint-style comment. All other comments are not visible in the parse tree. That’s what we wanted.

Here’s an alternative textual representation of the parse tree using ParseTreeUtil.printParseTree. It is better suited to represent larger parse trees. Furthermore, it contains also the symbol name of lexer tokens, for example K_SELECT or ML_HINT as you see in lines 7 and 9.

5b) Parse tree
file
  dmlStatement
    selectStatement
      select
        subquery:subqueryQueryBlock
          queryBlock
            K_SELECT:select
            hint
              ML_HINT:/*+ full(emp) */
            selectList
              selectItem
                expression:simpleExpressionName
                  sqlName
                    unquotedId
                      ID:ename
              COMMA:,
              selectItem
                expression:simpleExpressionName
                  sqlName
                    unquotedId
                      ID:sal
            fromClause
              K_FROM:from
              fromItem:tableReferenceFromItem
                tableReference
                  queryTableExpression
                    sqlName
                      unquotedId
                        ID:emp
            whereClause
              K_WHERE:where
              condition
                expression:simpleComparisionCondition
                  expression:simpleExpressionName
                    sqlName
                      unquotedId
                        ID:empno
                  simpleComparisionOperator:eq
                    EQUALS:=
                  expression:simpleExpressionNumberLiteral
                    NUMBER:7788
      sqlEnd
        SEMI:;
  <EOF>

New Features in the Oracle Database 23c

The Oracle Database 23c comes with a lot of new features. See the new features guide for a complete list.

In the next chapters, we look at a few examples that are relevant when querying data. In other words, at some of the new features that are applicable in the select statement.

Graph Table Operator

You can use the new graph_table operator to query property graphs in the Oracle Database. It’s a table function similar to xml_table or json_table. A powerful addition to the converged database.

Setup

The SQL Language Reference 23 provides some good examples including a setup script.

6) Setup example property graph

The example property graph looks like this:

Data in STUDENTS_GRAPH

Source: SQL Language Reference 23c

Query
7a) Query using graph_table
select a_name, b_name, c_name
  from graph_table (
          students_graph
          match
             (a is person)
                -[is friends]->    -- a is friend of b
             (b is person)
                -[is friends]->    -- b is friend of c
             (c is person)
                -[is friends]->    -- c is friend of a (cyclic path)
             (a)
          where
             a.name = 'Mary'       -- start of cyclic path with 3 nodes
          columns (
             a.name as a_name,
             b.name as b_name,
             c.name as c_name
          )
       ) g;
A_NAME     B_NAME     C_NAME    
---------- ---------- ----------
Mary       John       Bob       
Edges and Directions

An edge has a source and a destination vertex. According to the model, Mary is a friend of John and this means that John is also a friend of Mary. When we change the direction of the edges in the query from -[is friends]-> to <-[is friends]- the query result changes to:

A_NAME     B_NAME     C_NAME    
---------- ---------- ----------
Mary       Bob        John      

We’ve got now the clockwise result of the cyclic path starting with Mary (see the highlighted person vertices in the STUDENTS_GRAPH figure above).

Since there is only one type of edge between the vertices of the type persons we get the same result by using just <-[]- or even <-.

To ignore the direction of a friendship we can use <-[is friends]-> or -[is friends]- or <-[]-> or -[]- or <-> or just - to produce this result:

A_NAME     B_NAME     C_NAME    
---------- ---------- ----------
Mary       Bob        John      
Mary       John       Bob       

IMO this arrow-like syntax is intuitive and makes a graph_table query relatively easy to read and write.

Table Value Constructor

Instead of reading rows from a table/view, you can produce rows on the fly using the new values_clause. This makes it possible to produce rows without writing a query_block for each row and using union all as a kind of row separator.

8a) Query using table value constructor
column english format a7
column german  format a7
with
   eng (digit, english) as (values
      (1, 'one'),
      (2, 'two')
   )
 select digit, english, german
   from eng e
natural full join (values
           (2, 'zwei'),
           (3, 'drei')
        ) as g (digit, german)
  order by digit;
/
     DIGIT ENGLISH GERMAN 
---------- ------- -------
         1 one            
         2 two     zwei   
         3         drei 

JSON_ARRAY Constructor by Query

The function json_array has got a new JSON_ARRAY_query_content clause. This clause simplifies the creation of JSON documents, similar to SQL/XML. If you use the abbreviated syntax for json_array and json_object it feels like writing JSON documents with embedded SQL.

9a) Query using JSON_ARRAY_query_contant clause
column result format a90
select json [
          select json {
                    'ename': ename,
                    'sal': sal,
                    'comm': comm absent on null
                 }
            from emp
           where sal >= 3000
          returning json
       ] as result;
RESULT                                                                                    
------------------------------------------------------------------------------------------
[{"ename":"SCOTT","sal":3000},{"ename":"KING","sal":5000},{"ename":"FORD","sal":3000}]     

SQL Boolean Data Type

Where can the new Boolean data type be used in the select statement? In conversion functions, for example.

10a) Query using Boolean data type
column dump_yes_value format a20
select cast('yes' as boolean) as yes_value,
       xmlcast(xmltype('<x>no</x>') as boolean) as no_value,
       validate_conversion('maybe' as boolean) as is_maybe_boolean,
       dump(cast('yes' as boolean)) as dump_yes_value;
YES_VALUE   NO_VALUE    IS_MAYBE_BOOLEAN DUMP_YES_VALUE
----------- ----------- ---------------- --------------------
TRUE        FALSE                      0 Typ=252 Len=1: 1

Boolean Expressions

The impact of Boolean expressions is huge. A condition becomes an expression that returns a Boolean expression. Consequently, conditions can be used wherever expressions are permitted.

11a) Query using Boolean expressions
with
   function f(p in boolean) return boolean is
   begin
      return p;
   end;
select (select count(*) from emp) = 14 and (select count(*) from dept) = 4 as is_complete,
       f(1>0) is true as is_true,
       cast(null as boolean) is not null as is_not_null;
/
IS_COMPLETE IS_TRUE     IS_NOT_NULL
----------- ----------- -----------
TRUE        TRUE        FALSE

JSON Schema

There is an extended is_JSON_condition that makes it possible to validate a JSON document against a JSON schema.

12a) Query using JSON schema
column j format a20
with
   t (j) as (values
      (json('["a", "b"]')),            -- JSON array
      (json('{"a": "a", "b": "b"}')),  -- JSON object without id property
      (json('{"id": 42}')),            -- JSON object with numeric id property
      (json('{"id": "42"}'))           -- JSON object with string id property
   )
select j,
       j is json validate '
          {
             "type": "object",
             "properties": {
                "id": { "type": "number" }
             }
          }' as is_valid
  from t;
J                    IS_VALID
-------------------- -----------
["a","b"]            FALSE
{"a":"a","b":"b"}    TRUE
{"id":42}            TRUE
{"id":"42"}          FALSE

What Else?

There are more new features in the Oracle Database 23c that you can use in the select statement, such as:

We can also assume that more features will be added with future release updates. The AI vector search, for example, should be available with 23.4 later this year.

Outlook

The plan for IslandSQL is still the same as outlined in the previous episode. So we should cover the remaining DML statements (calldeleteexplain planinsertmerge and update) in the next episode.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK