5

The most productive code I’ll write this year

 2 years ago
source link: https://mikesmithers.wordpress.com/2022/01/31/the-most-productive-code-ill-write-this-year/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

The most productive code I’ll write this year

Posted on January 31, 2022

I’m currently sitting here wearing a t-shirt so geeky that the slogan emblazoned upon it requires knowledge of both binary and unicode to understand that deciphering it is probably not worth the effort.
It is therefore not surprising that I have succumbed to the geek catnip that is Wordle.
What follows is neither a Wordle clone nor a Wordle solver. There are those far geekier cleverer than I who have already shared such creations.

Rather it is :

  • an exploration of the powers of SQL pattern matching
  • some fun and games with Ref Cursors.

Whilst I’ll be sticking firmly to the command line, there may even be a dash of colour.
Some of what follows has been influenced by Connor McDonald’s PL/SQL Wordle clone.

The end result will be a PL/SQL package called “nurdle” – defined on wordnik as

“To gently waffle or muse on a subject which one clearly knows little about”.

Also making an appearance will be several five letter words which may or may not include :

  • color – yes, Wordle does use American spellings
  • cwtch – a Welsh word meaning (according to Deb) to let your wife warm your cold feet on you to avoid having the duvet removed
  • quack – from the Australian for the sound made by an English batsman.

But first, just in case you’ve exhibited rather more willpower than I, let’s take a quick look at…

How to play Wordle

You start with no clues whatsoever other than the word that you’re trying to guess is 5 letters long.
You have six tries to guess the answer.

For each guess you enter, you get feedback as to whether each letter is in the answer as follows :

  • Green – the letter is in the word and you’ve put it in the correct position
  • Yellow – the letter is in the word but not in the position you have it
  • Gray – the letter is not in the word.

It’s worth noting that Josh Wardle, the creator of Wordle, resides in the USA, a land where vowels are strictly rationed. Therefore, it’s quite possible that the answer your looking for on any given day is an American spelling.

Simple rules then, which provide a perfect excuse to take a closer look as SQL’s pattern matching capabilities…

A worked example

In the vast majority of cases, when I use a wildcard in SQL, it’s the good old percentage sign (%) – replace zero or more characters. However, there is a second wildcard available – the underscore (_) – replace any single character.

Let’s imagine that we have an index organized table to hold a list of five-letter words…

create table nurdle_words(
word varchar2(5),
constraint nurdle_words_pk primary key (word))
organization index
/  
insert into nurdle_words values ('alike');
insert into nurdle_words values( 'audio');
insert into nurdle_words values('attic');
insert into nurdle_words values('arise');
insert into nurdle_words values('alive');
commit;

Now imagine that the word we’re trying to guess is ‘alike‘.
Obviously, we have no clues to start with so I’ll start by guessing ‘audio‘ because it has lots of vowels in it.

The feedback we get from Wordle is :

A – Green – correct letter and position
U – Grey – not in the word
D – Grey – not in the word
I – Yellow – in the word but not the 4th letter
O – Grey – not in the word

Using this information, we can build a SQL query against our word list to narrow down the possible answers…

select word
from nurdle_words
where word like 'a____' -- starts with 'a' and is 5 letters long
and word like '%i%' -- contains an 'i'
and word not like '___i_' -- i is not the 4th letter
-- exclude the Grey letters
and word not like '%u%'
and word not like '%d%'
and word not like '%o%'
/
WORD
-----
alike
alive
arise

Whilst the query is perfectly servicable, I’m feeling brave so I’m going to tidy up a bit by introducing a regular expression…

select word
from nurdle_words
where word like 'a____'
and word like '%i%'
and word not like '___i_'
and not regexp_like(word, '(u|d|o)') -- excluded characters in a single predicate
/

Our next guess is ‘arise‘, which results in :

A – Green
R – Grey
I – Green
S – Grey
E – Green

…which allows us to refine our search query further…

select word
from nurdle_words
where word like 'a_i_e'
and not regexp_like( word, '(u|d|o|r|s)')
/
WORD
-----
alike
alive

As it’s now a fifty-fifty choice, Let’s go for ‘alike’, which means we’ve got it in three goes and can feel smug for the rest of the day.
Then again, all that typing is likely to become a bit tiresome over time, so let’s have a look at implementing this functionality in a PL/SQL package…

The Package

I read somewhere that there are 10,000 words included Wordle’s dictionary. Connor’s solution contains even more.
Being rather less dedicated, I only managed to add a few hundred words to my dictionary before I got bored.
Therefore, I’ll need a means of adding words to the dictionary.

I’ll want to keep track of the guesses I’ve made so far on the current day. I’m going to use a Global Temporary Table for this purpose as I’ll only require this data to persist for the life of the session.
In order to build my queries easily, the guess words will be split into their component letters in the table.
The resulting structure will look like this :

create global temporary table todays_guesses(
guess_no number not null,
letter varchar2(1) not null,
pos_in_word number(1) not null,
status varchar2(1) not null,
constraint todays_guesses_status_chk check (status in ('B', 'Y', 'G')))
on commit preserve rows
/   

NOTE – I’ve replace the Grey for letters not in the word with Black. This is for two reasons :

  1. I want to use a single letter for feedback and grey and green both start with “G”
  2. SQLcl background colours do not include grey (see below), but does have black, which was the colour used for incorrect letters when I first saw the Wordle result messages people posted on Twitter

I’ll need some means of re-constituting the words if I want to review my guesses so far.

Additionally, if I happen to be using SQLcl, I may want to take advantage of the ability to change the background colours of the output.

Finally, I’ll need a way of taking all of the information gleaned from my guesses so far and retrieving a list of possible words to try for my next guess.

The package header looks like this :

create or replace package nurdle as
-- Maximum length of a Word
GC_LEN constant number := 5;
type wordlist is table of nurdle_words%rowtype;
-- add to the dictionary
procedure add_word ( i_word in nurdle_words.word%type);
-- record a guess and the outcome
procedure guess( i_guess in varchar2, i_result in varchar2);
-- show guesses so far and the feedback for each (text version)   
-- usage :
-- exec nurdle.show_guesses;
procedure show_guesses;
-- In SQLcl turn on sqlformat ansiconsole to get    
-- usage :
-- set sqlformat ansiconsole
-- select * from table( nurdle.colour_guesses);
function colour_guesses return sys.ODCIVarchar2List pipelined;
-- work out the possible words in the dictionary
-- usage :
-- select * from table( nurdle.next_guess);
function next_guess return wordlist pipelined;
end nurdle;
/

There are two functions which return an array of records in a Ref Cursor, but they achieve this in different ways.

function colour_guesses return sys.ODCIVarchar2List pipelined;

I copied this idea from Connor’s code. The function uses a pre-defined Oracle type to save us the bother of creating our own.
The type definition can be found in the data dictionary :

select text
from all_source
where owner = 'SYS'
and name = 'ODCIVARCHAR2LIST'
order by line
/
TEXT                                   
----------------------------------------
TYPE ODCIVarchar2List FORCE
AS VARRAY(32767) OF VARCHAR2(4000);

Incidentally, there are other types out there which can be used in a similar fashion for other base types :

  • ODCINUMBERLIST
  • ODCIDATELIST

This approach is ideal for listing guesses which will only ever have a maximum of 6 rows and the order is pre-defined. However, if you want to interact with your refcursor output through a query predicate, then you really do need to have a column name to work with.
This is why we’ve got a type declared in the package…

type wordlist is table of nurdle_words%rowtype;

…which is used as the return type for the next_guess function…

function next_guess return wordlist pipelined;

This allows us to do things like narrowing down our search to include specific letters that we haven’t tried yet. For example, if I’ve used ‘audio’ for my initial guess, I might want to try the remaining vowel in my next guess :

select word
from table(nurdle.next_guess)
where word like '%e%' -- only words that include the letter 'e'
/

Here’s the package body ( suitably commented) :

create or replace package body nurdle as
procedure verify_wordlen( i_word in varchar2)
is
-- Private procedure to ensure that a word is the required length.
-- Used by multiple sub-programs in the package.
begin
if nvl(length(i_word), 0) != GC_LEN then
raise_application_error(-20990, 'Word is not '||GC_LEN||' characters long');
end if;
end verify_wordlen;   
procedure add_word ( i_word in nurdle_words.word%type)
is
v_word nurdle_words.word%type;
begin
verify_wordlen( i_word);
-- Canonicalize the WORD value to make searching simple
v_word := lower(i_word);
merge into nurdle_words
using dual
on (word = v_word)
when not matched then
insert( word)
values(v_word);
end add_word;
procedure guess( i_guess in varchar2, i_result in varchar2)
is
v_guess varchar2(GC_LEN);
v_result varchar2(GC_LEN);
v_guess_no todays_guesses.guess_no%type;
begin
verify_wordlen(i_guess);
verify_wordlen(i_result);
-- canonicalize word to all lowercase, the result to all uppercase, just for a change !
v_guess := lower(i_guess);
v_result := upper(i_result);
select nvl(max(guess_no), 0) + 1
into v_guess_no    
from todays_guesses;
-- split the guessed word and the result into the component letters
for i in 1..GC_LEN loop
insert into todays_guesses( guess_no, letter, pos_in_word, status)
values(v_guess_no, substr(v_guess, i, 1), i, substr(v_result, i, 1));
end loop;
end guess;
procedure show_guesses
is
v_rc sys_refcursor;
begin
open v_rc for
select guess_no,
listagg(letter)  within group (order by pos_in_word) as guess,
listagg(status) within group (order by pos_in_word) as result
from todays_guesses
group by guess_no
order by guess_no;
dbms_sql.return_result(v_rc);
end show_guesses;
function colour_guesses return sys.ODCIVarchar2List pipelined
is
v_rc sys_refcursor;
v_rtn sys.ODCIVarchar2List := sys.ODCIVarchar2List();
begin
dbms_output.put_line(q'[Don't forget to "set sqlformat ansiconsole"]');
open v_rc for
select
listagg(
case status
when 'B' then '@|bg_black '
when 'Y' then '@|bg_yellow '
when 'G' then '@|bg_green '
end
||letter
||'|@')
within group (order by pos_in_word) as guess
from todays_guesses
group by guess_no
order by guess_no;
loop
v_rtn.extend(1);
fetch v_rc into v_rtn(v_rtn.count);
exit when v_rc%notfound;
pipe row( v_rtn( v_rtn.count));
end loop;
close v_rc;
return;
end colour_guesses;   
function next_guess return wordlist pipelined
is
v_row nurdle_words%rowtype;
v_stmnt clob;
v_green varchar2(GC_LEN);
v_blank varchar2(100);
v_rc sys_refcursor;
-- Remember your injection protection !
-- We're only concatenating values from this table. It's a small table.
-- Therefore, just scan the whole thing and make sure that the values we'll be using are all
-- alphabetical only
function is_dodgy return boolean is
v_count pls_integer;
begin
select count(*)
into v_count
from todays_guesses
where length(letter) != regexp_count(letter, '[[:alpha:]]');
return v_count > 0;
end is_dodgy;   
begin
if is_dodgy then
raise_application_error(-20902, 'There are non-alphabetical characters in TODAYS_GUESSES.LETTER');
end if;   
-- Specifying the where clause here will save us jumping through hoops as we
-- build the rest of the query. The optimizer will essentially disregard
-- "where 1=1" so it shouldn't cost anything in performance terms
v_stmnt := 'select word from nurdle_words where 1=1';
-- Build a pattern containing all of the Green letters - i.e. correct letters in the right place -
-- using a pile of Common Table Expressions that interact with each other.
-- Account for the fact that the same "green" letter may appear in multiple guesses...
with greens as (
select letter, pos_in_word,
row_number() over (partition by letter order by guess_no) as recnum
from todays_guesses
where status = 'G'),
ungreens as (
select '_' as letter, rownum as pos_in_word
from dual
connect by rownum <= GC_LEN),
green_pattern as (
select nvl(g.letter, u.letter) letter, u.pos_in_word
from ungreens u
left outer join greens g
on g.pos_in_word = u.pos_in_word
and g.recnum = 1
order by u.pos_in_word)
select listagg(letter) within group( order by pos_in_word)
into v_green
from green_pattern;
v_stmnt := v_stmnt||chr(10)||q'[and word like ']'||v_green||q'[']';
-- Now add in all of the Yellows - letters that are in the word but which we've put in
-- the wrong place. First we need to include the letter in the search...
for r_like in (
select letter
from todays_guesses
where status = 'Y'
and letter not in ( select letter from todays_guesses where status = 'G'))
loop
v_stmnt := v_stmnt||chr(10)||q'[and word like '%]'||r_like.letter||q'[%']';
end loop;
-- Now exclude words that have our yellow letters in the wrong place...
for r_pos in (
select case pos_in_word
when 1 then letter||'____'
when 2 then '_'||letter||'___'
when 3 then '__'||letter||'__'
when 4 then '___'||letter||'_'
when 5 then '____'||letter
end yellow_pos
from todays_guesses
where status = 'Y'
and letter not in ( select letter from todays_guesses where status = 'G'))
loop   
v_stmnt := v_stmnt||chr(10)||q'[and word not like ']'||r_pos.yellow_pos||q'[']';
end loop;   
-- Where we've made a guess with a repeating letter and found only one in the word,
-- Wordle will return one of the letters as yellow or green and the other as blank.
-- We can use the blank to infer that the letter in question is not in that position in the word
for r_double in (
select case pos_in_word
when 1 then letter||'____'
when 2 then '_'||letter||'___'
when 3 then '__'||letter||'__'
when 4 then '___'||letter||'_'
when 5 then '____'||letter
end double_pos
from todays_guesses
where status = 'B'
and letter in ( select letter from todays_guesses where status in ('G', 'Y')))
loop
v_stmnt := v_stmnt||chr(10)||q'[and word not like ']'||r_double.double_pos||q'[']';
end loop;   
-- Exclude all the Blanks
select listagg(letter, '|') within group (order by pos_in_word)
into v_blank
from todays_guesses
where status = 'B'
-- handle guesses with double-letters where the word only contains one of the letter
and letter not in (select letter from todays_guesses where status != 'B');
-- Handle the remote possibility of getting all 5 letters on the first try
if nvl(length( v_blank), 0) > 0 then
v_stmnt := v_stmnt||chr(10)||q'[and not regexp_like( word, '(]'||v_blank||q'[)')]';
end if;   
-- Output the statement in case we need to debug
dbms_output.put_line(v_stmnt);
-- Open a refcursor for the statement and pipe the contents
open v_rc for v_stmnt;
loop
fetch v_rc into v_row;
exit when v_rc%notfound;
pipe row(v_row);
end loop;
close v_rc;
return;
end next_guess;
end nurdle;
/

Let’s run the worked example again, but this time using the package :

exec nurdle.guess('audio', 'gbbyb');
select word
from table( nurdle.next_guess)
order by 1
/

If we want to see the query the code has generated, then we can enable serveroutput :

exec nurdle.guess('arise', 'gbgbg');

Let’s take a look at the guesses so far – boring version first…

…and now the colour version…

Hopefully this package will justify the title of this post and spare me countless evenings racking my brains for five-letter words ending in ‘E’.

Loading...

Recommend

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK