Rules on INSERT, UPDATE and DELETE

Differences to View Rules

Rules that are defined ON INSERT, UPDATE and DELETE are totally different from the view rules described in the previous section. First, their CREATE RULE command allows more:

Second, they don't modify the parsetree in place. Instead they create zero or many new parsetrees and can throw away the original one.

How These Rules Work

Keep the syntax

    CREATE RULE rule_name AS ON event
        TO object [WHERE rule_qualification]
        DO [INSTEAD] [action | (actions) | NOTHING];
in mind. In the following, "update rules" means rules that are defined ON INSERT, UPDATE or DELETE.

Update rules get applied by the rule system when the result relation and the commandtype of a parsetree are equal to the object and event given in the CREATE RULE command. For update rules, the rule system creates a list of parsetrees. Initially the parsetree list is empty. There can be zero (NOTHING keyword), one or multiple actions. To simplify, we look at a rule with one action. This rule can have a qualification or not and it can be INSTEAD or not.

What is a rule qualification? It is a restriction that tells when the actions of the rule should be done and when not. This qualification can only reference the NEW and/or OLD pseudo relations which are basically the relation given as object (but with a special meaning).

So we have four cases that produce the following parsetrees for a one-action rule.

Finally, if the rule is not INSTEAD, the unchanged original parsetree is added to the list. Since only qualified INSTEAD rules already add the original parsetree, we end up with a total maximum of two parsetrees for a rule with one action.

The parsetrees generated from rule actions are thrown into the rewrite system again and maybe more rules get applied resulting in more or less parsetrees. So the parsetrees in the rule actions must have either another commandtype or another resultrelation. Otherwise this recursive process will end up in a loop. There is a compiled in recursion limit of currently 10 iterations. If after 10 iterations there are still update rules to apply the rule system assumes a loop over multiple rule definitions and aborts the transaction.

The parsetrees found in the actions of the pg_rewrite system catalog are only templates. Since they can reference the rangetable entries for NEW and OLD, some substitutions have to be made before they can be used. For any reference to NEW, the targetlist of the original query is searched for a corresponding entry. If found, that entries expression is placed into the reference. Otherwise NEW means the same as OLD. Any reference to OLD is replaced by a reference to the rangetable entry which is the resultrelation.

A First Rule Step by Step

We want to trace changes to the sl_avail column in the shoelace_data relation. So we setup a log table and a rule that writes us entries every time and UPDATE is performed on shoelace_data.

    CREATE TABLE shoelace_log (
        sl_name    char(10),      -- shoelace changed
        sl_avail   integer,       -- new available value
        log_who    name,          -- who did it
        log_when   datetime       -- when
    );

    CREATE RULE log_shoelace AS ON UPDATE TO shoelace_data
        WHERE NEW.sl_avail != OLD.sl_avail
        DO INSERT INTO shoelace_log VALUES (
                                        NEW.sl_name,
                                        NEW.sl_avail,
                                        getpgusername(),
                                        'now'::text
                                    );
One interesting detail is the casting of 'now' in the rules INSERT action to type text. Without that, the parser would see at CREATE RULE time, that the target type in shoelace_log is a datetime and tries to make a constant from it - with success. So a constant datetime value would be stored in the rule action and all log entries would have the time of the CREATE RULE statement. Not exactly what we want. The casting causes that the parser constructs a datetime('now'::text) from it and this will be evaluated when the rule is executed.

Now Al does

    al_bundy=> UPDATE shoelace_data SET sl_avail = 6                       
    al_bundy->     WHERE sl_name = 'sl7';
and we look at the logtable.
    al_bundy=> SELECT * FROM shoelace_log;
    sl_name   |sl_avail|log_who|log_when                        
    ----------+--------+-------+--------------------------------
    sl7       |       6|Al     |Tue Oct 20 16:14:45 1998 MET DST
    (1 row)
That's what we expected. What happened in the background is the following. The parser created the parsetree (this time the parts of the original parsetree are highlighted because the base of operations is the rule action for update rules).
    UPDATE shoelace_data SET sl_avail = 6
      FROM shoelace_data shoelace_data
     WHERE bpchareq(shoelace_data.sl_name, 'sl7');
There is a rule 'log_shoelace' that is ON UPDATE with the rule qualification expression
    int4ne(NEW.sl_avail, OLD.sl_avail)
and one action
    INSERT INTO shoelace_log SELECT 
           *NEW*.sl_name, *NEW*.sl_avail,
           getpgusername(), datetime('now'::text)
      FROM shoelace_data *NEW*, shoelace_data *OLD*,
           shoelace_log shoelace_log;
Don't trust the output of the pg_rules system view. It specially handles the situation that there are only references to NEW and OLD in the INSERT and outputs the VALUES format of INSERT. In fact there is no difference between an INSERT ... VALUES and an INSERT ... SELECT on parsetree level. They both have rangetables, targetlists and maybe qualifications etc. The optimizer later decides, if to create an execution plan of type result, seqscan, indexscan, join or whatever for that parsetree. If there are no references to rangetable entries leftin the parsetree , it becomes a result execution plan (the INSERT ... VALUES version). The rule action above can truely result in both variants.

The rule is a qualified non-INSTEAD rule, so the rule system has to return two parsetrees. The modified rule action and the original parsetree. In the first step the rangetable of the original query is incorporated into the rules action parsetree. This results in

    INSERT INTO shoelace_log SELECT 
           *NEW*.sl_name, *NEW*.sl_avai,
           getpgusername(), datetime('now'::text)
      FROM shoelace_data shoelace_data, shoelace_data *NEW*,
           shoelace_data *OLD*, shoelace_log shoelace_log;
In step 2 the rule qualification is added to it, so the result set is restricted to rows where sl_avail changes.
    INSERT INTO shoelace_log SELECT 
           *NEW*.sl_name, *NEW*.sl_avai,
           getpgusername(), datetime('now'::text)
      FROM shoelace_data shoelace_data, shoelace_data *NEW*,
           shoelace_data *OLD*, shoelace_log shoelace_log
     WHERE int4ne(*NEW*.sl_avail, *OLD*.sl_avail);
In step 3 the original parsetrees qualification is added, restricting the resultset further to only the rows touched by the original parsetree.
    INSERT INTO shoelace_log SELECT 
           *NEW*.sl_name, *NEW*.sl_avai,
           getpgusername(), datetime('now'::text)
      FROM shoelace_data shoelace_data, shoelace_data *NEW*,
           shoelace_data *OLD*, shoelace_log shoelace_log
     WHERE int4ne(*NEW*.sl_avail, *OLD*.sl_avail)
       AND bpchareq(shoelace_data.sl_name, 'sl7');
Step 4 substitutes NEW references by the targetlist entries from the original parsetree or with the matching variable references from the result relation.
    INSERT INTO shoelace_log SELECT 
           shoelace_data.sl_name, 6,
           getpgusername(), datetime('now'::text)
      FROM shoelace_data shoelace_data, shoelace_data *NEW*,
           shoelace_data *OLD*, shoelace_log shoelace_log
     WHERE int4ne(6, *OLD*.sl_avail)
       AND bpchareq(shoelace_data.sl_name, 'sl7');
Step 5 replaces OLD references into resultrelation references.
    INSERT INTO shoelace_log SELECT 
           shoelace_data.sl_name, 6,
           getpgusername(), datetime('now'::text)
      FROM shoelace_data shoelace_data, shoelace_data *NEW*,
           shoelace_data *OLD*, shoelace_log shoelace_log
     WHERE int4ne(6, shoelace_data.sl_avail)
       AND bpchareq(shoelace_data.sl_name, 'sl7');
That's it. So reduced to the max the return from the rule system is a list of two parsetrees that are the same as the statements:
    INSERT INTO shoelace_log SELECT
           shoelace_data.sl_name, 6,
           getpgusername(), 'now'
      FROM shoelace_data
     WHERE 6 != shoelace_data.sl_avail
       AND shoelace_data.sl_name = 'sl7';

    UPDATE shoelace_data SET sl_avail = 6
     WHERE sl_name = 'sl7';
These are executed in this order and that is exactly what the rule defines. The subtitutions and the qualifications added ensure, that if the original query would be an
    UPDATE shoelace_data SET sl_color = 'green'
     WHERE sl_name = 'sl7';
No log entry would get written because due to the fact that this time the original parsetree does not contain a targetlist entry for sl_avail, NEW.sl_avail will get replaced by shoelace_data.sl_avail resulting in the extra query
    INSERT INTO shoelace_log SELECT
           shoelace_data.sl_name, shoelace_data.sl_avail,
           getpgusername(), 'now'
      FROM shoelace_data
     WHERE shoelace_data.sl_avail != shoelace_data.sl_avail
       AND shoelace_data.sl_name = 'sl7';
and that qualification will never be true. Since the is no difference on parsetree level between an INSERT ... SELECT, and an INSERT ... VALUES, it will also work if the original query modifies multiple rows. So if Al would issue the command
    UPDATE shoelace_data SET sl_avail = 0
     WHERE sl_color = 'black';
four rows in fact get updated (sl1, sl2, sl3 and sl4). But sl3 already has sl_avail = 0. This time, the original parsetrees qualification is different and that results in the extra parsetree
    INSERT INTO shoelace_log SELECT
           shoelace_data.sl_name, 0,
           getpgusername(), 'now'
      FROM shoelace_data
     WHERE 0 != shoelace_data.sl_avail
       AND shoelace_data.sl_color = 'black';
This parsetree will surely insert three new log entries. And that's absolutely correct.

It is important, that the original parsetree is executed last. The Postgres "traffic cop" does a command counter increment between the execution of the two parsetrees so the second one can see changes made by the first. If the UPDATE would have been executed first, all the rows are already set to zero, so the logging INSERT would not find any row where 0 != shoelace_data.sl_avail.

Cooperation with Views

A simple way to protect view relations from the mentioned possibility that someone can INSERT, UPDATE and DELETE invisible data on them is to let those parsetrees get thrown away. We create the rules

    CREATE RULE shoe_ins_protect AS ON INSERT TO shoe
        DO INSTEAD NOTHING;
    CREATE RULE shoe_upd_protect AS ON UPDATE TO shoe
        DO INSTEAD NOTHING;
    CREATE RULE shoe_del_protect AS ON DELETE TO shoe
        DO INSTEAD NOTHING;
If Al now tries to do any of these operations on the view relation shoe, the rule system will apply the rules. Since the rules have no actions and are INSTEAD, the resulting list of parsetrees will be empty and the whole query will become nothing because there is nothing left to be optimized or executed after the rule system is done with it.

Note: This fact might irritate frontend applications because absolutely nothing happened on the database and thus, the backend will not return anything for the query. Not even a PGRES_EMPTY_QUERY or so will be available in libpq. In psql, nothing happens. This might change in the future.

A more sophisticated way to use the rule system is to create rules that rewrite the parsetree into one that does the right operation on the real tables. To do that on the shoelace view, we create the following rules:

    CREATE RULE shoelace_ins AS ON INSERT TO shoelace
        DO INSTEAD
        INSERT INTO shoelace_data VALUES (
               NEW.sl_name,
               NEW.sl_avail,
               NEW.sl_color,
               NEW.sl_len,
               NEW.sl_unit);

    CREATE RULE shoelace_upd AS ON UPDATE TO shoelace
        DO INSTEAD
        UPDATE shoelace_data SET
               sl_name = NEW.sl_name,
               sl_avail = NEW.sl_avail,
               sl_color = NEW.sl_color,
               sl_len = NEW.sl_len,
               sl_unit = NEW.sl_unit
         WHERE sl_name = OLD.sl_name;

    CREATE RULE shoelace_del AS ON DELETE TO shoelace
        DO INSTEAD
        DELETE FROM shoelace_data
         WHERE sl_name = OLD.sl_name;
Now there is a pack of shoelaces arriving in Al's shop and it has a big partlist. Al is not that good in calculating and so we don't want him to manually update the shoelace view. Instead we setup two little tables, one where he can insert the items from the partlist and one with a special trick. The create commands for anything are:
    CREATE TABLE shoelace_arrive (
        arr_name    char(10),
        arr_quant   integer
    );

    CREATE TABLE shoelace_ok (
        ok_name     char(10),
        ok_quant    integer
    );

    CREATE RULE shoelace_ok_ins AS ON INSERT TO shoelace_ok
        DO INSTEAD
        UPDATE shoelace SET
               sl_avail = sl_avail + NEW.ok_quant
         WHERE sl_name = NEW.ok_name;
Now Al can sit down and do whatever until
    al_bundy=> SELECT * FROM shoelace_arrive;
    arr_name  |arr_quant
    ----------+---------
    sl3       |       10
    sl6       |       20
    sl8       |       20
    (3 rows)
is exactly that what's on the part list. We take a quick look at the current data,
    al_bundy=> SELECT * FROM shoelace ORDER BY sl_name;
    sl_name   |sl_avail|sl_color  |sl_len|sl_unit |sl_len_cm
    ----------+--------+----------+------+--------+---------
    sl1       |       5|black     |    80|cm      |       80
    sl2       |       6|black     |   100|cm      |      100
    sl7       |       6|brown     |    60|cm      |       60
    sl3       |       0|black     |    35|inch    |     88.9
    sl4       |       8|black     |    40|inch    |    101.6
    sl8       |       1|brown     |    40|inch    |    101.6
    sl5       |       4|brown     |     1|m       |      100
    sl6       |       0|brown     |   0.9|m       |       90
    (8 rows)
move the arrived shoelaces in
    al_bundy=> INSERT INTO shoelace_ok SELECT * FROM shoelace_arrive;
and check the results
    al_bundy=> SELECT * FROM shoelace ORDER BY sl_name;
    sl_name   |sl_avail|sl_color  |sl_len|sl_unit |sl_len_cm
    ----------+--------+----------+------+--------+---------
    sl1       |       5|black     |    80|cm      |       80
    sl2       |       6|black     |   100|cm      |      100
    sl7       |       6|brown     |    60|cm      |       60
    sl4       |       8|black     |    40|inch    |    101.6
    sl3       |      10|black     |    35|inch    |     88.9
    sl8       |      21|brown     |    40|inch    |    101.6
    sl5       |       4|brown     |     1|m       |      100
    sl6       |      20|brown     |   0.9|m       |       90
    (8 rows)

    al_bundy=> SELECT * FROM shoelace_log;
    sl_name   |sl_avail|log_who|log_when                        
    ----------+--------+-------+--------------------------------
    sl7       |       6|Al     |Tue Oct 20 19:14:45 1998 MET DST
    sl3       |      10|Al     |Tue Oct 20 19:25:16 1998 MET DST
    sl6       |      20|Al     |Tue Oct 20 19:25:16 1998 MET DST
    sl8       |      21|Al     |Tue Oct 20 19:25:16 1998 MET DST
    (4 rows)
It's a long way from the one INSERT ... SELECT to these results. And it's description will be the last in this document (but not the last example :-). First there was the parsers output
    INSERT INTO shoelace_ok SELECT
           shoelace_arrive.arr_name, shoelace_arrive.arr_quant
      FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok;
Now the first rule 'shoelace_ok_ins' is applied and turns it into
    UPDATE shoelace SET
           sl_avail = int4pl(shoelace.sl_avail, shoelace_arrive.arr_quant)
      FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok,
           shoelace_ok *OLD*, shoelace_ok *NEW*,
           shoelace shoelace
     WHERE bpchareq(shoelace.sl_name, showlace_arrive.arr_name);
and throws away the original INSERT on shoelace_ok. This rewritten query is passed to the rule system again and the second applied rule 'shoelace_upd' produced
    UPDATE shoelace_data SET
           sl_name = shoelace.sl_name,
           sl_avail = int4pl(shoelace.sl_avail, shoelace_arrive.arr_quant),
           sl_color = shoelace.sl_color,
           sl_len = shoelace.sl_len,
           sl_unit = shoelace.sl_unit
      FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok,
           shoelace_ok *OLD*, shoelace_ok *NEW*,
           shoelace shoelace, shoelace *OLD*,
           shoelace *NEW*, shoelace_data showlace_data
     WHERE bpchareq(shoelace.sl_name, showlace_arrive.arr_name)
       AND bpchareq(shoelace_data.sl_name, shoelace.sl_name);
Again it's an INSTEAD rule and the previous parsetree is trashed. Note that this query sill uses the view shoelace But the rule system isn't finished with this loop so it continues and applies the rule '_RETshoelace' on it and we get
    UPDATE shoelace_data SET
           sl_name = s.sl_name,
           sl_avail = int4pl(s.sl_avail, shoelace_arrive.arr_quant),
           sl_color = s.sl_color,
           sl_len = s.sl_len,
           sl_unit = s.sl_unit
      FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok,
           shoelace_ok *OLD*, shoelace_ok *NEW*,
           shoelace shoelace, shoelace *OLD*,
           shoelace *NEW*, shoelace_data showlace_data,
           shoelace *OLD*, shoelace *NEW*,
           shoelace_data s, unit u
     WHERE bpchareq(s.sl_name, showlace_arrive.arr_name)
       AND bpchareq(shoelace_data.sl_name, s.sl_name);
Again an update rule has been applied and so the wheel turns on and we are in rewrite round 3. This time rule 'log_shoelace' gets applied what produces the extra parsetree
    INSERT INTO shoelace_log SELECT
           s.sl_name,
           int4pl(s.sl_avail, shoelace_arrive.arr_quant),
           getpgusername(),
           datetime('now'::text)
      FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok,
           shoelace_ok *OLD*, shoelace_ok *NEW*,
           shoelace shoelace, shoelace *OLD*,
           shoelace *NEW*, shoelace_data showlace_data,
           shoelace *OLD*, shoelace *NEW*,
           shoelace_data s, unit u,
           shoelace_data *OLD*, shoelace_data *NEW*
           shoelace_log shoelace_log
     WHERE bpchareq(s.sl_name,  showlace_arrive.arr_name)
       AND bpchareq(shoelace_data.sl_name, s.sl_name);
       AND int4ne(int4pl(s.sl_avail, shoelace_arrive.arr_quant),
                                                    s.sl_avail);
After that the rule system runs out of rules and returns the generated parsetrees. So we end up with two final parsetrees that are equal to the SQL statements
    INSERT INTO shoelace_log SELECT
           s.sl_name,
           s.sl_avail + shoelace_arrive.arr_quant,
           getpgusername(),
           'now'
      FROM shoelace_arrive shoelace_arrive, shoelace_data shoelace_data,
           shoelace_data s
     WHERE s.sl_name = shoelace_arrive.arr_name
       AND shoelace_data.sl_name = s.sl_name
       AND s.sl_avail + shoelace_arrive.arr_quant != s.sl_avail;
           
    UPDATE shoelace_data SET
           sl_avail = shoelace_data.sl_avail + shoelace_arrive.arr_quant
     FROM shoelace_arrive shoelace_arrive,
          shoelace_data shoelace_data,
          shoelace_data s
    WHERE s.sl_name = shoelace_arrive.sl_name
      AND shoelace_data.sl_name = s.sl_name;
The result is that data coming from one relation inserted into another, changed into updates on a third, changed into updating a fourth plus logging that final update in a fifth gets reduced into two queries.

There is a little detail that's a bit ugly. Looking at the two queries turns out, that the shoelace_data relation appears twice in the rangetable where it could definitely be reduced to one. The optimizer does not handle it and so the execution plan for the rule systems output of the INSERT will be

Nested Loop
  ->  Merge Join
        ->  Seq Scan
              ->  Sort
                    ->  Seq Scan on s
        ->  Seq Scan
              ->  Sort
                    ->  Seq Scan on shoelace_arrive
  ->  Seq Scan on shoelace_data
while omitting the extra rangetable entry would result in a
Merge Join
  ->  Seq Scan
        ->  Sort
              ->  Seq Scan on s
  ->  Seq Scan
        ->  Sort
              ->  Seq Scan on shoelace_arrive
that totally produces the same entries in the log relation. Thus, the rule system caused one extra scan on the shoelace_data relation that is absolutely not necessary. And the same obsolete scan is done once more in the UPDATE. But it was a really hard job to make that all possible at all.

A final demonstration of the Postgres rule system and it's power. There is a cute blonde that sells shoelaces. And what Al could never realize, she's not only cute, she's smart too - a little too smart. Thus, it happens from time to time that Al orders shoelaces that are absolutely not sellable. This time he ordered 1000 pairs of magenta shoelaces and since another kind is currently not available but he committed to buy some, he also prepared his database for pink ones.

    al_bundy=> INSERT INTO shoelace VALUES 
    al_bundy->     ('sl9', 0, 'pink', 35.0, 'inch', 0.0);
    al_bundy=> INSERT INTO shoelace VALUES 
    al_bundy->     ('sl10', 1000, 'magenta', 40.0, 'inch', 0.0);
Since this happens often, we must lookup for shoelace entries, that fit for absolutely no shoe sometimes. We could do that in a complicated statement every time, or we can setup a view for it. The view for this is
    CREATE VIEW shoelace_obsolete AS
        SELECT * FROM shoelace WHERE NOT EXISTS
            (SELECT shoename FROM shoe WHERE slcolor = sl_color);
It's output is
    al_bundy=> SELECT * FROM shoelace_obsolete;
    sl_name   |sl_avail|sl_color  |sl_len|sl_unit |sl_len_cm
    ----------+--------+----------+------+--------+---------
    sl9       |       0|pink      |    35|inch    |     88.9
    sl10      |    1000|magenta   |    40|inch    |    101.6
For the 1000 magenta shoelaces we must debt Al before we can throw 'em away, but that's another problem. The pink entry we delete. To make it a little harder for Postgres, we don't delete it directly. Instead we create one more view
    CREATE VIEW shoelace_candelete AS
        SELECT * FROM shoelace_obsolete WHERE sl_avail = 0;
and do it this way:
    DELETE FROM shoelace WHERE EXISTS
        (SELECT * FROM shoelace_candelete
                 WHERE sl_name = shoelace.sl_name);
Voila:
    al_bundy=> SELECT * FROM shoelace;
    sl_name   |sl_avail|sl_color  |sl_len|sl_unit |sl_len_cm
    ----------+--------+----------+------+--------+---------
    sl1       |       5|black     |    80|cm      |       80
    sl2       |       6|black     |   100|cm      |      100
    sl7       |       6|brown     |    60|cm      |       60
    sl4       |       8|black     |    40|inch    |    101.6
    sl3       |      10|black     |    35|inch    |     88.9
    sl8       |      21|brown     |    40|inch    |    101.6
    sl10      |    1000|magenta   |    40|inch    |    101.6
    sl5       |       4|brown     |     1|m       |      100
    sl6       |      20|brown     |   0.9|m       |       90
    (9 rows)
A DELETE on a view, with a subselect qualification that in total uses 4 nesting/joined views, where one of them itself has a subselect qualification containing a view and where calculated view columns are used, gets rewritten into one single parsetree that deletes the requested data from a real table.

I think there are only a few situations out in the real world, where such a construct is necessary. But it makes me feel comfortable that it works.

The truth is: Doing this I found one more bug while writing this document. But after fixing that I was a little amazed that it works at all.