| <!-- doc/src/sgml/rules.sgml --> |
| |
| <chapter id="rules"> |
| <title>The Rule System</title> |
| |
| <indexterm zone="rules"> |
| <primary>rule</primary> |
| </indexterm> |
| |
| <para> |
| This chapter discusses the rule system in |
| <productname>PostgreSQL</productname>. Production rule systems |
| are conceptually simple, but there are many subtle points |
| involved in actually using them. |
| </para> |
| |
| <para> |
| Some other database systems define active database rules, which |
| are usually stored procedures and triggers. In |
| <productname>PostgreSQL</productname>, these can be implemented |
| using functions and triggers as well. |
| </para> |
| |
| <para> |
| The rule system (more precisely speaking, the query rewrite rule |
| system) is totally different from stored procedures and triggers. |
| It modifies queries to take rules into consideration, and then |
| passes the modified query to the query planner for planning and |
| execution. It is very powerful, and can be used for many things |
| such as query language procedures, views, and versions. The |
| theoretical foundations and the power of this rule system are |
| also discussed in <xref linkend="ston90b"/> and <xref |
| linkend="ong90"/>. |
| </para> |
| |
| <sect1 id="querytree"> |
| <title>The Query Tree</title> |
| |
| <indexterm zone="querytree"> |
| <primary>query tree</primary> |
| </indexterm> |
| |
| <para> |
| To understand how the rule system works it is necessary to know |
| when it is invoked and what its input and results are. |
| </para> |
| |
| <para> |
| The rule system is located between the parser and the planner. |
| It takes the output of the parser, one query tree, and the user-defined |
| rewrite rules, which are also |
| query trees with some extra information, and creates zero or more |
| query trees as result. So its input and output are always things |
| the parser itself could have produced and thus, anything it sees |
| is basically representable as an <acronym>SQL</acronym> statement. |
| </para> |
| |
| <para> |
| Now what is a query tree? It is an internal representation of an |
| <acronym>SQL</acronym> statement where the single parts that it is |
| built from are stored separately. These query trees can be shown |
| in the server log if you set the configuration parameters |
| <varname>debug_print_parse</varname>, |
| <varname>debug_print_rewritten</varname>, or |
| <varname>debug_print_plan</varname>. The rule actions are also |
| stored as query trees, in the system catalog |
| <structname>pg_rewrite</structname>. They are not formatted like |
| the log output, but they contain exactly the same information. |
| </para> |
| |
| <para> |
| Reading a raw query tree requires some experience. But since |
| <acronym>SQL</acronym> representations of query trees are |
| sufficient to understand the rule system, this chapter will not |
| teach how to read them. |
| </para> |
| |
| <para> |
| When reading the <acronym>SQL</acronym> representations of the |
| query trees in this chapter it is necessary to be able to identify |
| the parts the statement is broken into when it is in the query tree |
| structure. The parts of a query tree are |
| |
| <variablelist> |
| <varlistentry> |
| <term> |
| the command type |
| </term> |
| <listitem> |
| <para> |
| This is a simple value telling which command |
| (<command>SELECT</command>, <command>INSERT</command>, |
| <command>UPDATE</command>, <command>DELETE</command>) produced |
| the query tree. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term> |
| the range table |
| <indexterm><primary>range table</primary></indexterm> |
| </term> |
| <listitem> |
| <para> |
| The range table is a list of relations that are used in the query. |
| In a <command>SELECT</command> statement these are the relations given after |
| the <literal>FROM</literal> key word. |
| </para> |
| |
| <para> |
| Every range table entry identifies a table or view and tells |
| by which name it is called in the other parts of the query. |
| In the query tree, the range table entries are referenced by |
| number rather than by name, so here it doesn't matter if there |
| are duplicate names as it would in an <acronym>SQL</acronym> |
| statement. This can happen after the range tables of rules |
| have been merged in. The examples in this chapter will not have |
| this situation. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term> |
| the result relation |
| </term> |
| <listitem> |
| <para> |
| This is an index into the range table that identifies the |
| relation where the results of the query go. |
| </para> |
| |
| <para> |
| <command>SELECT</command> queries don't have a result |
| relation. (The special case of <command>SELECT INTO</command> is |
| mostly identical to <command>CREATE TABLE</command> followed by |
| <literal>INSERT ... SELECT</literal>, and is not discussed |
| separately here.) |
| </para> |
| |
| <para> |
| For <command>INSERT</command>, <command>UPDATE</command>, and |
| <command>DELETE</command> commands, the result relation is the table |
| (or view!) where the changes are to take effect. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term> |
| the target list |
| <indexterm><primary>target list</primary></indexterm> |
| </term> |
| <listitem> |
| <para> |
| The target list is a list of expressions that define the |
| result of the query. In the case of a |
| <command>SELECT</command>, these expressions are the ones that |
| build the final output of the query. They correspond to the |
| expressions between the key words <command>SELECT</command> |
| and <command>FROM</command>. (<literal>*</literal> is just an |
| abbreviation for all the column names of a relation. It is |
| expanded by the parser into the individual columns, so the |
| rule system never sees it.) |
| </para> |
| |
| <para> |
| <command>DELETE</command> commands don't need a normal target list |
| because they don't produce any result. Instead, the planner |
| adds a special <acronym>CTID</acronym> entry to the empty target list, |
| to allow the executor to find the row to be deleted. |
| (<acronym>CTID</acronym> is added when the result relation is an ordinary |
| table. If it is a view, a whole-row variable is added instead, by |
| the rule system, as described in <xref linkend="rules-views-update"/>.) |
| </para> |
| |
| <para> |
| For <command>INSERT</command> commands, the target list describes |
| the new rows that should go into the result relation. It consists of the |
| expressions in the <literal>VALUES</literal> clause or the ones from the |
| <command>SELECT</command> clause in <literal>INSERT |
| ... SELECT</literal>. The first step of the rewrite process adds |
| target list entries for any columns that were not assigned to by |
| the original command but have defaults. Any remaining columns (with |
| neither a given value nor a default) will be filled in by the |
| planner with a constant null expression. |
| </para> |
| |
| <para> |
| For <command>UPDATE</command> commands, the target list |
| describes the new rows that should replace the old ones. In the |
| rule system, it contains just the expressions from the <literal>SET |
| column = expression</literal> part of the command. The planner will |
| handle missing columns by inserting expressions that copy the values |
| from the old row into the new one. Just as for <command>DELETE</command>, |
| a <acronym>CTID</acronym> or whole-row variable is added so that |
| the executor can identify the old row to be updated. |
| </para> |
| |
| <para> |
| Every entry in the target list contains an expression that can |
| be a constant value, a variable pointing to a column of one |
| of the relations in the range table, a parameter, or an expression |
| tree made of function calls, constants, variables, operators, etc. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term> |
| the qualification |
| </term> |
| <listitem> |
| <para> |
| The query's qualification is an expression much like one of |
| those contained in the target list entries. The result value of |
| this expression is a Boolean that tells whether the operation |
| (<command>INSERT</command>, <command>UPDATE</command>, |
| <command>DELETE</command>, or <command>SELECT</command>) for the |
| final result row should be executed or not. It corresponds to the <literal>WHERE</literal> clause |
| of an <acronym>SQL</acronym> statement. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term> |
| the join tree |
| </term> |
| <listitem> |
| <para> |
| The query's join tree shows the structure of the <literal>FROM</literal> clause. |
| For a simple query like <literal>SELECT ... FROM a, b, c</literal>, the join tree is just |
| a list of the <literal>FROM</literal> items, because we are allowed to join them in |
| any order. But when <literal>JOIN</literal> expressions, particularly outer joins, |
| are used, we have to join in the order shown by the joins. |
| In that case, the join tree shows the structure of the <literal>JOIN</literal> expressions. The |
| restrictions associated with particular <literal>JOIN</literal> clauses (from <literal>ON</literal> or |
| <literal>USING</literal> expressions) are stored as qualification expressions attached |
| to those join-tree nodes. It turns out to be convenient to store |
| the top-level <literal>WHERE</literal> expression as a qualification attached to the |
| top-level join-tree item, too. So really the join tree represents |
| both the <literal>FROM</literal> and <literal>WHERE</literal> clauses of a <command>SELECT</command>. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term> |
| the others |
| </term> |
| <listitem> |
| <para> |
| The other parts of the query tree like the <literal>ORDER BY</literal> |
| clause aren't of interest here. The rule system |
| substitutes some entries there while applying rules, but that |
| doesn't have much to do with the fundamentals of the rule |
| system. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| </variablelist> |
| </para> |
| </sect1> |
| |
| <sect1 id="rules-views"> |
| <title>Views and the Rule System</title> |
| |
| <indexterm zone="rules-views"> |
| <primary>rule</primary> |
| <secondary>and views</secondary> |
| </indexterm> |
| |
| <indexterm zone="rules-views"> |
| <primary>view</primary> |
| <secondary>implementation through rules</secondary> |
| </indexterm> |
| |
| <para> |
| Views in <productname>PostgreSQL</productname> are implemented |
| using the rule system. In fact, there is essentially no difference |
| between: |
| |
| <programlisting> |
| CREATE VIEW myview AS SELECT * FROM mytab; |
| </programlisting> |
| |
| compared against the two commands: |
| |
| <programlisting> |
| CREATE TABLE myview (<replaceable>same column list as mytab</replaceable>); |
| CREATE RULE "_RETURN" AS ON SELECT TO myview DO INSTEAD |
| SELECT * FROM mytab; |
| </programlisting> |
| |
| because this is exactly what the <command>CREATE VIEW</command> |
| command does internally. This has some side effects. One of them |
| is that the information about a view in the |
| <productname>PostgreSQL</productname> system catalogs is exactly |
| the same as it is for a table. So for the parser, there is |
| absolutely no difference between a table and a view. They are the |
| same thing: relations. |
| </para> |
| |
| <sect2 id="rules-select"> |
| <title>How <command>SELECT</command> Rules Work</title> |
| |
| <indexterm zone="rules-select"> |
| <primary>rule</primary> |
| <secondary sortas="SELECT">for SELECT</secondary> |
| </indexterm> |
| |
| <para> |
| Rules <literal>ON SELECT</literal> are applied to all queries as the last step, even |
| if the command given is an <command>INSERT</command>, |
| <command>UPDATE</command> or <command>DELETE</command>. And they |
| have different semantics from rules on the other command types in that they modify the |
| query tree in place instead of creating a new one. So |
| <command>SELECT</command> rules are described first. |
| </para> |
| |
| <para> |
| Currently, there can be only one action in an <literal>ON SELECT</literal> rule, and it must |
| be an unconditional <command>SELECT</command> action that is <literal>INSTEAD</literal>. This restriction was |
| required to make rules safe enough to open them for ordinary users, and |
| it restricts <literal>ON SELECT</literal> rules to act like views. |
| </para> |
| |
| <para> |
| The examples for this chapter are two join views that do some |
| calculations and some more views using them in turn. One of the |
| two first views is customized later by adding rules for |
| <command>INSERT</command>, <command>UPDATE</command>, and |
| <command>DELETE</command> operations so that the final result will |
| be a view that behaves like a real table with some magic |
| functionality. This is not such a simple example to start from and |
| this makes things harder to get into. But it's better to have one |
| example that covers all the points discussed step by step rather |
| than having many different ones that might mix up in mind. |
| </para> |
| |
| <para> |
| The real tables we need in the first two rule system descriptions |
| are these: |
| |
| <programlisting> |
| CREATE TABLE shoe_data ( |
| shoename text, -- primary key |
| sh_avail integer, -- available number of pairs |
| slcolor text, -- preferred shoelace color |
| slminlen real, -- minimum shoelace length |
| slmaxlen real, -- maximum shoelace length |
| slunit text -- length unit |
| ); |
| |
| CREATE TABLE shoelace_data ( |
| sl_name text, -- primary key |
| sl_avail integer, -- available number of pairs |
| sl_color text, -- shoelace color |
| sl_len real, -- shoelace length |
| sl_unit text -- length unit |
| ); |
| |
| CREATE TABLE unit ( |
| un_name text, -- primary key |
| un_fact real -- factor to transform to cm |
| ); |
| </programlisting> |
| |
| As you can see, they represent shoe-store data. |
| </para> |
| |
| <para> |
| The views are created as: |
| |
| <programlisting> |
| CREATE VIEW shoe AS |
| SELECT sh.shoename, |
| sh.sh_avail, |
| sh.slcolor, |
| sh.slminlen, |
| sh.slminlen * un.un_fact AS slminlen_cm, |
| sh.slmaxlen, |
| sh.slmaxlen * un.un_fact AS slmaxlen_cm, |
| sh.slunit |
| FROM shoe_data sh, unit un |
| WHERE sh.slunit = un.un_name; |
| |
| CREATE VIEW shoelace AS |
| SELECT s.sl_name, |
| s.sl_avail, |
| s.sl_color, |
| s.sl_len, |
| s.sl_unit, |
| s.sl_len * u.un_fact AS sl_len_cm |
| FROM shoelace_data s, unit u |
| WHERE s.sl_unit = u.un_name; |
| |
| CREATE VIEW shoe_ready AS |
| SELECT rsh.shoename, |
| rsh.sh_avail, |
| rsl.sl_name, |
| rsl.sl_avail, |
| least(rsh.sh_avail, rsl.sl_avail) AS total_avail |
| FROM shoe rsh, shoelace rsl |
| WHERE rsl.sl_color = rsh.slcolor |
| AND rsl.sl_len_cm >= rsh.slminlen_cm |
| AND rsl.sl_len_cm <= rsh.slmaxlen_cm; |
| </programlisting> |
| |
| The <command>CREATE VIEW</command> command for the |
| <literal>shoelace</literal> view (which is the simplest one we |
| have) will create a relation <literal>shoelace</literal> and an entry in |
| <structname>pg_rewrite</structname> that tells that there is a |
| rewrite rule that must be applied whenever the relation <literal>shoelace</literal> |
| is referenced in a query's range table. The rule has no rule |
| qualification (discussed later, with the non-<command>SELECT</command> rules, since |
| <command>SELECT</command> rules currently cannot have them) and it is <literal>INSTEAD</literal>. Note |
| that rule qualifications are not the same as query qualifications. |
| The action of our rule has a query qualification. |
| The action of the rule is one query tree that is a copy of the |
| <command>SELECT</command> statement in the view creation command. |
| </para> |
| |
| <note> |
| <para> |
| The two extra range |
| table entries for <literal>NEW</literal> and <literal>OLD</literal> that you can see in |
| the <structname>pg_rewrite</structname> entry aren't of interest |
| for <command>SELECT</command> rules. |
| </para> |
| </note> |
| |
| <para> |
| Now we populate <literal>unit</literal>, <literal>shoe_data</literal> |
| and <literal>shoelace_data</literal> and run a simple query on a view: |
| |
| <programlisting> |
| INSERT INTO unit VALUES ('cm', 1.0); |
| INSERT INTO unit VALUES ('m', 100.0); |
| INSERT INTO unit VALUES ('inch', 2.54); |
| |
| INSERT INTO shoe_data VALUES ('sh1', 2, 'black', 70.0, 90.0, 'cm'); |
| INSERT INTO shoe_data VALUES ('sh2', 0, 'black', 30.0, 40.0, 'inch'); |
| INSERT INTO shoe_data VALUES ('sh3', 4, 'brown', 50.0, 65.0, 'cm'); |
| INSERT INTO shoe_data VALUES ('sh4', 3, 'brown', 40.0, 50.0, 'inch'); |
| |
| INSERT INTO shoelace_data VALUES ('sl1', 5, 'black', 80.0, 'cm'); |
| INSERT INTO shoelace_data VALUES ('sl2', 6, 'black', 100.0, 'cm'); |
| INSERT INTO shoelace_data VALUES ('sl3', 0, 'black', 35.0 , 'inch'); |
| INSERT INTO shoelace_data VALUES ('sl4', 8, 'black', 40.0 , 'inch'); |
| INSERT INTO shoelace_data VALUES ('sl5', 4, 'brown', 1.0 , 'm'); |
| INSERT INTO shoelace_data VALUES ('sl6', 0, 'brown', 0.9 , 'm'); |
| INSERT INTO shoelace_data VALUES ('sl7', 7, 'brown', 60 , 'cm'); |
| INSERT INTO shoelace_data VALUES ('sl8', 1, 'brown', 40 , 'inch'); |
| |
| SELECT * FROM shoelace; |
| |
| sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm |
| -----------+----------+----------+--------+---------+----------- |
| sl1 | 5 | black | 80 | cm | 80 |
| sl2 | 6 | black | 100 | cm | 100 |
| sl7 | 7 | brown | 60 | cm | 60 |
| sl3 | 0 | black | 35 | inch | 88.9 |
| sl4 | 8 | black | 40 | inch | 101.6 |
| sl8 | 1 | brown | 40 | inch | 101.6 |
| sl5 | 4 | brown | 1 | m | 100 |
| sl6 | 0 | brown | 0.9 | m | 90 |
| (8 rows) |
| </programlisting> |
| </para> |
| |
| <para> |
| This is the simplest <command>SELECT</command> you can do on our |
| views, so we take this opportunity to explain the basics of view |
| rules. The <literal>SELECT * FROM shoelace</literal> was |
| interpreted by the parser and produced the query tree: |
| |
| <programlisting> |
| SELECT shoelace.sl_name, shoelace.sl_avail, |
| shoelace.sl_color, shoelace.sl_len, |
| shoelace.sl_unit, shoelace.sl_len_cm |
| FROM shoelace shoelace; |
| </programlisting> |
| |
| and this is given to the rule system. The rule system walks through the |
| range table and checks if there are rules |
| for any relation. When processing the range table entry for |
| <literal>shoelace</literal> (the only one up to now) it finds the |
| <literal>_RETURN</literal> rule with the query tree: |
| |
| <programlisting> |
| SELECT s.sl_name, s.sl_avail, |
| s.sl_color, s.sl_len, s.sl_unit, |
| s.sl_len * u.un_fact AS sl_len_cm |
| FROM shoelace old, shoelace new, |
| shoelace_data s, unit u |
| WHERE s.sl_unit = u.un_name; |
| </programlisting> |
| </para> |
| |
| <para> |
| To expand the view, the rewriter simply creates a subquery range-table |
| entry containing the rule's action query tree, and substitutes this |
| range table entry for the original one that referenced the view. The |
| resulting rewritten query tree is almost the same as if you had typed: |
| |
| <programlisting> |
| SELECT shoelace.sl_name, shoelace.sl_avail, |
| shoelace.sl_color, shoelace.sl_len, |
| shoelace.sl_unit, shoelace.sl_len_cm |
| FROM (SELECT s.sl_name, |
| s.sl_avail, |
| s.sl_color, |
| s.sl_len, |
| s.sl_unit, |
| s.sl_len * u.un_fact AS sl_len_cm |
| FROM shoelace_data s, unit u |
| WHERE s.sl_unit = u.un_name) shoelace; |
| </programlisting> |
| |
| There is one difference however: the subquery's range table has two |
| extra entries <literal>shoelace old</literal> and <literal>shoelace new</literal>. These entries don't |
| participate directly in the query, since they aren't referenced by |
| the subquery's join tree or target list. The rewriter uses them |
| to store the access privilege check information that was originally present |
| in the range-table entry that referenced the view. In this way, the |
| executor will still check that the user has proper privileges to access |
| the view, even though there's no direct use of the view in the rewritten |
| query. |
| </para> |
| |
| <para> |
| That was the first rule applied. The rule system will continue checking |
| the remaining range-table entries in the top query (in this example there |
| are no more), and it will recursively check the range-table entries in |
| the added subquery to see if any of them reference views. (But it |
| won't expand <literal>old</literal> or <literal>new</literal> — otherwise we'd have infinite recursion!) |
| In this example, there are no rewrite rules for <literal>shoelace_data</literal> or <literal>unit</literal>, |
| so rewriting is complete and the above is the final result given to |
| the planner. |
| </para> |
| |
| <para> |
| Now we want to write a query that finds out for which shoes currently in the store |
| we have the matching shoelaces (color and length) and where the |
| total number of exactly matching pairs is greater than or equal to two. |
| |
| <programlisting> |
| SELECT * FROM shoe_ready WHERE total_avail >= 2; |
| |
| shoename | sh_avail | sl_name | sl_avail | total_avail |
| ----------+----------+---------+----------+------------- |
| sh1 | 2 | sl1 | 5 | 2 |
| sh3 | 4 | sl7 | 7 | 4 |
| (2 rows) |
| </programlisting> |
| </para> |
| |
| <para> |
| The output of the parser this time is the query tree: |
| |
| <programlisting> |
| SELECT shoe_ready.shoename, shoe_ready.sh_avail, |
| shoe_ready.sl_name, shoe_ready.sl_avail, |
| shoe_ready.total_avail |
| FROM shoe_ready shoe_ready |
| WHERE shoe_ready.total_avail >= 2; |
| </programlisting> |
| |
| The first rule applied will be the one for the |
| <literal>shoe_ready</literal> view and it results in the |
| query tree: |
| |
| <programlisting> |
| SELECT shoe_ready.shoename, shoe_ready.sh_avail, |
| shoe_ready.sl_name, shoe_ready.sl_avail, |
| shoe_ready.total_avail |
| FROM (SELECT rsh.shoename, |
| rsh.sh_avail, |
| rsl.sl_name, |
| rsl.sl_avail, |
| least(rsh.sh_avail, rsl.sl_avail) AS total_avail |
| FROM shoe rsh, shoelace rsl |
| WHERE rsl.sl_color = rsh.slcolor |
| AND rsl.sl_len_cm >= rsh.slminlen_cm |
| AND rsl.sl_len_cm <= rsh.slmaxlen_cm) shoe_ready |
| WHERE shoe_ready.total_avail >= 2; |
| </programlisting> |
| |
| Similarly, the rules for <literal>shoe</literal> and |
| <literal>shoelace</literal> are substituted into the range table of |
| the subquery, leading to a three-level final query tree: |
| |
| <programlisting> |
| SELECT shoe_ready.shoename, shoe_ready.sh_avail, |
| shoe_ready.sl_name, shoe_ready.sl_avail, |
| shoe_ready.total_avail |
| FROM (SELECT rsh.shoename, |
| rsh.sh_avail, |
| rsl.sl_name, |
| rsl.sl_avail, |
| least(rsh.sh_avail, rsl.sl_avail) AS total_avail |
| FROM (SELECT sh.shoename, |
| sh.sh_avail, |
| sh.slcolor, |
| sh.slminlen, |
| sh.slminlen * un.un_fact AS slminlen_cm, |
| sh.slmaxlen, |
| sh.slmaxlen * un.un_fact AS slmaxlen_cm, |
| sh.slunit |
| FROM shoe_data sh, unit un |
| WHERE sh.slunit = un.un_name) rsh, |
| (SELECT s.sl_name, |
| s.sl_avail, |
| s.sl_color, |
| s.sl_len, |
| s.sl_unit, |
| s.sl_len * u.un_fact AS sl_len_cm |
| FROM shoelace_data s, unit u |
| WHERE s.sl_unit = u.un_name) rsl |
| WHERE rsl.sl_color = rsh.slcolor |
| AND rsl.sl_len_cm >= rsh.slminlen_cm |
| AND rsl.sl_len_cm <= rsh.slmaxlen_cm) shoe_ready |
| WHERE shoe_ready.total_avail > 2; |
| </programlisting> |
| </para> |
| |
| <para> |
| This might look inefficient, but the planner will collapse this into a |
| single-level query tree by <quote>pulling up</quote> the subqueries, |
| and then it will plan the joins just as if we'd written them out |
| manually. So collapsing the query tree is an optimization that the |
| rewrite system doesn't have to concern itself with. |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title>View Rules in Non-<command>SELECT</command> Statements</title> |
| |
| <para> |
| Two details of the query tree aren't touched in the description of |
| view rules above. These are the command type and the result relation. |
| In fact, the command type is not needed by view rules, but the result |
| relation may affect the way in which the query rewriter works, because |
| special care needs to be taken if the result relation is a view. |
| </para> |
| |
| <para> |
| There are only a few differences between a query tree for a |
| <command>SELECT</command> and one for any other |
| command. Obviously, they have a different command type and for a |
| command other than a <command>SELECT</command>, the result |
| relation points to the range-table entry where the result should |
| go. Everything else is absolutely the same. So having two tables |
| <literal>t1</literal> and <literal>t2</literal> with columns <literal>a</literal> and |
| <literal>b</literal>, the query trees for the two statements: |
| |
| <programlisting> |
| SELECT t2.b FROM t1, t2 WHERE t1.a = t2.a; |
| |
| UPDATE t1 SET b = t2.b FROM t2 WHERE t1.a = t2.a; |
| </programlisting> |
| |
| are nearly identical. In particular: |
| |
| <itemizedlist> |
| <listitem> |
| <para> |
| The range tables contain entries for the tables <literal>t1</literal> and <literal>t2</literal>. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| The target lists contain one variable that points to column |
| <literal>b</literal> of the range table entry for table <literal>t2</literal>. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| The qualification expressions compare the columns <literal>a</literal> of both |
| range-table entries for equality. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| The join trees show a simple join between <literal>t1</literal> and <literal>t2</literal>. |
| </para> |
| </listitem> |
| </itemizedlist> |
| </para> |
| |
| <para> |
| The consequence is, that both query trees result in similar |
| execution plans: They are both joins over the two tables. For the |
| <command>UPDATE</command> the missing columns from <literal>t1</literal> are added to |
| the target list by the planner and the final query tree will read |
| as: |
| |
| <programlisting> |
| UPDATE t1 SET a = t1.a, b = t2.b FROM t2 WHERE t1.a = t2.a; |
| </programlisting> |
| |
| and thus the executor run over the join will produce exactly the |
| same result set as: |
| |
| <programlisting> |
| SELECT t1.a, t2.b FROM t1, t2 WHERE t1.a = t2.a; |
| </programlisting> |
| |
| But there is a little problem in |
| <command>UPDATE</command>: the part of the executor plan that does |
| the join does not care what the results from the join are |
| meant for. It just produces a result set of rows. The fact that |
| one is a <command>SELECT</command> command and the other is an |
| <command>UPDATE</command> is handled higher up in the executor, where |
| it knows that this is an <command>UPDATE</command>, and it knows that |
| this result should go into table <literal>t1</literal>. But which of the rows |
| that are there has to be replaced by the new row? |
| </para> |
| |
| <para> |
| To resolve this problem, another entry is added to the target list |
| in <command>UPDATE</command> (and also in |
| <command>DELETE</command>) statements: the current tuple ID |
| (<acronym>CTID</acronym>).<indexterm><primary>CTID</primary></indexterm> |
| This is a system column containing the |
| file block number and position in the block for the row. Knowing |
| the table, the <acronym>CTID</acronym> can be used to retrieve the |
| original row of <literal>t1</literal> to be updated. After adding the |
| <acronym>CTID</acronym> to the target list, the query actually looks like: |
| |
| <programlisting> |
| SELECT t1.a, t2.b, t1.ctid FROM t1, t2 WHERE t1.a = t2.a; |
| </programlisting> |
| |
| Now another detail of <productname>PostgreSQL</productname> enters |
| the stage. Old table rows aren't overwritten, and this |
| is why <command>ROLLBACK</command> is fast. In an <command>UPDATE</command>, |
| the new result row is inserted into the table (after stripping the |
| <acronym>CTID</acronym>) and in the row header of the old row, which the |
| <acronym>CTID</acronym> pointed to, the <literal>cmax</literal> and |
| <literal>xmax</literal> entries are set to the current command counter |
| and current transaction ID. Thus the old row is hidden, and after |
| the transaction commits the vacuum cleaner can eventually remove |
| the dead row. |
| </para> |
| |
| <para> |
| Knowing all that, we can simply apply view rules in absolutely |
| the same way to any command. There is no difference. |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title>The Power of Views in <productname>PostgreSQL</productname></title> |
| |
| <para> |
| The above demonstrates how the rule system incorporates view |
| definitions into the original query tree. In the second example, a |
| simple <command>SELECT</command> from one view created a final |
| query tree that is a join of 4 tables (<literal>unit</literal> was used twice with |
| different names). |
| </para> |
| |
| <para> |
| The benefit of implementing views with the rule system is |
| that the planner has all |
| the information about which tables have to be scanned plus the |
| relationships between these tables plus the restrictive |
| qualifications from the views plus the qualifications from |
| the original query |
| in one single query tree. And this is still the situation |
| when the original query is already a join over views. |
| The planner has to decide which is |
| the best path to execute the query, and the more information |
| the planner has, the better this decision can be. And |
| the rule system as implemented in <productname>PostgreSQL</productname> |
| ensures that this is all information available about the query |
| up to that point. |
| </para> |
| </sect2> |
| |
| <sect2 id="rules-views-update"> |
| <title>Updating a View</title> |
| |
| <para> |
| What happens if a view is named as the target relation for an |
| <command>INSERT</command>, <command>UPDATE</command>, or |
| <command>DELETE</command>? Doing the substitutions |
| described above would give a query tree in which the result |
| relation points at a subquery range-table entry, which will not |
| work. There are several ways in which <productname>PostgreSQL</productname> |
| can support the appearance of updating a view, however. |
| In order of user-experienced complexity those are: automatically substitute |
| in the underlying table for the view, execute a user-defined trigger, |
| or rewrite the query per a user-defined rule. |
| These options are discussed below. |
| </para> |
| |
| <para> |
| If the subquery selects from a single base relation and is simple |
| enough, the rewriter can automatically replace the subquery with the |
| underlying base relation so that the <command>INSERT</command>, |
| <command>UPDATE</command>, or <command>DELETE</command> is applied to |
| the base relation in the appropriate way. Views that are |
| <quote>simple enough</quote> for this are called <firstterm>automatically |
| updatable</firstterm>. For detailed information on the kinds of view that can |
| be automatically updated, see <xref linkend="sql-createview"/>. |
| </para> |
| |
| <para> |
| Alternatively, the operation may be handled by a user-provided |
| <literal>INSTEAD OF</literal> trigger on the view |
| (see <xref linkend="sql-createtrigger"/>). |
| Rewriting works slightly differently |
| in this case. For <command>INSERT</command>, the rewriter does |
| nothing at all with the view, leaving it as the result relation |
| for the query. For <command>UPDATE</command> and |
| <command>DELETE</command>, it's still necessary to expand the |
| view query to produce the <quote>old</quote> rows that the command will |
| attempt to update or delete. So the view is expanded as normal, |
| but another unexpanded range-table entry is added to the query |
| to represent the view in its capacity as the result relation. |
| </para> |
| |
| <para> |
| The problem that now arises is how to identify the rows to be |
| updated in the view. Recall that when the result relation |
| is a table, a special <acronym>CTID</acronym> entry is added to the target |
| list to identify the physical locations of the rows to be updated. |
| This does not work if the result relation is a view, because a view |
| does not have any <acronym>CTID</acronym>, since its rows do not have |
| actual physical locations. Instead, for an <command>UPDATE</command> |
| or <command>DELETE</command> operation, a special <literal>wholerow</literal> |
| entry is added to the target list, which expands to include all |
| columns from the view. The executor uses this value to supply the |
| <quote>old</quote> row to the <literal>INSTEAD OF</literal> trigger. It is |
| up to the trigger to work out what to update based on the old and |
| new row values. |
| </para> |
| |
| <para> |
| Another possibility is for the user to define <literal>INSTEAD</literal> |
| rules that specify substitute actions for <command>INSERT</command>, |
| <command>UPDATE</command>, and <command>DELETE</command> commands on |
| a view. These rules will rewrite the command, typically into a command |
| that updates one or more tables, rather than views. That is the topic |
| of <xref linkend="rules-update"/>. |
| </para> |
| |
| <para> |
| Note that rules are evaluated first, rewriting the original query |
| before it is planned and executed. Therefore, if a view has |
| <literal>INSTEAD OF</literal> triggers as well as rules on <command>INSERT</command>, |
| <command>UPDATE</command>, or <command>DELETE</command>, then the rules will be |
| evaluated first, and depending on the result, the triggers may not be |
| used at all. |
| </para> |
| |
| <para> |
| Automatic rewriting of an <command>INSERT</command>, |
| <command>UPDATE</command>, or <command>DELETE</command> query on a |
| simple view is always tried last. Therefore, if a view has rules or |
| triggers, they will override the default behavior of automatically |
| updatable views. |
| </para> |
| |
| <para> |
| If there are no <literal>INSTEAD</literal> rules or <literal>INSTEAD OF</literal> |
| triggers for the view, and the rewriter cannot automatically rewrite |
| the query as an update on the underlying base relation, an error will |
| be thrown because the executor cannot update a view as such. |
| </para> |
| |
| </sect2> |
| |
| </sect1> |
| |
| <sect1 id="rules-materializedviews"> |
| <title>Materialized Views</title> |
| |
| <indexterm zone="rules-materializedviews"> |
| <primary>rule</primary> |
| <secondary>and materialized views</secondary> |
| </indexterm> |
| |
| <indexterm zone="rules-materializedviews"> |
| <primary>materialized view</primary> |
| <secondary>implementation through rules</secondary> |
| </indexterm> |
| |
| <indexterm zone="rules-materializedviews"> |
| <primary>view</primary> |
| <secondary>materialized</secondary> |
| </indexterm> |
| |
| <para> |
| Materialized views in <productname>PostgreSQL</productname> use the |
| rule system like views do, but persist the results in a table-like form. |
| The main differences between: |
| |
| <programlisting> |
| CREATE MATERIALIZED VIEW mymatview AS SELECT * FROM mytab; |
| </programlisting> |
| |
| and: |
| |
| <programlisting> |
| CREATE TABLE mymatview AS SELECT * FROM mytab; |
| </programlisting> |
| |
| are that the materialized view cannot subsequently be directly updated |
| and that the query used to create the materialized view is stored in |
| exactly the same way that a view's query is stored, so that fresh data |
| can be generated for the materialized view with: |
| |
| <programlisting> |
| REFRESH MATERIALIZED VIEW mymatview; |
| </programlisting> |
| |
| The information about a materialized view in the |
| <productname>PostgreSQL</productname> system catalogs is exactly |
| the same as it is for a table or view. So for the parser, a |
| materialized view is a relation, just like a table or a view. When |
| a materialized view is referenced in a query, the data is returned |
| directly from the materialized view, like from a table; the rule is |
| only used for populating the materialized view. |
| </para> |
| |
| <para> |
| While access to the data stored in a materialized view is often much |
| faster than accessing the underlying tables directly or through a view, |
| the data is not always current; yet sometimes current data is not needed. |
| Consider a table which records sales: |
| |
| <programlisting> |
| CREATE TABLE invoice ( |
| invoice_no integer PRIMARY KEY, |
| seller_no integer, -- ID of salesperson |
| invoice_date date, -- date of sale |
| invoice_amt numeric(13,2) -- amount of sale |
| ); |
| </programlisting> |
| |
| If people want to be able to quickly graph historical sales data, they |
| might want to summarize, and they may not care about the incomplete data |
| for the current date: |
| |
| <programlisting> |
| CREATE MATERIALIZED VIEW sales_summary AS |
| SELECT |
| seller_no, |
| invoice_date, |
| sum(invoice_amt)::numeric(13,2) as sales_amt |
| FROM invoice |
| WHERE invoice_date < CURRENT_DATE |
| GROUP BY |
| seller_no, |
| invoice_date; |
| |
| CREATE UNIQUE INDEX sales_summary_seller |
| ON sales_summary (seller_no, invoice_date); |
| </programlisting> |
| |
| This materialized view might be useful for displaying a graph in the |
| dashboard created for salespeople. A job could be scheduled to update |
| the statistics each night using this SQL statement: |
| |
| <programlisting> |
| REFRESH MATERIALIZED VIEW sales_summary; |
| </programlisting> |
| </para> |
| |
| <para> |
| Another use for a materialized view is to allow faster access to data |
| brought across from a remote system through a foreign data wrapper. |
| A simple example using <literal>file_fdw</literal> is below, with timings, |
| but since this is using cache on the local system the performance |
| difference compared to access to a remote system would usually be greater |
| than shown here. Notice we are also exploiting the ability to put an |
| index on the materialized view, whereas <literal>file_fdw</literal> does |
| not support indexes; this advantage might not apply for other sorts of |
| foreign data access. |
| </para> |
| |
| <para> |
| Setup: |
| |
| <programlisting> |
| CREATE EXTENSION file_fdw; |
| CREATE SERVER local_file FOREIGN DATA WRAPPER file_fdw; |
| CREATE FOREIGN TABLE words (word text NOT NULL) |
| SERVER local_file |
| OPTIONS (filename '/usr/share/dict/words'); |
| CREATE MATERIALIZED VIEW wrd AS SELECT * FROM words; |
| CREATE UNIQUE INDEX wrd_word ON wrd (word); |
| CREATE EXTENSION pg_trgm; |
| CREATE INDEX wrd_trgm ON wrd USING gist (word gist_trgm_ops); |
| VACUUM ANALYZE wrd; |
| </programlisting> |
| |
| Now let's spell-check a word. Using <literal>file_fdw</literal> directly: |
| |
| <programlisting> |
| SELECT count(*) FROM words WHERE word = 'caterpiler'; |
| |
| count |
| ------- |
| 0 |
| (1 row) |
| </programlisting> |
| |
| With <command>EXPLAIN ANALYZE</command>, we see: |
| |
| <programlisting> |
| Aggregate (cost=21763.99..21764.00 rows=1 width=0) (actual time=188.180..188.181 rows=1 loops=1) |
| -> Foreign Scan on words (cost=0.00..21761.41 rows=1032 width=0) (actual time=188.177..188.177 rows=0 loops=1) |
| Filter: (word = 'caterpiler'::text) |
| Rows Removed by Filter: 479829 |
| Foreign File: /usr/share/dict/words |
| Foreign File Size: 4953699 |
| Planning time: 0.118 ms |
| Execution time: 188.273 ms |
| </programlisting> |
| |
| If the materialized view is used instead, the query is much faster: |
| |
| <programlisting> |
| Aggregate (cost=4.44..4.45 rows=1 width=0) (actual time=0.042..0.042 rows=1 loops=1) |
| -> Index Only Scan using wrd_word on wrd (cost=0.42..4.44 rows=1 width=0) (actual time=0.039..0.039 rows=0 loops=1) |
| Index Cond: (word = 'caterpiler'::text) |
| Heap Fetches: 0 |
| Planning time: 0.164 ms |
| Execution time: 0.117 ms |
| </programlisting> |
| |
| Either way, the word is spelled wrong, so let's look for what we might |
| have wanted. Again using <literal>file_fdw</literal> and |
| <literal>pg_trgm</literal>: |
| |
| <programlisting> |
| SELECT word FROM words ORDER BY word <-> 'caterpiler' LIMIT 10; |
| |
| word |
| --------------- |
| cater |
| caterpillar |
| Caterpillar |
| caterpillars |
| caterpillar's |
| Caterpillar's |
| caterer |
| caterer's |
| caters |
| catered |
| (10 rows) |
| </programlisting> |
| |
| <programlisting> |
| Limit (cost=11583.61..11583.64 rows=10 width=32) (actual time=1431.591..1431.594 rows=10 loops=1) |
| -> Sort (cost=11583.61..11804.76 rows=88459 width=32) (actual time=1431.589..1431.591 rows=10 loops=1) |
| Sort Key: ((word <-> 'caterpiler'::text)) |
| Sort Method: top-N heapsort Memory: 25kB |
| -> Foreign Scan on words (cost=0.00..9672.05 rows=88459 width=32) (actual time=0.057..1286.455 rows=479829 loops=1) |
| Foreign File: /usr/share/dict/words |
| Foreign File Size: 4953699 |
| Planning time: 0.128 ms |
| Execution time: 1431.679 ms |
| </programlisting> |
| |
| Using the materialized view: |
| |
| <programlisting> |
| Limit (cost=0.29..1.06 rows=10 width=10) (actual time=187.222..188.257 rows=10 loops=1) |
| -> Index Scan using wrd_trgm on wrd (cost=0.29..37020.87 rows=479829 width=10) (actual time=187.219..188.252 rows=10 loops=1) |
| Order By: (word <-> 'caterpiler'::text) |
| Planning time: 0.196 ms |
| Execution time: 198.640 ms |
| </programlisting> |
| |
| If you can tolerate periodic update of the remote data to the local |
| database, the performance benefit can be substantial. |
| </para> |
| |
| </sect1> |
| |
| <sect1 id="rules-ivm"> |
| <title>Incremental View Maintenance</title> |
| |
| <indexterm zone="rules-ivm"> |
| <primary>incremental view maintenance</primary> |
| </indexterm> |
| |
| <indexterm zone="rules-ivm"> |
| <primary>materialized view</primary> |
| <secondary>incremental view maintenance</secondary> |
| </indexterm> |
| |
| <indexterm zone="rules-ivm"> |
| <primary>view</primary> |
| <secondary>incremental view maintenance</secondary> |
| </indexterm> |
| |
| <sect2 id="rules-ivm-overview"> |
| <title>Overview</title> |
| |
| <para> |
| Incremental View Maintenance (<acronym>IVM</acronym>) is a way to make |
| materialized views up-to-date in which only incremental changes are computed |
| and applied on views rather than recomputing the contents from scratch as |
| <command>REFRESH MATERIALIZED VIEW</command> does. <acronym>IVM</acronym> |
| can update materialized views more efficiently than recomputation when only |
| small parts of the view are changed. |
| </para> |
| |
| <para> |
| There are two approaches with regard to timing of view maintenance: |
| immediate and deferred. In immediate maintenance, views are updated in the |
| same transaction that its base table is modified. In deferred maintenance, |
| views are updated after the transaction is committed, for example, when the |
| view is accessed, as a response to user command like <command>REFRESH |
| MATERIALIZED VIEW</command>, or periodically in background, and so on. |
| <productname>PostgreSQL</productname> currently implements only a kind of |
| immediate maintenance, in which materialized views are updated immediately |
| in AFTER triggers when a base table is modified. |
| </para> |
| |
| <para> |
| To create materialized views supporting <acronym>IVM</acronym>, use the |
| <command>CREATE INCREMENTAL MATERIALIZED VIEW</command>, for example: |
| <programlisting> |
| CREATE <emphasis>INCREMENTAL</emphasis> MATERIALIZED VIEW mymatview AS SELECT * FROM mytab; |
| </programlisting> |
| When a materialized view is created with the <literal>INCREMENTAL</literal> |
| keyword, some triggers are automatically created so that the view's contents are |
| immediately updated when its base tables are modified. We call this form |
| of materialized view an Incrementally Maintainable Materialized View |
| (<acronym>IMMV</acronym>). |
| <programlisting> |
| postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m AS SELECT * FROM t0; |
| NOTICE: could not create an index on materialized view "m" automatically |
| HINT: Create an index on the materialized view for effcient incremental maintenance. |
| SELECT 3 |
| postgres=# SELECT * FROM m; |
| i |
| --- |
| 1 |
| 2 |
| 3 |
| (3 rows) |
| |
| postgres=# INSERT INTO t0 VALUES (4); |
| INSERT 0 1 |
| postgres=# SELECT * FROM m; -- automatically updated |
| i |
| --- |
| 1 |
| 2 |
| 3 |
| 4 |
| (4 rows) |
| </programlisting> |
| </para> |
| |
| <para> |
| Some <acronym>IMMV</acronym>s have hidden columns which are added |
| automatically when a materialized view is created. Their name starts |
| with <literal>__ivm_</literal> and they contain information required |
| for maintaining the <acronym>IMMV</acronym>. Such columns are not visible |
| when the <acronym>IMMV</acronym> is accessed by <literal>SELECT *</literal> |
| but are visible if the column name is explicitly specified in the target |
| list. We can also see the hidden columns in <literal>\d</literal> |
| meta-commands of <command>psql</command> commands. |
| </para> |
| |
| <para> |
| In general, <acronym>IMMV</acronym>s allow faster updates of materialized |
| views at the price of slower updates to their base tables. Updates of |
| <acronym>IMMV</acronym> is slower because triggers will be invoked and the |
| view is updated in triggers per modification statement. |
| </para> |
| |
| <para> |
| For example, suppose a normal materialized view defined as below: |
| |
| <programlisting> |
| test=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm AS |
| SELECT a.aid, b.bid, a.abalance, b.bbalance |
| FROM pgbench_accounts a JOIN pgbench_branches b USING(bid); |
| SELECT 10000000 |
| |
| </programlisting> |
| |
| Updating a tuple in a base table of this materialized view is rapid but the |
| <command>REFRESH MATERIALIZED VIEW</command> command on this view takes a long time: |
| |
| <programlisting> |
| test=# UPDATE pgbench_accounts SET abalance = 1000 WHERE aid = 1; |
| UPDATE 1 |
| Time: 0.990 ms |
| |
| test=# REFRESH MATERIALIZED VIEW mv_normal ; |
| REFRESH MATERIALIZED VIEW |
| Time: 33533.952 ms (00:33.534) |
| </programlisting> |
| </para> |
| |
| <para> |
| On the other hand, after creating <acronym>IMMV</acronym> with the same view |
| definition as below: |
| |
| <programlisting> |
| test=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm AS |
| SELECT a.aid, b.bid, a.abalance, b.bbalance |
| FROM pgbench_accounts a JOIN pgbench_branches b USING(bid); |
| test=# UPDATE pgbench_accounts SET abalance = 1000 WHERE aid = 1; |
| NOTICE: created index "mv_ivm_index" on materialized view "mv_ivm" |
| </programlisting> |
| |
| updating a tuple in a base table takes more than the normal view, |
| but its content is updated automatically and this is faster than the |
| <command>REFRESH MATERIALIZED VIEW</command> command. |
| |
| <programlisting> |
| test=# UPDATE pgbench_accounts SET abalance = 1000 WHERE aid = 1; |
| UPDATE 1 |
| Time: 13.068 ms |
| </programlisting> |
| |
| </para> |
| |
| <para> |
| Appropriate indexes on <acronym>IMMV</acronym>s are necessary for |
| efficient <acronym>IVM</acronym> because it looks for tuples to be |
| updated in <acronym>IMMV</acronym>. If there are no indexes, it |
| will take a long time. |
| </para> |
| |
| <para> |
| Therefore, when <acronym>IMMV</acronym> is defined, a unique index is created on the view |
| automatically if possible. If the view definition query has a GROUP BY clause, a unique |
| index is created on the columns of GROUP BY expressions. Also, if the view has DISTINCT |
| clause, a unique index is created on all columns in the target list. Otherwise, if the |
| view contains all primary key attritubes of its base tables in the target list, a unique |
| index is created on these attritubes. In other cases, no index is created. |
| </para> |
| |
| <para> |
| In the previous example, a unique index "mv_ivm_index" is created on aid and bid |
| columns of materialized view "mv_ivm", and this enables the rapid update of the view. |
| Dropping this index make updating the view take a loger time. |
| <programlisting> |
| test=# DROP INDEX mv_ivm_index; |
| DROP INDEX |
| Time: 67.081 ms |
| |
| test=# UPDATE pgbench_accounts SET abalance = 1000 WHERE aid = 1; |
| UPDATE 1 |
| Time: 16386.245 ms (00:16.386) |
| </programlisting> |
| |
| </para> |
| |
| <para> |
| <acronym>IVM</acronym> is effective when we want to keep a materialized |
| view up-to-date and small fraction of a base table is modified |
| infrequently. Due to the overhead of immediate maintenance, <acronym>IVM</acronym> |
| is not effective when a base table is modified frequently. Also, when a |
| large part of a base table is modified or large data is inserted into a |
| base table, <acronym>IVM</acronym> is not effective and the cost of |
| maintenance can be larger than the <command>REFRESH MATERIALIZED VIEW</command> |
| command. In such situation, we can use <command>REFRESH MATERIALIZED VIEW</command> |
| and specify <literal>WITH NO DATA</literal> to disable immediate |
| maintenance before modifying a base table. After a base table modification, |
| execute the <command>REFRESH MATERIALIZED VIEW</command> (with <literal>WITH DATA</literal>) |
| command to refresh the view data and enable immediate maintenance. |
| </para> |
| |
| </sect2> |
| |
| <sect2 id="rules-ivm-support"> |
| <title>Supported View Definitions and Restrictions</title> |
| |
| <para> |
| Currently, we can create <acronym>IMMV</acronym>s using inner joins, and some |
| aggregates. However, several restrictions apply to the definition of IMMV. |
| </para> |
| |
| <sect3 id="rules-ivm-support-joins"> |
| <title>Joins</title> |
| <para> |
| Inner joins including self-join are supported. Outer joins are not supported. |
| </para> |
| </sect3> |
| |
| <sect3 id="rules-ivm-support-aggregates"> |
| <title>Aggregates</title> |
| <para> |
| Supported aggregate functions are <function>count</function>, <function>sum</function>, |
| <function>avg</function>, <function>min</function>, and <function>max</function>. |
| Currently, only built-in aggregate functions are supported and user defined |
| aggregates cannot be used. When a base table is modified, the new aggregated |
| values are incrementally calculated using the old aggregated values and values |
| of related hidden columns stored in <acronym>IMMV</acronym>. |
| </para> |
| |
| <para> |
| Note that for <function>min</function> or <function>max</function>, the new values |
| could be re-calculated from base tables with regard to the affected groups when a |
| tuple containing the current minimal or maximal values are deleted from a base table. |
| Therefore, it can takes a long time to update an <acronym>IMMV</acronym> containing |
| these functions. |
| </para> |
| |
| <para> |
| Also note that using <function>sum</function> or <function>avg</function> on |
| <type>real</type> (<type>float4</type>) type or <type>double precision</type> |
| (<type>float8</type>) type in <acronym>IMMV</acronym> is unsafe. This is |
| because aggregated values in <acronym>IMMV</acronym> can become different from |
| results calculated from base tables due to the limited precision of these types. |
| To avoid this problem, use the <type>numeric</type> type instead. |
| </para> |
| |
| <sect4 id="rules-ivm-restrictions-aggregates"> |
| <title>Restrictions on Aggregates</title> |
| <para> |
| There are the following restrictions: |
| <itemizedlist> |
| <listitem> |
| <para> |
| If we have a <literal>GROUP BY</literal> clause, expressions specified in |
| <literal>GROUP BY</literal> must appear in the target list. This is |
| how tuples to be updated in the <acronym>IMMV</acronym> are identified. |
| These attributes are used as scan keys for searching tuples in the |
| <acronym>IMMV</acronym>, so indexes on them are required for efficient |
| <acronym>IVM</acronym>. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| <literal>HAVING</literal> clause cannot be used. |
| </para> |
| </listitem> |
| </itemizedlist> |
| </para> |
| </sect4> |
| </sect3> |
| |
| <sect3 id="rules-ivm-general-restricitons"> |
| <title>Other General Restrictions</title> |
| <para> |
| There are other restrictions which generally apply to <acronym>IMMV</acronym>: |
| <itemizedlist> |
| <listitem> |
| <para> |
| Sub-queries cannot be used. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| CTEs cannot be used. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| Window functions cannot be used. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| <acronym>IMMV</acronym>s must be based on simple base tables. It's not |
| supported to create them on top of views, materialized views, foreign tables, inhe. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| LIMIT and OFFSET clauses cannot be used. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| <acronym>IMMV</acronym>s cannot contain system columns. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| <acronym>IMMV</acronym>s cannot contain non-immutable functions. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| UNION/INTERSECT/EXCEPT clauses cannnot be used. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| DISTINCT ON clauses cannot be used. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| TABLESAMPLE parameter cannot be used. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| inheritance parent tables cannnot be used. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| VALUES clause cannnot be used. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| <literal>GROUPING SETS</literal> and <literal>FILTER</literal> clauses cannot be used. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| FOR UPDATE/SHARE cannot be used. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| targetlist cannot contain columns whose name start with <literal>__ivm_</literal>. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| targetlist cannot contain expressions which contain an aggregate in it. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| Logical replication is not supported, that is, even when a base table |
| at a publisher node is modified, <acronym>IMMV</acronym>s at subscriber |
| nodes are not updated. |
| </para> |
| </listitem> |
| |
| </itemizedlist> |
| </para> |
| </sect3> |
| |
| </sect2> |
| |
| <sect2 id="rules-ivm-distinct"> |
| <title><literal>DISTINCT</literal></title> |
| |
| <para> |
| <productname>PostgreSQL</productname> supports <acronym>IMMV</acronym> with |
| <literal>DISTINCT</literal>. For example, suppose a <acronym>IMMV</acronym> |
| defined with <literal>DISTINCT</literal> on a base table containing duplicate |
| tuples. When tuples are deleted from the base table, a tuple in the view is |
| deleted if and only if the multiplicity of the tuple becomes zero. Moreover, |
| when tuples are inserted into the base table, a tuple is inserted into the |
| view only if the same tuple doesn't already exist in it. |
| </para> |
| |
| <para> |
| Physically, an <acronym>IMMV</acronym> defined with <literal>DISTINCT</literal> |
| contains tuples after eliminating duplicates, and the multiplicity of each tuple |
| is stored in a hidden column named <literal>__ivm_count__</literal>. |
| </para> |
| </sect2> |
| |
| <sect2 id="rules-ivm-concurrent-transactions"> |
| <title>Concurrent Transactions</title> |
| <para> |
| Suppose an <acronym>IMMV</acronym> is defined on two base tables and each |
| table was modified in different a concurrent transaction simultaneously. |
| In the transaction which was committed first, <acronym>IMMV</acronym> can |
| be updated considering only the change which happened in this transaction. |
| On the other hand, in order to update the view correctly in the transaction |
| which was committed later, we need to know the changes occurred in |
| both transactions. For this reason, <literal>ExclusiveLock</literal> |
| is held on an <acronym>IMMV</acronym> immediately after a base table is |
| modified in <literal>READ COMMITTED</literal> mode to make sure that |
| the <acronym>IMMV</acronym> is updated in the latter transaction after |
| the former transaction is committed. In <literal>REPEATABLE READ</literal> |
| or <literal>SERIALIZABLE</literal> mode, an error is raised immediately |
| if lock acquisition fails because any changes which occurred in |
| other transactions are not be visible in these modes and |
| <acronym>IMMV</acronym> cannot be updated correctly in such situations. |
| However, as an exception if the view has only one base table and |
| <command>INSERT</command> is performed on the table, |
| the lock held on thew view is <literal>RowExclusiveLock</literal>. |
| </para> |
| </sect2> |
| |
| <sect2 id="rules-ivm-rls"> |
| <title>Row Level Security</title> |
| <para> |
| If some base tables have row level security policy, rows that are not visible |
| to the materialized view's owner are excluded from the result. In addition, such |
| rows are excluded as well when views are incrementally maintained. However, if a |
| new policy is defined or policies are changed after the materialized view was created, |
| the new policy will not be applied to the view contents. To apply the new policy, |
| you need to refresh materialized views. |
| </para> |
| </sect2> |
| |
| </sect1> |
| |
| <sect1 id="rules-update"> |
| <title>Rules on <command>INSERT</command>, <command>UPDATE</command>, and <command>DELETE</command></title> |
| |
| <indexterm zone="rules-update"> |
| <primary>rule</primary> |
| <secondary sortas="INSERT">for INSERT</secondary> |
| </indexterm> |
| |
| <indexterm zone="rules-update"> |
| <primary>rule</primary> |
| <secondary sortas="UPDATE">for UPDATE</secondary> |
| </indexterm> |
| |
| <indexterm zone="rules-update"> |
| <primary>rule</primary> |
| <secondary sortas="DELETE">for DELETE</secondary> |
| </indexterm> |
| |
| <para> |
| Rules that are defined on <command>INSERT</command>, <command>UPDATE</command>, |
| and <command>DELETE</command> are significantly different from the view rules |
| described in the previous section. First, their <command>CREATE |
| RULE</command> command allows more: |
| |
| <itemizedlist> |
| <listitem> |
| <para> |
| They are allowed to have no action. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| They can have multiple actions. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| They can be <literal>INSTEAD</literal> or <literal>ALSO</literal> (the default). |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| The pseudorelations <literal>NEW</literal> and <literal>OLD</literal> become useful. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| They can have rule qualifications. |
| </para> |
| </listitem> |
| </itemizedlist> |
| |
| Second, they don't modify the query tree in place. Instead they |
| create zero or more new query trees and can throw away the |
| original one. |
| </para> |
| |
| <caution> |
| <para> |
| In many cases, tasks that could be performed by rules |
| on <command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command> are better done |
| with triggers. Triggers are notationally a bit more complicated, but their |
| semantics are much simpler to understand. Rules tend to have surprising |
| results when the original query contains volatile functions: volatile |
| functions may get executed more times than expected in the process of |
| carrying out the rules. |
| </para> |
| |
| <para> |
| Also, there are some cases that are not supported by these types of rules at |
| all, notably including <literal>WITH</literal> clauses in the original query and |
| multiple-assignment sub-<literal>SELECT</literal>s in the <literal>SET</literal> list |
| of <command>UPDATE</command> queries. This is because copying these constructs |
| into a rule query would result in multiple evaluations of the sub-query, |
| contrary to the express intent of the query's author. |
| </para> |
| </caution> |
| |
| <sect2> |
| <title>How Update Rules Work</title> |
| |
| <para> |
| Keep the syntax: |
| |
| <programlisting> |
| CREATE [ OR REPLACE ] RULE <replaceable class="parameter">name</replaceable> AS ON <replaceable class="parameter">event</replaceable> |
| TO <replaceable class="parameter">table</replaceable> [ WHERE <replaceable class="parameter">condition</replaceable> ] |
| DO [ ALSO | INSTEAD ] { NOTHING | <replaceable class="parameter">command</replaceable> | ( <replaceable class="parameter">command</replaceable> ; <replaceable class="parameter">command</replaceable> ... ) } |
| </programlisting> |
| |
| in mind. |
| In the following, <firstterm>update rules</firstterm> means rules that are defined |
| on <command>INSERT</command>, <command>UPDATE</command>, or <command>DELETE</command>. |
| </para> |
| |
| <para> |
| Update rules get applied by the rule system when the result |
| relation and the command type of a query tree are equal to the |
| object and event given in the <command>CREATE RULE</command> command. |
| For update rules, the rule system creates a list of query trees. |
| Initially the query-tree list is empty. |
| There can be zero (<literal>NOTHING</literal> key word), one, or multiple actions. |
| To simplify, we will look at a rule with one action. This rule |
| can have a qualification or not and it can be <literal>INSTEAD</literal> or |
| <literal>ALSO</literal> (the default). |
| </para> |
| |
| <para> |
| What is a rule qualification? It is a restriction that tells |
| when the actions of the rule should be done and when not. This |
| qualification can only reference the pseudorelations <literal>NEW</literal> and/or <literal>OLD</literal>, |
| which basically represent the relation that was given as object (but with a |
| special meaning). |
| </para> |
| |
| <para> |
| So we have three cases that produce the following query trees for |
| a one-action rule. |
| |
| <variablelist> |
| <varlistentry> |
| <term>No qualification, with either <literal>ALSO</literal> or |
| <literal>INSTEAD</literal></term> |
| <listitem> |
| <para> |
| the query tree from the rule action with the original query |
| tree's qualification added |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term>Qualification given and <literal>ALSO</literal></term> |
| <listitem> |
| <para> |
| the query tree from the rule action with the rule |
| qualification and the original query tree's qualification |
| added |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term>Qualification given and <literal>INSTEAD</literal></term> |
| <listitem> |
| <para> |
| the query tree from the rule action with the rule |
| qualification and the original query tree's qualification; and |
| the original query tree with the negated rule qualification |
| added |
| </para> |
| </listitem> |
| </varlistentry> |
| </variablelist> |
| |
| Finally, if the rule is <literal>ALSO</literal>, the unchanged original query tree is |
| added to the list. Since only qualified <literal>INSTEAD</literal> rules already add the |
| original query tree, we end up with either one or two output query trees |
| for a rule with one action. |
| </para> |
| |
| <para> |
| For <literal>ON INSERT</literal> rules, the original query (if not suppressed by <literal>INSTEAD</literal>) |
| is done before any actions added by rules. This allows the actions to |
| see the inserted row(s). But for <literal>ON UPDATE</literal> and <literal>ON |
| DELETE</literal> rules, the original query is done after the actions added by rules. |
| This ensures that the actions can see the to-be-updated or to-be-deleted |
| rows; otherwise, the actions might do nothing because they find no rows |
| matching their qualifications. |
| </para> |
| |
| <para> |
| The query trees generated from rule actions are thrown into the |
| rewrite system again, and maybe more rules get applied resulting |
| in additional or fewer query trees. |
| So a rule's actions must have either a different |
| command type or a different result relation than the rule itself is |
| on, otherwise this recursive process will end up in an infinite loop. |
| (Recursive expansion of a rule will be detected and reported as an |
| error.) |
| </para> |
| |
| <para> |
| The query trees found in the actions of the |
| <structname>pg_rewrite</structname> system catalog are only |
| templates. Since they can reference the range-table entries for |
| <literal>NEW</literal> and <literal>OLD</literal>, some substitutions have to be made before they can be |
| used. For any reference to <literal>NEW</literal>, the target list of the original |
| query is searched for a corresponding entry. If found, that |
| entry's expression replaces the reference. Otherwise, <literal>NEW</literal> means the |
| same as <literal>OLD</literal> (for an <command>UPDATE</command>) or is replaced by |
| a null value (for an <command>INSERT</command>). Any reference to <literal>OLD</literal> is |
| replaced by a reference to the range-table entry that is the |
| result relation. |
| </para> |
| |
| <para> |
| After the system is done applying update rules, it applies view rules to the |
| produced query tree(s). Views cannot insert new update actions so |
| there is no need to apply update rules to the output of view rewriting. |
| </para> |
| |
| <sect3> |
| <title>A First Rule Step by Step</title> |
| |
| <para> |
| Say we want to trace changes to the <literal>sl_avail</literal> column in the |
| <literal>shoelace_data</literal> relation. So we set up a log table |
| and a rule that conditionally writes a log entry when an |
| <command>UPDATE</command> is performed on |
| <literal>shoelace_data</literal>. |
| |
| <programlisting> |
| CREATE TABLE shoelace_log ( |
| sl_name text, -- shoelace changed |
| sl_avail integer, -- new available value |
| log_who text, -- who did it |
| log_when timestamp -- when |
| ); |
| |
| CREATE RULE log_shoelace AS ON UPDATE TO shoelace_data |
| WHERE NEW.sl_avail <> OLD.sl_avail |
| DO INSERT INTO shoelace_log VALUES ( |
| NEW.sl_name, |
| NEW.sl_avail, |
| current_user, |
| current_timestamp |
| ); |
| </programlisting> |
| </para> |
| |
| <para> |
| Now someone does: |
| |
| <programlisting> |
| UPDATE shoelace_data SET sl_avail = 6 WHERE sl_name = 'sl7'; |
| </programlisting> |
| |
| and we look at the log table: |
| |
| <programlisting> |
| SELECT * FROM shoelace_log; |
| |
| sl_name | sl_avail | log_who | log_when |
| ---------+----------+---------+---------------------------------- |
| sl7 | 6 | Al | Tue Oct 20 16:14:45 1998 MET DST |
| (1 row) |
| </programlisting> |
| </para> |
| |
| <para> |
| That's what we expected. What happened in the background is the following. |
| The parser created the query tree: |
| |
| <programlisting> |
| UPDATE shoelace_data SET sl_avail = 6 |
| FROM shoelace_data shoelace_data |
| WHERE shoelace_data.sl_name = 'sl7'; |
| </programlisting> |
| |
| There is a rule <literal>log_shoelace</literal> that is <literal>ON UPDATE</literal> with the rule |
| qualification expression: |
| |
| <programlisting> |
| NEW.sl_avail <> OLD.sl_avail |
| </programlisting> |
| |
| and the action: |
| |
| <programlisting> |
| INSERT INTO shoelace_log VALUES ( |
| new.sl_name, new.sl_avail, |
| current_user, current_timestamp ) |
| FROM shoelace_data new, shoelace_data old; |
| </programlisting> |
| |
| (This looks a little strange since you cannot normally write |
| <literal>INSERT ... VALUES ... FROM</literal>. The <literal>FROM</literal> |
| clause here is just to indicate that there are range-table entries |
| in the query tree for <literal>new</literal> and <literal>old</literal>. |
| These are needed so that they can be referenced by variables in |
| the <command>INSERT</command> command's query tree.) |
| </para> |
| |
| <para> |
| The rule is a qualified <literal>ALSO</literal> rule, so the rule system |
| has to return two query trees: the modified rule action and the original |
| query tree. In step 1, the range table of the original query is |
| incorporated into the rule's action query tree. This results in: |
| |
| <programlisting> |
| INSERT INTO shoelace_log VALUES ( |
| new.sl_name, new.sl_avail, |
| current_user, current_timestamp ) |
| FROM shoelace_data new, shoelace_data old, |
| <emphasis>shoelace_data shoelace_data</emphasis>; |
| </programlisting> |
| |
| In step 2, the rule qualification is added to it, so the result set |
| is restricted to rows where <literal>sl_avail</literal> changes: |
| |
| <programlisting> |
| INSERT INTO shoelace_log VALUES ( |
| new.sl_name, new.sl_avail, |
| current_user, current_timestamp ) |
| FROM shoelace_data new, shoelace_data old, |
| shoelace_data shoelace_data |
| <emphasis>WHERE new.sl_avail <> old.sl_avail</emphasis>; |
| </programlisting> |
| |
| (This looks even stranger, since <literal>INSERT ... VALUES</literal> doesn't have |
| a <literal>WHERE</literal> clause either, but the planner and executor will have no |
| difficulty with it. They need to support this same functionality |
| anyway for <literal>INSERT ... SELECT</literal>.) |
| </para> |
| |
| <para> |
| In step 3, the original query tree's qualification is added, |
| restricting the result set further to only the rows that would have been touched |
| by the original query: |
| |
| <programlisting> |
| INSERT INTO shoelace_log VALUES ( |
| new.sl_name, new.sl_avail, |
| current_user, current_timestamp ) |
| FROM shoelace_data new, shoelace_data old, |
| shoelace_data shoelace_data |
| WHERE new.sl_avail <> old.sl_avail |
| <emphasis>AND shoelace_data.sl_name = 'sl7'</emphasis>; |
| </programlisting> |
| </para> |
| |
| <para> |
| Step 4 replaces references to <literal>NEW</literal> by the target list entries from the |
| original query tree or by the matching variable references |
| from the result relation: |
| |
| <programlisting> |
| INSERT INTO shoelace_log VALUES ( |
| <emphasis>shoelace_data.sl_name</emphasis>, <emphasis>6</emphasis>, |
| current_user, current_timestamp ) |
| FROM shoelace_data new, shoelace_data old, |
| shoelace_data shoelace_data |
| WHERE <emphasis>6</emphasis> <> old.sl_avail |
| AND shoelace_data.sl_name = 'sl7'; |
| </programlisting> |
| |
| </para> |
| |
| <para> |
| Step 5 changes <literal>OLD</literal> references into result relation references: |
| |
| <programlisting> |
| INSERT INTO shoelace_log VALUES ( |
| shoelace_data.sl_name, 6, |
| current_user, current_timestamp ) |
| FROM shoelace_data new, shoelace_data old, |
| shoelace_data shoelace_data |
| WHERE 6 <> <emphasis>shoelace_data.sl_avail</emphasis> |
| AND shoelace_data.sl_name = 'sl7'; |
| </programlisting> |
| </para> |
| |
| <para> |
| That's it. Since the rule is <literal>ALSO</literal>, we also output the |
| original query tree. In short, the output from the rule system |
| is a list of two query trees that correspond to these statements: |
| |
| <programlisting> |
| INSERT INTO shoelace_log VALUES ( |
| shoelace_data.sl_name, 6, |
| current_user, current_timestamp ) |
| FROM shoelace_data |
| WHERE 6 <> shoelace_data.sl_avail |
| AND shoelace_data.sl_name = 'sl7'; |
| |
| UPDATE shoelace_data SET sl_avail = 6 |
| WHERE sl_name = 'sl7'; |
| </programlisting> |
| |
| These are executed in this order, and that is exactly what |
| the rule was meant to do. |
| </para> |
| |
| <para> |
| The substitutions and the added qualifications |
| ensure that, if the original query would be, say: |
| |
| <programlisting> |
| UPDATE shoelace_data SET sl_color = 'green' |
| WHERE sl_name = 'sl7'; |
| </programlisting> |
| |
| no log entry would get written. In that case, the original query |
| tree does not contain a target list entry for |
| <literal>sl_avail</literal>, so <literal>NEW.sl_avail</literal> will get |
| replaced by <literal>shoelace_data.sl_avail</literal>. Thus, the extra |
| command generated by the rule is: |
| |
| <programlisting> |
| INSERT INTO shoelace_log VALUES ( |
| shoelace_data.sl_name, <emphasis>shoelace_data.sl_avail</emphasis>, |
| current_user, current_timestamp ) |
| FROM shoelace_data |
| WHERE <emphasis>shoelace_data.sl_avail</emphasis> <> shoelace_data.sl_avail |
| AND shoelace_data.sl_name = 'sl7'; |
| </programlisting> |
| |
| and that qualification will never be true. |
| </para> |
| |
| <para> |
| It will also work if the original query modifies multiple rows. So |
| if someone issued the command: |
| |
| <programlisting> |
| UPDATE shoelace_data SET sl_avail = 0 |
| WHERE sl_color = 'black'; |
| </programlisting> |
| |
| four rows in fact get updated (<literal>sl1</literal>, <literal>sl2</literal>, <literal>sl3</literal>, and <literal>sl4</literal>). |
| But <literal>sl3</literal> already has <literal>sl_avail = 0</literal>. In this case, the original |
| query trees qualification is different and that results |
| in the extra query tree: |
| |
| <programlisting> |
| INSERT INTO shoelace_log |
| SELECT shoelace_data.sl_name, 0, |
| current_user, current_timestamp |
| FROM shoelace_data |
| WHERE 0 <> shoelace_data.sl_avail |
| AND <emphasis>shoelace_data.sl_color = 'black'</emphasis>; |
| </programlisting> |
| |
| being generated by the rule. This query tree will surely insert |
| three new log entries. And that's absolutely correct. |
| </para> |
| |
| <para> |
| Here we can see why it is important that the original query tree |
| is executed last. If the <command>UPDATE</command> had been |
| executed first, all the rows would have already been set to zero, so the |
| logging <command>INSERT</command> would not find any row where |
| <literal>0 <> shoelace_data.sl_avail</literal>. |
| </para> |
| </sect3> |
| |
| </sect2> |
| |
| <sect2 id="rules-update-views"> |
| <title>Cooperation with Views</title> |
| |
| <indexterm zone="rules-update-views"><primary>view</primary><secondary>updating</secondary></indexterm> |
| |
| <para> |
| A simple way to protect view relations from the mentioned |
| possibility that someone can try to run <command>INSERT</command>, |
| <command>UPDATE</command>, or <command>DELETE</command> on them is |
| to let those query trees get thrown away. So we could create the rules: |
| |
| <programlisting> |
| CREATE RULE shoe_ins_protect AS ON INSERT TO shoe |
| DO INSTEAD NOTHING; |
| CREATE RULE shoe_upd_protect AS ON UPDATE TO shoe |
| DO INSTEAD NOTHING; |
| CREATE RULE shoe_del_protect AS ON DELETE TO shoe |
| DO INSTEAD NOTHING; |
| </programlisting> |
| |
| If someone now tries to do any of these operations on the view |
| relation <literal>shoe</literal>, the rule system will |
| apply these rules. Since the rules have |
| no actions and are <literal>INSTEAD</literal>, the resulting list of |
| query trees will be empty and the whole query will become |
| nothing because there is nothing left to be optimized or |
| executed after the rule system is done with it. |
| </para> |
| |
| <para> |
| A more sophisticated way to use the rule system is to |
| create rules that rewrite the query tree into one that |
| does the right operation on the real tables. To do that |
| on the <literal>shoelace</literal> view, we create |
| the following rules: |
| |
| <programlisting> |
| CREATE RULE shoelace_ins AS ON INSERT TO shoelace |
| DO INSTEAD |
| INSERT INTO shoelace_data VALUES ( |
| NEW.sl_name, |
| NEW.sl_avail, |
| NEW.sl_color, |
| NEW.sl_len, |
| NEW.sl_unit |
| ); |
| |
| CREATE RULE shoelace_upd AS ON UPDATE TO shoelace |
| DO INSTEAD |
| UPDATE shoelace_data |
| SET sl_name = NEW.sl_name, |
| sl_avail = NEW.sl_avail, |
| sl_color = NEW.sl_color, |
| sl_len = NEW.sl_len, |
| sl_unit = NEW.sl_unit |
| WHERE sl_name = OLD.sl_name; |
| |
| CREATE RULE shoelace_del AS ON DELETE TO shoelace |
| DO INSTEAD |
| DELETE FROM shoelace_data |
| WHERE sl_name = OLD.sl_name; |
| </programlisting> |
| </para> |
| |
| <para> |
| If you want to support <literal>RETURNING</literal> queries on the view, |
| you need to make the rules include <literal>RETURNING</literal> clauses that |
| compute the view rows. This is usually pretty trivial for views on a |
| single table, but it's a bit tedious for join views such as |
| <literal>shoelace</literal>. An example for the insert case is: |
| |
| <programlisting> |
| CREATE RULE shoelace_ins AS ON INSERT TO shoelace |
| DO INSTEAD |
| INSERT INTO shoelace_data VALUES ( |
| NEW.sl_name, |
| NEW.sl_avail, |
| NEW.sl_color, |
| NEW.sl_len, |
| NEW.sl_unit |
| ) |
| RETURNING |
| shoelace_data.*, |
| (SELECT shoelace_data.sl_len * u.un_fact |
| FROM unit u WHERE shoelace_data.sl_unit = u.un_name); |
| </programlisting> |
| |
| Note that this one rule supports both <command>INSERT</command> and |
| <command>INSERT RETURNING</command> queries on the view — the |
| <literal>RETURNING</literal> clause is simply ignored for <command>INSERT</command>. |
| </para> |
| |
| <para> |
| Now assume that once in a while, a pack of shoelaces arrives at |
| the shop and a big parts list along with it. But you don't want |
| to manually update the <literal>shoelace</literal> view every |
| time. Instead we set up two little tables: one where you can |
| insert the items from the part list, and one with a special |
| trick. The creation commands for these are: |
| |
| <programlisting> |
| CREATE TABLE shoelace_arrive ( |
| arr_name text, |
| arr_quant integer |
| ); |
| |
| CREATE TABLE shoelace_ok ( |
| ok_name text, |
| ok_quant integer |
| ); |
| |
| CREATE RULE shoelace_ok_ins AS ON INSERT TO shoelace_ok |
| DO INSTEAD |
| UPDATE shoelace |
| SET sl_avail = sl_avail + NEW.ok_quant |
| WHERE sl_name = NEW.ok_name; |
| </programlisting> |
| |
| Now you can fill the table <literal>shoelace_arrive</literal> with |
| the data from the parts list: |
| |
| <programlisting> |
| SELECT * FROM shoelace_arrive; |
| |
| arr_name | arr_quant |
| ----------+----------- |
| sl3 | 10 |
| sl6 | 20 |
| sl8 | 20 |
| (3 rows) |
| </programlisting> |
| |
| Take a quick look at the current data: |
| |
| <programlisting> |
| SELECT * FROM shoelace; |
| |
| sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm |
| ----------+----------+----------+--------+---------+----------- |
| sl1 | 5 | black | 80 | cm | 80 |
| sl2 | 6 | black | 100 | cm | 100 |
| sl7 | 6 | brown | 60 | cm | 60 |
| sl3 | 0 | black | 35 | inch | 88.9 |
| sl4 | 8 | black | 40 | inch | 101.6 |
| sl8 | 1 | brown | 40 | inch | 101.6 |
| sl5 | 4 | brown | 1 | m | 100 |
| sl6 | 0 | brown | 0.9 | m | 90 |
| (8 rows) |
| </programlisting> |
| |
| Now move the arrived shoelaces in: |
| |
| <programlisting> |
| INSERT INTO shoelace_ok SELECT * FROM shoelace_arrive; |
| </programlisting> |
| |
| and check the results: |
| |
| <programlisting> |
| SELECT * FROM shoelace ORDER BY sl_name; |
| |
| sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm |
| ----------+----------+----------+--------+---------+----------- |
| sl1 | 5 | black | 80 | cm | 80 |
| sl2 | 6 | black | 100 | cm | 100 |
| sl7 | 6 | brown | 60 | cm | 60 |
| sl4 | 8 | black | 40 | inch | 101.6 |
| sl3 | 10 | black | 35 | inch | 88.9 |
| sl8 | 21 | brown | 40 | inch | 101.6 |
| sl5 | 4 | brown | 1 | m | 100 |
| sl6 | 20 | brown | 0.9 | m | 90 |
| (8 rows) |
| |
| SELECT * FROM shoelace_log; |
| |
| sl_name | sl_avail | log_who| log_when |
| ---------+----------+--------+---------------------------------- |
| sl7 | 6 | Al | Tue Oct 20 19:14:45 1998 MET DST |
| sl3 | 10 | Al | Tue Oct 20 19:25:16 1998 MET DST |
| sl6 | 20 | Al | Tue Oct 20 19:25:16 1998 MET DST |
| sl8 | 21 | Al | Tue Oct 20 19:25:16 1998 MET DST |
| (4 rows) |
| </programlisting> |
| </para> |
| |
| <para> |
| It's a long way from the one <literal>INSERT ... SELECT</literal> |
| to these results. And the description of the query-tree |
| transformation will be the last in this chapter. First, there is |
| the parser's output: |
| |
| <programlisting> |
| INSERT INTO shoelace_ok |
| SELECT shoelace_arrive.arr_name, shoelace_arrive.arr_quant |
| FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok; |
| </programlisting> |
| |
| Now the first rule <literal>shoelace_ok_ins</literal> is applied and turns this |
| into: |
| |
| <programlisting> |
| UPDATE shoelace |
| SET sl_avail = shoelace.sl_avail + shoelace_arrive.arr_quant |
| FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok, |
| shoelace_ok old, shoelace_ok new, |
| shoelace shoelace |
| WHERE shoelace.sl_name = shoelace_arrive.arr_name; |
| </programlisting> |
| |
| and throws away the original <command>INSERT</command> on |
| <literal>shoelace_ok</literal>. This rewritten query is passed to |
| the rule system again, and the second applied rule |
| <literal>shoelace_upd</literal> produces: |
| |
| <programlisting> |
| UPDATE shoelace_data |
| SET sl_name = shoelace.sl_name, |
| sl_avail = shoelace.sl_avail + shoelace_arrive.arr_quant, |
| sl_color = shoelace.sl_color, |
| sl_len = shoelace.sl_len, |
| sl_unit = shoelace.sl_unit |
| FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok, |
| shoelace_ok old, shoelace_ok new, |
| shoelace shoelace, shoelace old, |
| shoelace new, shoelace_data shoelace_data |
| WHERE shoelace.sl_name = shoelace_arrive.arr_name |
| AND shoelace_data.sl_name = shoelace.sl_name; |
| </programlisting> |
| |
| Again it's an <literal>INSTEAD</literal> rule and the previous query tree is trashed. |
| Note that this query still uses the view <literal>shoelace</literal>. |
| But the rule system isn't finished with this step, so it continues |
| and applies the <literal>_RETURN</literal> rule on it, and we get: |
| |
| <programlisting> |
| UPDATE shoelace_data |
| SET sl_name = s.sl_name, |
| sl_avail = s.sl_avail + shoelace_arrive.arr_quant, |
| sl_color = s.sl_color, |
| sl_len = s.sl_len, |
| sl_unit = s.sl_unit |
| FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok, |
| shoelace_ok old, shoelace_ok new, |
| shoelace shoelace, shoelace old, |
| shoelace new, shoelace_data shoelace_data, |
| shoelace old, shoelace new, |
| shoelace_data s, unit u |
| WHERE s.sl_name = shoelace_arrive.arr_name |
| AND shoelace_data.sl_name = s.sl_name; |
| </programlisting> |
| |
| Finally, the rule <literal>log_shoelace</literal> gets applied, |
| producing the extra query tree: |
| |
| <programlisting> |
| INSERT INTO shoelace_log |
| SELECT s.sl_name, |
| s.sl_avail + shoelace_arrive.arr_quant, |
| current_user, |
| current_timestamp |
| FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok, |
| shoelace_ok old, shoelace_ok new, |
| shoelace shoelace, shoelace old, |
| shoelace new, shoelace_data shoelace_data, |
| shoelace old, shoelace new, |
| shoelace_data s, unit u, |
| shoelace_data old, shoelace_data new |
| shoelace_log shoelace_log |
| WHERE s.sl_name = shoelace_arrive.arr_name |
| AND shoelace_data.sl_name = s.sl_name |
| AND (s.sl_avail + shoelace_arrive.arr_quant) <> s.sl_avail; |
| </programlisting> |
| |
| After that the rule system runs out of rules and returns the |
| generated query trees. |
| </para> |
| |
| <para> |
| So we end up with two final query trees that are equivalent to the |
| <acronym>SQL</acronym> statements: |
| |
| <programlisting> |
| INSERT INTO shoelace_log |
| SELECT s.sl_name, |
| s.sl_avail + shoelace_arrive.arr_quant, |
| current_user, |
| current_timestamp |
| FROM shoelace_arrive shoelace_arrive, shoelace_data shoelace_data, |
| shoelace_data s |
| WHERE s.sl_name = shoelace_arrive.arr_name |
| AND shoelace_data.sl_name = s.sl_name |
| AND s.sl_avail + shoelace_arrive.arr_quant <> s.sl_avail; |
| |
| UPDATE shoelace_data |
| SET sl_avail = shoelace_data.sl_avail + shoelace_arrive.arr_quant |
| FROM shoelace_arrive shoelace_arrive, |
| shoelace_data shoelace_data, |
| shoelace_data s |
| WHERE s.sl_name = shoelace_arrive.sl_name |
| AND shoelace_data.sl_name = s.sl_name; |
| </programlisting> |
| |
| The result is that data coming from one relation inserted into another, |
| changed into updates on a third, changed into updating |
| a fourth plus logging that final update in a fifth |
| gets reduced into two queries. |
| </para> |
| |
| <para> |
| There is a little detail that's a bit ugly. Looking at the two |
| queries, it turns out that the <literal>shoelace_data</literal> |
| relation appears twice in the range table where it could |
| definitely be reduced to one. The planner does not handle it and |
| so the execution plan for the rule systems output of the |
| <command>INSERT</command> will be |
| |
| <literallayout class="monospaced"> |
| Nested Loop |
| -> Merge Join |
| -> Seq Scan |
| -> Sort |
| -> Seq Scan on s |
| -> Seq Scan |
| -> Sort |
| -> Seq Scan on shoelace_arrive |
| -> Seq Scan on shoelace_data |
| </literallayout> |
| |
| while omitting the extra range table entry would result in a |
| |
| <literallayout class="monospaced"> |
| Merge Join |
| -> Seq Scan |
| -> Sort |
| -> Seq Scan on s |
| -> Seq Scan |
| -> Sort |
| -> Seq Scan on shoelace_arrive |
| </literallayout> |
| |
| which produces exactly the same entries in the log table. Thus, |
| the rule system caused one extra scan on the table |
| <literal>shoelace_data</literal> that is absolutely not |
| necessary. And the same redundant scan is done once more in the |
| <command>UPDATE</command>. But it was a really hard job to make |
| that all possible at all. |
| </para> |
| |
| <para> |
| Now we make a final demonstration of the |
| <productname>PostgreSQL</productname> rule system and its power. |
| Say you add some shoelaces with extraordinary colors to your |
| database: |
| |
| <programlisting> |
| INSERT INTO shoelace VALUES ('sl9', 0, 'pink', 35.0, 'inch', 0.0); |
| INSERT INTO shoelace VALUES ('sl10', 1000, 'magenta', 40.0, 'inch', 0.0); |
| </programlisting> |
| |
| We would like to make a view to check which |
| <literal>shoelace</literal> entries do not fit any shoe in color. |
| The view for this is: |
| |
| <programlisting> |
| CREATE VIEW shoelace_mismatch AS |
| SELECT * FROM shoelace WHERE NOT EXISTS |
| (SELECT shoename FROM shoe WHERE slcolor = sl_color); |
| </programlisting> |
| |
| Its output is: |
| |
| <programlisting> |
| SELECT * FROM shoelace_mismatch; |
| |
| sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm |
| ---------+----------+----------+--------+---------+----------- |
| sl9 | 0 | pink | 35 | inch | 88.9 |
| sl10 | 1000 | magenta | 40 | inch | 101.6 |
| </programlisting> |
| </para> |
| |
| <para> |
| Now we want to set it up so that mismatching shoelaces that are |
| not in stock are deleted from the database. |
| To make it a little harder for <productname>PostgreSQL</productname>, |
| we don't delete it directly. Instead we create one more view: |
| |
| <programlisting> |
| CREATE VIEW shoelace_can_delete AS |
| SELECT * FROM shoelace_mismatch WHERE sl_avail = 0; |
| </programlisting> |
| |
| and do it this way: |
| |
| <programlisting> |
| DELETE FROM shoelace WHERE EXISTS |
| (SELECT * FROM shoelace_can_delete |
| WHERE sl_name = shoelace.sl_name); |
| </programlisting> |
| |
| The results are: |
| |
| <programlisting> |
| SELECT * FROM shoelace; |
| |
| sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm |
| ---------+----------+----------+--------+---------+----------- |
| sl1 | 5 | black | 80 | cm | 80 |
| sl2 | 6 | black | 100 | cm | 100 |
| sl7 | 6 | brown | 60 | cm | 60 |
| sl4 | 8 | black | 40 | inch | 101.6 |
| sl3 | 10 | black | 35 | inch | 88.9 |
| sl8 | 21 | brown | 40 | inch | 101.6 |
| sl10 | 1000 | magenta | 40 | inch | 101.6 |
| sl5 | 4 | brown | 1 | m | 100 |
| sl6 | 20 | brown | 0.9 | m | 90 |
| (9 rows) |
| </programlisting> |
| </para> |
| |
| <para> |
| A <command>DELETE</command> on a view, with a subquery qualification that |
| in total uses 4 nesting/joined views, where one of them |
| itself has a subquery qualification containing a view |
| and where calculated view columns are used, |
| gets rewritten into |
| one single query tree that deletes the requested data |
| from a real table. |
| </para> |
| |
| <para> |
| There are probably only a few situations out in the real world |
| where such a construct is necessary. But it makes you feel |
| comfortable that it works. |
| </para> |
| </sect2> |
| |
| </sect1> |
| |
| <sect1 id="rules-privileges"> |
| <title>Rules and Privileges</title> |
| |
| <indexterm zone="rules-privileges"> |
| <primary>privilege</primary> |
| <secondary sortas="Regeln">with rules</secondary> |
| </indexterm> |
| |
| <indexterm zone="rules-privileges"> |
| <primary>privilege</primary> |
| <secondary sortas="Sichten">with views</secondary> |
| </indexterm> |
| |
| <para> |
| Due to rewriting of queries by the <productname>PostgreSQL</productname> |
| rule system, other tables/views than those used in the original |
| query get accessed. When update rules are used, this can include write access |
| to tables. |
| </para> |
| |
| <para> |
| Rewrite rules don't have a separate owner. The owner of |
| a relation (table or view) is automatically the owner of the |
| rewrite rules that are defined for it. |
| The <productname>PostgreSQL</productname> rule system changes the |
| behavior of the default access control system. Relations that |
| are used due to rules get checked against the |
| privileges of the rule owner, not the user invoking the rule. |
| This means that users only need the required privileges |
| for the tables/views that are explicitly named in their queries. |
| </para> |
| |
| <para> |
| For example: A user has a list of phone numbers where some of |
| them are private, the others are of interest for the assistant of the office. |
| The user can construct the following: |
| |
| <programlisting> |
| CREATE TABLE phone_data (person text, phone text, private boolean); |
| CREATE VIEW phone_number AS |
| SELECT person, CASE WHEN NOT private THEN phone END AS phone |
| FROM phone_data; |
| GRANT SELECT ON phone_number TO assistant; |
| </programlisting> |
| |
| Nobody except that user (and the database superusers) can access the |
| <literal>phone_data</literal> table. But because of the <command>GRANT</command>, |
| the assistant can run a <command>SELECT</command> on the |
| <literal>phone_number</literal> view. The rule system will rewrite the |
| <command>SELECT</command> from <literal>phone_number</literal> into a |
| <command>SELECT</command> from <literal>phone_data</literal>. |
| Since the user is the owner of |
| <literal>phone_number</literal> and therefore the owner of the rule, the |
| read access to <literal>phone_data</literal> is now checked against the user's |
| privileges and the query is permitted. The check for accessing |
| <literal>phone_number</literal> is also performed, but this is done |
| against the invoking user, so nobody but the user and the |
| assistant can use it. |
| </para> |
| |
| <para> |
| The privileges are checked rule by rule. So the assistant is for now the |
| only one who can see the public phone numbers. But the assistant can set up |
| another view and grant access to that to the public. Then, anyone |
| can see the <literal>phone_number</literal> data through the assistant's view. |
| What the assistant cannot do is to create a view that directly |
| accesses <literal>phone_data</literal>. (Actually the assistant can, but it will not work since |
| every access will be denied during the permission checks.) |
| And as soon as the user notices that the assistant opened |
| their <literal>phone_number</literal> view, the user can revoke the assistant's access. Immediately, any |
| access to the assistant's view would fail. |
| </para> |
| |
| <para> |
| One might think that this rule-by-rule checking is a security |
| hole, but in fact it isn't. But if it did not work this way, the assistant |
| could set up a table with the same columns as <literal>phone_number</literal> and |
| copy the data to there once per day. Then it's the assistant's own data and |
| the assistant can grant access to everyone they want. A |
| <command>GRANT</command> command means, <quote>I trust you</quote>. |
| If someone you trust does the thing above, it's time to |
| think it over and then use <command>REVOKE</command>. |
| </para> |
| |
| <para> |
| Note that while views can be used to hide the contents of certain |
| columns using the technique shown above, they cannot be used to reliably |
| conceal the data in unseen rows unless the |
| <literal>security_barrier</literal> flag has been set. For example, |
| the following view is insecure: |
| <programlisting> |
| CREATE VIEW phone_number AS |
| SELECT person, phone FROM phone_data WHERE phone NOT LIKE '412%'; |
| </programlisting> |
| This view might seem secure, since the rule system will rewrite any |
| <command>SELECT</command> from <literal>phone_number</literal> into a |
| <command>SELECT</command> from <literal>phone_data</literal> and add the |
| qualification that only entries where <literal>phone</literal> does not begin |
| with 412 are wanted. But if the user can create their own functions, |
| it is not difficult to convince the planner to execute the user-defined |
| function prior to the <function>NOT LIKE</function> expression. |
| For example: |
| <programlisting> |
| CREATE FUNCTION tricky(text, text) RETURNS bool AS $$ |
| BEGIN |
| RAISE NOTICE '% => %', $1, $2; |
| RETURN true; |
| END; |
| $$ LANGUAGE plpgsql COST 0.0000000000000000000001; |
| |
| SELECT * FROM phone_number WHERE tricky(person, phone); |
| </programlisting> |
| Every person and phone number in the <literal>phone_data</literal> table will be |
| printed as a <literal>NOTICE</literal>, because the planner will choose to |
| execute the inexpensive <function>tricky</function> function before the |
| more expensive <function>NOT LIKE</function>. Even if the user is |
| prevented from defining new functions, built-in functions can be used in |
| similar attacks. (For example, most casting functions include their |
| input values in the error messages they produce.) |
| </para> |
| |
| <para> |
| Similar considerations apply to update rules. In the examples of |
| the previous section, the owner of the tables in the example |
| database could grant the privileges <literal>SELECT</literal>, |
| <literal>INSERT</literal>, <literal>UPDATE</literal>, and <literal>DELETE</literal> on |
| the <literal>shoelace</literal> view to someone else, but only |
| <literal>SELECT</literal> on <literal>shoelace_log</literal>. The rule action to |
| write log entries will still be executed successfully, and that |
| other user could see the log entries. But they could not create fake |
| entries, nor could they manipulate or remove existing ones. In this |
| case, there is no possibility of subverting the rules by convincing |
| the planner to alter the order of operations, because the only rule |
| which references <literal>shoelace_log</literal> is an unqualified |
| <literal>INSERT</literal>. This might not be true in more complex scenarios. |
| </para> |
| |
| <para> |
| When it is necessary for a view to provide row-level security, the |
| <literal>security_barrier</literal> attribute should be applied to |
| the view. This prevents maliciously-chosen functions and operators from |
| being passed values from rows until after the view has done its work. For |
| example, if the view shown above had been created like this, it would |
| be secure: |
| <programlisting> |
| CREATE VIEW phone_number WITH (security_barrier) AS |
| SELECT person, phone FROM phone_data WHERE phone NOT LIKE '412%'; |
| </programlisting> |
| Views created with the <literal>security_barrier</literal> may perform |
| far worse than views created without this option. In general, there is |
| no way to avoid this: the fastest possible plan must be rejected |
| if it may compromise security. For this reason, this option is not |
| enabled by default. |
| </para> |
| |
| <para> |
| The query planner has more flexibility when dealing with functions that |
| have no side effects. Such functions are referred to as <literal>LEAKPROOF</literal>, and |
| include many simple, commonly used operators, such as many equality |
| operators. The query planner can safely allow such functions to be evaluated |
| at any point in the query execution process, since invoking them on rows |
| invisible to the user will not leak any information about the unseen rows. |
| Further, functions which do not take arguments or which are not passed any |
| arguments from the security barrier view do not have to be marked as |
| <literal>LEAKPROOF</literal> to be pushed down, as they never receive data |
| from the view. In contrast, a function that might throw an error depending |
| on the values received as arguments (such as one that throws an error in the |
| event of overflow or division by zero) is not leak-proof, and could provide |
| significant information about the unseen rows if applied before the security |
| view's row filters. |
| </para> |
| |
| <para> |
| It is important to understand that even a view created with the |
| <literal>security_barrier</literal> option is intended to be secure only |
| in the limited sense that the contents of the invisible tuples will not be |
| passed to possibly-insecure functions. The user may well have other means |
| of making inferences about the unseen data; for example, they can see the |
| query plan using <command>EXPLAIN</command>, or measure the run time of |
| queries against the view. A malicious attacker might be able to infer |
| something about the amount of unseen data, or even gain some information |
| about the data distribution or most common values (since these things may |
| affect the run time of the plan; or even, since they are also reflected in |
| the optimizer statistics, the choice of plan). If these types of "covert |
| channel" attacks are of concern, it is probably unwise to grant any access |
| to the data at all. |
| </para> |
| </sect1> |
| |
| <sect1 id="rules-status"> |
| <title>Rules and Command Status</title> |
| |
| <para> |
| The <productname>PostgreSQL</productname> server returns a command |
| status string, such as <literal>INSERT 149592 1</literal>, for each |
| command it receives. This is simple enough when there are no rules |
| involved, but what happens when the query is rewritten by rules? |
| </para> |
| |
| <para> |
| Rules affect the command status as follows: |
| |
| <itemizedlist> |
| <listitem> |
| <para> |
| If there is no unconditional <literal>INSTEAD</literal> rule for the query, then |
| the originally given query will be executed, and its command |
| status will be returned as usual. (But note that if there were |
| any conditional <literal>INSTEAD</literal> rules, the negation of their qualifications |
| will have been added to the original query. This might reduce the |
| number of rows it processes, and if so the reported status will |
| be affected.) |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| If there is any unconditional <literal>INSTEAD</literal> rule for the query, then |
| the original query will not be executed at all. In this case, |
| the server will return the command status for the last query |
| that was inserted by an <literal>INSTEAD</literal> rule (conditional or |
| unconditional) and is of the same command type |
| (<command>INSERT</command>, <command>UPDATE</command>, or |
| <command>DELETE</command>) as the original query. If no query |
| meeting those requirements is added by any rule, then the |
| returned command status shows the original query type and |
| zeroes for the row-count and OID fields. |
| </para> |
| </listitem> |
| </itemizedlist> |
| </para> |
| |
| <para> |
| The programmer can ensure that any desired <literal>INSTEAD</literal> rule is the one |
| that sets the command status in the second case, by giving it the |
| alphabetically last rule name among the active rules, so that it |
| gets applied last. |
| </para> |
| </sect1> |
| |
| <sect1 id="rules-triggers"> |
| <title>Rules Versus Triggers</title> |
| |
| <indexterm zone="rules-triggers"> |
| <primary>rule</primary> |
| <secondary sortas="Trigger">compared with triggers</secondary> |
| </indexterm> |
| |
| <indexterm zone="rules-triggers"> |
| <primary>trigger</primary> |
| <secondary sortas="Regeln">compared with rules</secondary> |
| </indexterm> |
| |
| <para> |
| Many things that can be done using triggers can also be |
| implemented using the <productname>PostgreSQL</productname> |
| rule system. One of the things that cannot be implemented by |
| rules are some kinds of constraints, especially foreign keys. It is possible |
| to place a qualified rule that rewrites a command to <literal>NOTHING</literal> |
| if the value of a column does not appear in another table. |
| But then the data is silently thrown away and that's |
| not a good idea. If checks for valid values are required, |
| and in the case of an invalid value an error message should |
| be generated, it must be done by a trigger. |
| </para> |
| |
| <para> |
| In this chapter, we focused on using rules to update views. All of |
| the update rule examples in this chapter can also be implemented |
| using <literal>INSTEAD OF</literal> triggers on the views. Writing such |
| triggers is often easier than writing rules, particularly if complex |
| logic is required to perform the update. |
| </para> |
| |
| <para> |
| For the things that can be implemented by both, which is best |
| depends on the usage of the database. |
| A trigger is fired once for each affected row. A rule modifies |
| the query or generates an additional query. So if many |
| rows are affected in one statement, a rule issuing one extra |
| command is likely to be faster than a trigger that is |
| called for every single row and must re-determine what to do |
| many times. However, the trigger approach is conceptually far |
| simpler than the rule approach, and is easier for novices to get right. |
| </para> |
| |
| <para> |
| Here we show an example of how the choice of rules versus triggers |
| plays out in one situation. There are two tables: |
| |
| <programlisting> |
| CREATE TABLE computer ( |
| hostname text, -- indexed |
| manufacturer text -- indexed |
| ); |
| |
| CREATE TABLE software ( |
| software text, -- indexed |
| hostname text -- indexed |
| ); |
| </programlisting> |
| |
| Both tables have many thousands of rows and the indexes on |
| <structfield>hostname</structfield> are unique. The rule or trigger should |
| implement a constraint that deletes rows from <literal>software</literal> |
| that reference a deleted computer. The trigger would use this command: |
| |
| <programlisting> |
| DELETE FROM software WHERE hostname = $1; |
| </programlisting> |
| |
| Since the trigger is called for each individual row deleted from |
| <literal>computer</literal>, it can prepare and save the plan for this |
| command and pass the <structfield>hostname</structfield> value in the |
| parameter. The rule would be written as: |
| |
| <programlisting> |
| CREATE RULE computer_del AS ON DELETE TO computer |
| DO DELETE FROM software WHERE hostname = OLD.hostname; |
| </programlisting> |
| </para> |
| |
| <para> |
| Now we look at different types of deletes. In the case of a: |
| |
| <programlisting> |
| DELETE FROM computer WHERE hostname = 'mypc.local.net'; |
| </programlisting> |
| |
| the table <literal>computer</literal> is scanned by index (fast), and the |
| command issued by the trigger would also use an index scan (also fast). |
| The extra command from the rule would be: |
| |
| <programlisting> |
| DELETE FROM software WHERE computer.hostname = 'mypc.local.net' |
| AND software.hostname = computer.hostname; |
| </programlisting> |
| |
| Since there are appropriate indexes set up, the planner |
| will create a plan of |
| |
| <literallayout class="monospaced"> |
| Nestloop |
| -> Index Scan using comp_hostidx on computer |
| -> Index Scan using soft_hostidx on software |
| </literallayout> |
| |
| So there would be not that much difference in speed between |
| the trigger and the rule implementation. |
| </para> |
| |
| <para> |
| With the next delete we want to get rid of all the 2000 computers |
| where the <structfield>hostname</structfield> starts with |
| <literal>old</literal>. There are two possible commands to do that. One |
| is: |
| |
| <programlisting> |
| DELETE FROM computer WHERE hostname >= 'old' |
| AND hostname < 'ole' |
| </programlisting> |
| |
| The command added by the rule will be: |
| |
| <programlisting> |
| DELETE FROM software WHERE computer.hostname >= 'old' AND computer.hostname < 'ole' |
| AND software.hostname = computer.hostname; |
| </programlisting> |
| |
| with the plan |
| |
| <literallayout class="monospaced"> |
| Hash Join |
| -> Seq Scan on software |
| -> Hash |
| -> Index Scan using comp_hostidx on computer |
| </literallayout> |
| |
| The other possible command is: |
| |
| <programlisting> |
| DELETE FROM computer WHERE hostname ~ '^old'; |
| </programlisting> |
| |
| which results in the following executing plan for the command |
| added by the rule: |
| |
| <literallayout class="monospaced"> |
| Nestloop |
| -> Index Scan using comp_hostidx on computer |
| -> Index Scan using soft_hostidx on software |
| </literallayout> |
| |
| This shows, that the planner does not realize that the |
| qualification for <structfield>hostname</structfield> in |
| <literal>computer</literal> could also be used for an index scan on |
| <literal>software</literal> when there are multiple qualification |
| expressions combined with <literal>AND</literal>, which is what it does |
| in the regular-expression version of the command. The trigger will |
| get invoked once for each of the 2000 old computers that have to be |
| deleted, and that will result in one index scan over |
| <literal>computer</literal> and 2000 index scans over |
| <literal>software</literal>. The rule implementation will do it with two |
| commands that use indexes. And it depends on the overall size of |
| the table <literal>software</literal> whether the rule will still be faster in the |
| sequential scan situation. 2000 command executions from the trigger over the SPI |
| manager take some time, even if all the index blocks will soon be in the cache. |
| </para> |
| |
| <para> |
| The last command we look at is: |
| |
| <programlisting> |
| DELETE FROM computer WHERE manufacturer = 'bim'; |
| </programlisting> |
| |
| Again this could result in many rows to be deleted from |
| <literal>computer</literal>. So the trigger will again run many commands |
| through the executor. The command generated by the rule will be: |
| |
| <programlisting> |
| DELETE FROM software WHERE computer.manufacturer = 'bim' |
| AND software.hostname = computer.hostname; |
| </programlisting> |
| |
| The plan for that command will again be the nested loop over two |
| index scans, only using a different index on <literal>computer</literal>: |
| |
| <programlisting> |
| Nestloop |
| -> Index Scan using comp_manufidx on computer |
| -> Index Scan using soft_hostidx on software |
| </programlisting> |
| |
| In any of these cases, the extra commands from the rule system |
| will be more or less independent from the number of affected rows |
| in a command. |
| </para> |
| |
| <para> |
| The summary is, rules will only be significantly slower than |
| triggers if their actions result in large and badly qualified |
| joins, a situation where the planner fails. |
| </para> |
| </sect1> |
| |
| </chapter> |