The procedures described thus far let you define a new type, new functions and new operators. However, we cannot yet define a secondary index (such as a B-tree, R-tree or hash access method) over a new type or its operators.
Look back at The major Postgres system catalogs. The right half shows the catalogs that we must modify in order to tell Postgres how to use a user-defined type and/or user-defined operators with an index (i.e., pg_am, pg_amop, pg_amproc, pg_operator and pg_opclass). Unfortunately, there is no simple command to do this. We will demonstrate how to modify these catalogs through a running example: a new operator class for the B-tree access method that stores and sorts complex numbers in ascending absolute value order.
The pg_am class contains one instance for every user defined access method. Support for the heap access method is built into Postgres, but every other access method is described here. The schema is
Table 9-1. Index Schema
Attribute | Description |
---|---|
amname | name of the access method |
amowner | object id of the owner's instance in pg_user |
amkind | not used at present, but set to 'o' as a place holder |
amstrategies | number of strategies for this access method (see below) |
amsupport | number of support routines for this access method (see below) |
amgettuple | |
aminsert | |
... | procedure identifiers for interface routines to the access method. For example, regproc ids for opening, closing, and getting instances from the access method appear here. |
The object ID of the instance in pg_am is used as a foreign key in lots of other classes. You don't need to add a new instance to this class; all you're interested in is the object ID of the access method instance you want to extend:
SELECT oid FROM pg_am WHERE amname = 'btree'; +----+ |oid | +----+ |403 | +----+We will use that SELECT in a WHERE clause later.
The amstrategies attribute exists to standardize comparisons across data types. For example, B-trees impose a strict ordering on keys, lesser to greater. Since Postgres allows the user to define operators, Postgres cannot look at the name of an operator (eg, ">" or "<") and tell what kind of comparison it is. In fact, some access methods don't impose any ordering at all. For example, R-trees express a rectangle-containment relationship, whereas a hashed data structure expresses only bitwise similarity based on the value of a hash function. Postgres needs some consistent way of taking a qualification in your query, looking at the operator and then deciding if a usable index exists. This implies that Postgres needs to know, for example, that the "<=" and ">" operators partition a B-tree. Postgres uses strategies to express these relationships between operators and the way they can be used to scan indices.
Defining a new set of strategies is beyond the scope of this discussion, but we'll explain how B-tree strategies work because you'll need to know that to add a new operator class. In the pg_am class, the amstrategies attribute is the number of strategies defined for this access method. For B-trees, this number is 5. These strategies correspond to
Table 9-2. B-tree Strategies
Operation | Index |
---|---|
less than | 1 |
less than or equal | 2 |
equal | 3 |
greater than or equal | 4 |
greater than | 5 |
The idea is that you'll need to add procedures corresponding to the comparisons above to the pg_amop relation (see below). The access method code can use these strategy numbers, regardless of data type, to figure out how to partition the B-tree, compute selectivity, and so on. Don't worry about the details of adding procedures yet; just understand that there must be a set of these procedures for int2, int4, oid, and every other data type on which a B-tree can operate.
Sometimes, strategies aren't enough information for the system to figure out how to use an index. Some access methods require other support routines in order to work. For example, the B-tree access method must be able to compare two keys and determine whether one is greater than, equal to, or less than the other. Similarly, the R-tree access method must be able to compute intersections, unions, and sizes of rectangles. These operations do not correspond to user qualifications in SQL queries; they are administrative routines used by the access methods, internally.
In order to manage diverse support routines consistently across all Postgres access methods, pg_am includes an attribute called amsupport. This attribute records the number of support routines used by an access method. For B-trees, this number is one -- the routine to take two keys and return -1, 0, or +1, depending on whether the first key is less than, equal to, or greater than the second.
Note: Strictly speaking, this routine can return a negative number (< 0), 0, or a non-zero positive number (> 0).
The amstrategies entry in pg_am is just the number of strategies defined for the access method in question. The procedures for less than, less equal, and so on don't appear in pg_am. Similarly, amsupport is just the number of support routines required by the access method. The actual routines are listed elsewhere.
The next class of interest is pg_opclass. This class exists only to associate a name and default type with an oid. In pg_amop, every B-tree operator class has a set of procedures, one through five, above. Some existing opclasses are int2_ops, int4_ops, and oid_ops. You need to add an instance with your opclass name (for example, complex_abs_ops) to pg_opclass. The oid of this instance is a foreign key in other classes.
INSERT INTO pg_opclass (opcname, opcdeftype) SELECT 'complex_abs_ops', oid FROM pg_type WHERE typname = 'complex_abs'; SELECT oid, opcname, opcdeftype FROM pg_opclass WHERE opcname = 'complex_abs_ops'; +------+-----------------+------------+ |oid | opcname | opcdeftype | +------+-----------------+------------+ |17314 | complex_abs_ops | 29058 | +------+-----------------+------------+Note that the oid for your pg_opclass instance will be different! Don't worry about this though. We'll get this number from the system later just like we got the oid of the type here.
So now we have an access method and an operator class. We still need a set of operators; the procedure for defining operators was discussed earlier in this manual. For the complex_abs_ops operator class on Btrees, the operators we require are:
absolute value less-than absolute value less-than-or-equal absolute value equal absolute value greater-than-or-equal absolute value greater-than
Suppose the code that implements the functions defined is stored in the file PGROOT/src/tutorial/complex.c
Part of the code look like this: (note that we will only show the equality operator for the rest of the examples. The other four operators are very similar. Refer to complex.c or complex.source for the details.)
#define Mag(c) ((c)->x*(c)->x + (c)->y*(c)->y) bool complex_abs_eq(Complex *a, Complex *b) { double amag = Mag(a), bmag = Mag(b); return (amag==bmag); }
There are a couple of important things that are happening below.
First, note that operators for less-than, less-than-or equal, equal, greater-than-or-equal, and greater-than for int4 are being defined. All of these operators are already defined for int4 under the names <, <=, =, >=, and >. The new operators behave differently, of course. In order to guarantee that Postgres uses these new operators rather than the old ones, they need to be named differently from the old ones. This is a key point: you can overload operators in Postgres, but only if the operator isn't already defined for the argument types. That is, if you have < defined for (int4, int4), you can't define it again. Postgres does not check this when you define your operator, so be careful. To avoid this problem, odd names will be used for the operators. If you get this wrong, the access methods are likely to crash when you try to do scans.
The other important point is that all the operator functions return Boolean values. The access methods rely on this fact. (On the other hand, the support function returns whatever the particular access method expects -- in this case, a signed integer.) The final routine in the file is the "support routine" mentioned when we discussed the amsupport attribute of the pg_am class. We will use this later on. For now, ignore it.
CREATE FUNCTION complex_abs_eq(complex_abs, complex_abs) RETURNS bool AS 'PGROOT/tutorial/obj/complex.so' LANGUAGE 'c';
Now define the operators that use them. As noted, the operator names must be unique among all operators that take two int4 operands. In order to see if the operator names listed below are taken, we can do a query on pg_operator:
/* * this query uses the regular expression operator (~) * to find three-character operator names that end in * the character & */ SELECT * FROM pg_operator WHERE oprname ~ '^..&$'::text;
to see if your name is taken for the types you want. The important things here are the procedure (which are the C functions defined above) and the restriction and join selectivity functions. You should just use the ones used below--note that there are different such functions for the less-than, equal, and greater-than cases. These must be supplied, or the access method will crash when it tries to use the operator. You should copy the names for restrict and join, but use the procedure names you defined in the last step.
CREATE OPERATOR = ( leftarg = complex_abs, rightarg = complex_abs, procedure = complex_abs_eq, restrict = eqsel, join = eqjoinsel )
Notice that five operators corresponding to less, less equal, equal, greater, and greater equal are defined.
We're just about finished. the last thing we need to do is to update the pg_amop relation. To do this, we need the following attributes:
Table 9-3. pg_amproc Schema
Attribute | Description |
---|---|
amopid | the oid of the pg_am instance for B-tree (== 403, see above) |
amopclaid | the oid of the pg_opclass instance for complex_abs_ops (== whatever you got instead of 17314, see above) |
amopopr | the oids of the operators for the opclass (which we'll get in just a minute) |
amopselect, amopnpages | cost functions |
So we need the oids of the operators we just defined. We'll look up the names of all the operators that take two complexes, and pick ours out:
SELECT o.oid AS opoid, o.oprname INTO TABLE complex_ops_tmp FROM pg_operator o, pg_type t WHERE o.oprleft = t.oid and o.oprright = t.oid and t.typname = 'complex_abs'; +------+---------+ |oid | oprname | +------+---------+ |17321 | < | +------+---------+ |17322 | <= | +------+---------+ |17323 | = | +------+---------+ |17324 | >= | +------+---------+ |17325 | > | +------+---------+(Again, some of your oid numbers will almost certainly be different.) The operators we are interested in are those with oids 17321 through 17325. The values you get will probably be different, and you should substitute them for the values below. We will do this with a select statement.
Now we're ready to update pg_amop with our new operator class. The most important thing in this entire discussion is that the operators are ordered, from less equal through greater equal, in pg_amop. We add the instances we need:
INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy, amopselect, amopnpages) SELECT am.oid, opcl.oid, c.opoid, 1, 'btreesel'::regproc, 'btreenpage'::regproc FROM pg_am am, pg_opclass opcl, complex_abs_ops_tmp c WHERE amname = 'btree' AND opcname = 'complex_abs_ops' AND c.oprname = '<';Now do this for the other operators substituting for the "1" in the third line above and the "<" in the last line. Note the order: "less than" is 1, "less than or equal" is 2, "equal" is 3, "greater than or equal" is 4, and "greater than" is 5.
The next step is registration of the "support routine" previously described in our discussion of pg_am. The oid of this support routine is stored in the pg_amproc class, keyed by the access method oid and the operator class oid. First, we need to register the function in Postgres (recall that we put the C code that implements this routine in the bottom of the file in which we implemented the operator routines):
CREATE FUNCTION complex_abs_cmp(complex, complex) RETURNS int4 AS 'PGROOT/tutorial/obj/complex.so' LANGUAGE 'c'; SELECT oid, proname FROM pg_proc WHERE proname = 'complex_abs_cmp'; +------+-----------------+ |oid | proname | +------+-----------------+ |17328 | complex_abs_cmp | +------+-----------------+(Again, your oid number will probably be different and you should substitute the value you see for the value below.) We can add the new instance as follows:
INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) SELECT a.oid, b.oid, c.oid, 1 FROM pg_am a, pg_opclass b, pg_proc c WHERE a.amname = 'btree' AND b.opcname = 'complex_abs_ops' AND c.proname = 'complex_abs_cmp';
Now we need to add a hashing strategy to allow the type to be indexed. We do this by using another type in pg_am but we reuse the sames ops.
INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy, amopselect, amopnpages) SELECT am.oid, opcl.oid, c.opoid, 1, 'hashsel'::regproc, 'hashnpage'::regproc FROM pg_am am, pg_opclass opcl, complex_abs_ops_tmp c WHERE amname = 'hash' AND opcname = 'complex_abs_ops' AND c.oprname = '=';
In order to use this index in a where clause, we need to modify the pg_operator class as follows.
UPDATE pg_operator SET oprrest = 'eqsel'::regproc, oprjoin = 'eqjoinsel' WHERE oprname = '=' AND oprleft = oprright AND oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs'); UPDATE pg_operator SET oprrest = 'neqsel'::regproc, oprjoin = 'neqjoinsel' WHERE oprname = '' AND oprleft = oprright AND oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs'); UPDATE pg_operator SET oprrest = 'neqsel'::regproc, oprjoin = 'neqjoinsel' WHERE oprname = '' AND oprleft = oprright AND oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs'); UPDATE pg_operator SET oprrest = 'intltsel'::regproc, oprjoin = 'intltjoinsel' WHERE oprname = '<' AND oprleft = oprright AND oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs'); UPDATE pg_operator SET oprrest = 'intltsel'::regproc, oprjoin = 'intltjoinsel' WHERE oprname = '<=' AND oprleft = oprright AND oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs'); UPDATE pg_operator SET oprrest = 'intgtsel'::regproc, oprjoin = 'intgtjoinsel' WHERE oprname = '>' AND oprleft = oprright AND oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs'); UPDATE pg_operator SET oprrest = 'intgtsel'::regproc, oprjoin = 'intgtjoinsel' WHERE oprname = '>=' AND oprleft = oprright AND oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
And last (Finally!) we register a description of this type.
INSERT INTO pg_description (objoid, description) SELECT oid, 'Two part G/L account' FROM pg_type WHERE typname = 'complex_abs';