Alternate Key Indices

Alternate Key Indices

Top  Previous  Next

 

An alternate key index provides a method to access data file other than by the primary key (record id).

 

Consider a file holding information about orders with, for example, 100,000 records in it. For simplicity, assume that these records are made up of 10 orders from each of 10,000 customers. To locate all the orders placed by a specific customer would require all 100,000 records to be processed to find the 10 that we want.

 

Using an alternate key index on the customer number field of the order records, QM can look up the customer in the index and then go directly to the 10 order records. The application goes 10,000 times faster.

 

An alternate key index is created using the CREATE.INDEX command. This defines the index content but does not populate it. The BUILD.INDEX command builds the actual index and activates it. From that point forwards, QM will maintain the index automatically and the query processor will use it automatically. No changes are required to application software.

 

The functions of the CREATE.INDEX and BUILD.INDEX commands are combined in the MAKE.INDEX command.

 

Once an index has been built, it is maintained totally automatically by QM such that it is impossible to write or delete a record without the corresponding index updates being applied. It is essential that applications lock records (or the entire) file when writing or deleting in a file that uses indices. If this rule is obeyed, the only situation where it could be possible for an index not to correctly reflect the data in the file is after a power failure where some data may not have been flushed to disk. In this case, the REBUILD.ALL.INDICES command can be used to rebuild all indices in an account.

 

The query processor will also use the index totally automatically where it appears relevant. If the query processing speed suggests that it is not using an index where use was expected, the REQUIRE.INDEX keyword can be added to the query. If an index cannot be used, a diagnostic message is displayed to show the reason.

 

Indices can be built on real data stored in the file (dictionary D-type, A/S-type without a correlative, or E-type) or on calculated values (dictionary C/I-type or A/S-type with a correlative). When using calculated values, it is essential that the expression relies only on data from the file to which the index applies and is not time variant. Thus an index using data retrieved from another file using the TRANS() function, a T-conversion or a subroutine that performs a read will fail because the index will not be updated if the remote file is modified. Similarly, an index built using a calculation that uses the date or time (age calculated from a date of birth, for example) will fail because the expression does not always return the same output value for the same input.

 

These are both examples of the one and only rule that determines whether an index based on a calculated value will work: The virtual attribute expression must always return the same value when evaluated for the same input data.

 

When using a C-type index or an I-type that calls a subroutine, the index expression must not cause any write or delete operations to occur in the file to which the index applies.

 

For an index to be effective, each entry in the index should lead to a very small proportion of the data in the file. Index entries that lead to very large numbers of records are less effective and may also be costly to access or update. The worst case of this is indexing on a simple yes/no value.

 

A file may have up to 32 indices. The more indices there are, the longer it will take to update a record in the data file though this should be outweighed by the advantage of being able to use the indices in queries. Also, it should be remembered that indexing on a multivalued field may require many index entries to be updated when writing a data record.

 

Indices can be deleted using the DELETE.INDEX command if they are no longer wanted. A report of any or all indices on a file can be produced with the LIST.INDEX command.

 

A large data import or other process involving extensive updates to a file may run faster with indexing disabled. A subsequent use of BUILD.INDEX may be significantly faster than the time saved by disabling the index because it can process the data in sorted order, optimising the way in which the underlying index structures are built. To enable this mode of operation, use the DISABLE.INDEX command to turn off the indices, perform the data import or processing, and then use BUILD.INDEX to reconstruct the indices. QM does not support the ENABLE.INDEX and UPDATE.INDEX commands found in some other multivalue products as the possibility of enabling the index without reconstructing it means that the index will not reliably reflect the stored data. QM's implementation of DISABLE.INDEX allows individual indices to be disabled. Also note that while an index is disabled, it will not be used by the query processor.

 

 

Index Structure and Case Sensitivity

 

Internally, an AK is stored as a balanced tree structure in which the index keys (data field values) are in sorted order. When the index is built in case sensitive mode (the default), the sort order of the indexed data is in accordance with the ASCII character set. Thus an index might contain, in sorted order:

Lucy

MICHAEL

Mick

Peter

martin

 

For languages that use characters from the upper half of the character set (e.g. accented characters), this may not correspond to the conventional way in which the alphabet is sequenced.

 

An index can also be constructed in case insensitive form. Internally, this is performed by converting the indexed values to uppercase. Thus, the above example would be stored internally as

LUCY

MARTIN

MICHAEL

MICK

PETER

 

Notice how the sort order has also changed. Because of this, a case sensitive index cannot be used to resolve query processor selection clauses that use case insensitive operators and vice versa.

 

 

Equivalent Indices

 

An index is not identified internally by its name but rather by what it indexes. If two dictionary items have the same type, field number or expression, justification and single/multivalue status an index built on one item can be used in resolving selection criteria for the other. Thus, for example, if a file has two dictionary entries that reference the same date field but display it with a different conversion, and index built using one of the field names is equally applicable to the other. The CREATE.INDEX and MAKE.INDEX commands will not allow separate indices to be created for equivalent items.

 

 

Changing an Index

 

If the definition of the indexed data changes as a result of, for example, modifying an I-type expression that derives the indexed value, it is necessary to delete, recreate and build the index. This is because the definition of the index item is stored in the file at the point when the index is created. Subsequently modifying the dictionary will not affect the index. To simplify this process, both the CREATE.INDEX and MAKE.INDEX commands have a DELETING option that deletes any old index before creating the new index.

 

 

Use of Indices in QMBasic

 

A program can determine whether a file has indices by use of the FILEINFO() function with action key FL$AK:

IF FILEINFO(FVAR, FL$AK) THEN DISPLAY 'File has indices'

 

The names of the indices that exist for a file can be determined using the INDICES() function:

NAMES = INDICES(FVAR)

 

An extended form of the INDICES() function can be used to retrieve the details of a specific index by name:

INDEX.DATA = INDICES(FVAR, NAME)

 

Programmers can gain access to the index itself using the QMBasic SELECTINDEX statement and advanced index scanning operations can be performed using SELECTLEFT, SELECTRIGHT, SETLEFT and SETRIGHT as described below.

 

 

 
Scanning an Index

 

Because the index keys (data field values) are in sorted order, the index can be used to find records where the indexed data field value lies within a specified range of values.

 

The SELECTINDEX statement is used by QMBasic programs to build a select list of records that have a specific value in an indexed field. As a side effect, this operation leaves a pointer into the index structure at the position identified by the value referenced in the SELECTINDEX. If there are no records with the requested indexed value, the pointer is left positioned where such a value should go. Thus, if the index contained references to records with indexed values 1, 3, 5 and 7, a SELECTINDEX operation to build a list of records with an indexed value of 4 would return an empty list but would leave the index pointer between the entries for 3 and 5.

 

Alternatively, the SETLEFT or SETRIGHT statements may be used to position the pointer at the extreme left or right of the indexed values.

 

The SELECTRIGHT or SELECTLEFT statements can be used to walk through the index, starting at the position of the index pointer and moving to the right (ascending order of indexed value) or left (descending order of indexed value). This operation creates a select list of records with the next indexed value in the direction of the scan.

 

Thus, to walk through an index from beginning to end in ascending order, a program would use SETLEFT to position at the start of the data and then perform a series of SELECTRIGHT operations until no further data is available. For example, if we have an orders file open as ORD.F and there is an index named CUST.NO for the customer number, we could process orders in ascending customer number with a loop such as:

SETLEFT 'CUST.NO' FROM ORD.F

LOOP

  SELECTRIGHT 'CUST.NO' FROM ORD.F SETTING CUST

  LOOP

     READNEXT ORD.ID ELSE EXIT

     ...processing...

  REPEAT

UNTIL STATUS()

REPEAT

 

The outer loop walks through the index returning a select list of orders for each customer, also setting CUST to the corresponding customer number. The inner loop processes the orders for the customer.

 

There is a sample QMBasic class module named INDEX.CLS in the BP file of the QMSYS account, catalogued as !INDEX.CLS. This class allows an application to scan an index one record id at a time instead of one indexed value at a time. Full details of its use and internal operation can be found in the source code.

 

 

Partial Key Look-up

 

Sometimes it is useful to allow a user to type in a text item and, as each character is entered, display a list of items that start with as much as has been typed. The following code shows the principles behind a subroutine that provides this functionality.

 

SUBROUTINE LOOKUP(FVAR, INDEX.NAME, WIDTH)

* FVAR = file variable of file to search

* INDEX.NAME = name of indexed field

* WIDTH = maximum number of characters in search value

 

$INCLUDE KEYIN.H

 

  CRT @(-1) :

 

  * Loop around gathering characters for the look-up key

  KEY = ''

  LOOP

     DISPLAY @(0,22) : '[' : FMT(KEY, WIDTH:'L') : ']' :

     DISPLAY @(1+LEN(KEY), 22) :

     C = KEYCODEV()

  UNTIL C = KV$RETURN                 ; * Terminate at return key

     BEGIN CASE

        CASE C = KV$BACKSPACE         ; * Backspace

           KEY = KEY[1,LEN(KEY)-1]

        CASE C >= 32                 ; * Data character - add to key

           IF LEN(KEY) < WIDTH THEN KEY := CHAR(C)

     END CASE

 

     * Show data matching partial key

     LN = 2

     IF LEN(KEY) THEN

        SELECTINDEX INDEX.NAME, KEY FROM FVAR

        ACTUAL.KEY = KEY

        LOOP

           IF @SELECTED THEN

              DISPLAY @(0,LN) : ACTUAL.KEY

              LN += 1

           END

           SELECTRIGHT INDEX.NAME FROM FVAR SETTING ACTUAL.KEY

        UNTIL STATUS()

        WHILE ACTUAL.KEY[1, LEN(KEY)] = KEY

        REPEAT

     END

 

     * Clear any unused display lines

     DISPLAY @(0,LN) : @(-3) :

  REPEAT

  RETURN

END

 

In the code under the "Show data matching partial key" comment, the SELECTINDEX statement attempts to find a perfect match for the data in the KEY variable, building a select list of records that have that data. As a side effect the SELECTINDEX sets the scan position at the item that it found. If, as is more likely, it does not find a perfect match, an empty select list will be returned and the scan position is set to where the item would have been if it were present.

 

The loop under the SELECTINDEX then walks through the index until an item is encountered that no longer matches the entered text. On each cycle, it displays the indexed value that has been found.

 

 

Index Encryption

 

The ENCRYPT option to CREATE.INDEX or MAKE.INDEX creates an encrypted index. This is only possible when the data file being indexed uses record level encryption. The same encryption key is used for the indices and the data file. There is a significant performance penalty in using encrypted indices.

 

 

Tracking Sequential Record Ids

 

Sometimes an application may build an AK on the primary key. This can be useful where, for example, the ids are not simple sequential numbers. For this discussion, we will imagine a log file for which the record id is a timestamp value.

 

Using a left to right scan loop similar to that above would allow the application to walk through the log records. Instead of terminating the loop when we reach the end of the index data, we could use SLEEP to pause for a while and then try again. This would allow us to process new records as they are added to the log.

 

Use of SETRIGHT positions after the last record in ascending sort order. Instead of scanning the entire log from left to right, we could use SETRIGHT to position at the extreme right and then process only records added after our program starts.