File Processing

File Processing

Top  Previous  Next

 

QMBasic programs usually need to access database files. This section discusses the various techniques available. Further information can be found by following the links to detailed sections.

 

The QM file system supports two distinct types of file:

Hashed files use a mathematical approach to locate data such that, when correctly configured, it should be possible to read any record with just one disk access regardless of the number of records in the file. For more information on the creation and configuration of these files see the section on dynamic files.

Directory files do not offer the high performance of hashed files but allow access to their data from outside of the QM environment. For this reason, they are typically used for data interchange between applications. They are also ideal for storing extremely large records. For more information on the creation of these files, see the section on directory files.

 

Directory files also allow data to be processed in a line by line manner or as a simple byte stream. There are also special program operations to simplify reading and writing comma separated data as used, for example, by some spreadsheet packages. For more information on this style of access, see sequential file i/o.

 

 

Opening Files

 

Before a file can be processed, it must be opened. This is normally done using the OPEN statement, identifying the file by referencing the name of the corresponding F-type VOC record. By having this level of indirection, the physical location of the file can be changed without affecting the application; all that is necessary is to edit the VOC record to reference the new file location. There are three special file name syntaxes available to reference files in other accounts without needing a Q-type VOC record:

Implicit Q-pointeraccount:file
Implicit QMNet pointerserver:account:file
PathnamePATH:pathname

Because these syntaxes potentially weaken the security provided by the VOC indirection, their availability is determined by a system configuration parameter, FILERULE.

 

It is also possible for an application to open files directly by pathname using the OPENPATH statement. This should only be used where the normal VOC indirection is not appropriate.

 

A typical application may open many files simultaneously and it is therefore necessary to have a way to determine which file is being referenced by subsequent data transfer operations. This link is provided by the OPEN and OPENPATH statements setting up a file variable which is then used in other operations on the same file. The file remains open so long as the file variable remains in place. Overwriting the file variable will implicitly close the file that it referenced. Exit from the program will discard local variables and hence close the file.

 

Most applications adopt a convention for the names of variables. Many examples in this documentation use a convention where the file name is contracted to three or four characters and a suffix of .F is added to form the file variable name. Thus a file named ORDERS might be referenced via a file variable named ORD.F. It is common to continue the convention into other variables so that, for example, ORDERS file record ids might be stored in ORD.ID and records could be read from the file into the ORD.REC variable. These are examples only. There is no restriction on naming imposed by QM itself.

 

A file variable may be copied, just like any other variable. In this case, the file remains open until the last file variable referencing it is discarded or overwritten.

 

There are two factors that limit the number of files that can be open at one time. Firstly, QM has an internal file table that contains a reference to every distinct file open on the system by all QM processes. The size of this table is set by the NUMFILES configuration parameter. If several users all open the same file, that only requires one entry in the table. If a program attempts to open a file when the table is full, the operation will fail, taking the ELSE clause to allow the program to report an error. The LIST.FILES command can be used to monitor how close a system is to reaching this configuration limit.

 

The second limit is imposed by the operating system. On some systems this may be configurable, on others it is fixed. QM tries to hide this limit by implementing a mechanism whereby, if the limit is reached, the file that has not been accessed for longest is closed internally to make room for the new file. Subsequent access to the file that has been closed will automatically reopen it, probably closing something else to make space. Although this mechanism is totally automatic and gives the developer the illusion that there is no limit, the impact on performance can be quite serious. It is strongly recommended that any configurable limit imposed by the operating system is set to an appropriate value.

 

Opening a file is a complex process. Although QM maintains a file cache to improve the situation, developers should avoid continually re-opening the same file. One useful way to achieve this is to place the file variable in a common block so that is not discarded when the program or subroutine exits. By using this technique it is possible for a program to open all of its main data files as it starts up and to keep them open for the entire life of the application. Keeping large numbers of files open will require careful configuration of NUMFILES and possibly the corresponding operating system parameters.

 

 

Reading, Writing and Deleting Data

 

Programs read data using the QMBasic READ statement. With a hashed file, the internal processing of this statement applies the hashing process to read just the group that would contain the requested record and then locates the record within that group. If it is not found, it is not in the file and there is no need to look elsewhere. This process ensures that hashed files give best performance. For a directory file, QM uses operating system functions to locate and read the requested item. This will not give the performance of hashed files as it requires a scan of the directory to locate the item.

 

The READ statement returns a variable that contains a dynamic array representing the data of the requested record. The program can then use the various dynamic array operations such as field extraction to access the data in the record.

 

If the record is to be updated by the application, it is essential to ensure that other processes cannot update the record at the same time. This protection is provided by QM's locking mechanisms and corresponding QMBasic statements, most importantly READU. A program should never write or delete a record unless it owns a lock to protect it. There is a configuration parameter, MUSTLOCK, that allows administrators to enforce strong locking rules. Unfortunately, this cannot be made the default behaviour as there is much software which does not use locking because the developer knew that there could never be an interaction with other processes.

 

A data record is written to the file using the QMBasic WRITE statement. If the record already exists in the file, the new version replaces the previous one. If the record does not already exist, the write operation adds it to the file. The record lock is automatically released when the write completes though there is an alternative statement, WRITEU, that retains the lock..

 

A data record is deleted  from the file using the QMBasic DELETE statement. The record lock is automatically released after the record has been deleted.

 

The QMBasic statements named above work with dynamic arrays. There is an alternative style of file i/o that uses dimensioned matrices. For details, see Matrix File I/O.

 

 

Select Lists and Alternate Key Indices

 

The READ statement requires that the program knows the id of the record it needs to read. To process a file sequentially or to process only records that meet a specific condition, programs use a select list. This list may be generated by executing a query processor SELECT operation or by use of the QMBasic SELECT statement. Whichever method is used, the program then reads items from the list using the READNEXT statement, typically in a loop that then uses READ or one of its locking variants to process each record from the list.

 

Building a select list requires the system to traverse the entire file, examining every record. For situations where only a small proportion of records are to be selected, an alternate key index can give substantial performance improvements. Effectively, this is a set of pre-built select lists based on the content of a specific field or the result of evaluating an I-type expression. The index is automatically updated whenever a change is made to the file. For well chosen indexed fields, the cost of this additional update on write is usually significantly outweighed by the performance improvement of being able to go directly to the desired set of records.

 

 

The ON ERROR Clause

 

Most of the QMBasic file handling statements have an optional ON ERROR clause. This is rarely needed by applications but allows a program to trap an error that would otherwise cause QM to abort the program. If an ON ERROR clause is present, the program can take its own recovery action or display alternative diagnostic messages. Developers should avoid using the ON ERROR clause simply to condition an ABORT statement as this will usually give less diagnostic information than would have appeared if no ON ERROR clause had been present.

 

 

Examples

 

OPEN 'CLIENTS' TO CLI.F ELSE STOP 'Cannot open CLIENTS'

LOOP

  DISPLAY 'Enter client number: ' :

  INPUT CLI.ID

UNTIL CLI.ID = ''

  READ CLI.REC FROM CLI.F, CLI.ID THEN

     DISPLAY CLI.REC<2>

  END ELSE

     DISPLAY 'Client not found'

  END

REPEAT

 

This short program opens the CLIENTS file and then enters a loop in which it prompts for a client number, reads the client record and displays the content of field 2. The loop continues until the user enters a blank client number.

 

This example shows why direct use of field numbers in programs is a bad idea. Anyone reading this program has no idea what information about the client is being displayed. A better approach is to use the EQUATE statement to define names for each field (typically in an include record). The data display statement might then become

DISPLAY CLI.REC<CLI.NAME>

which suggests to anyone reading the program that it is displaying the client's name. The GENERATE command can be used to construct an include record of field names from the file dictionary rather than having to maintain two separate descriptions of the data.

 

 

SELECT @VOC TO 1

LOOP

  READNEXT ID FROM 1 ELSE EXIT

  READ VOC.REC FROM @VOC, ID THEN

     IF VOC.REC[1,1] = 'F' THEN

        OPEN ID TO TEST.F ELSE

           DISPLAY 'File ' : ID : ' cannot be opened'

        END

     END

  END

REPEAT

 

This program uses the system defined file variable @VOC to reference the VOC instead of opening it explicitly. The SELECT statement builds a list of all records in the file into select list 1 which is then processed in the loop. For each item in the list, the record is read from the VOC and, if it is an F-type record, the program attempts to open the file. If it cannot be opened, an error message is displayed.

 

Note the use of EXIT to exit from the loop when the list is exhausted. Some multivalue environments do not support this statement and require developers to devise alternative exit schemes that are generally not as efficient.

 

Note also that the file opened to variable TEST.F is not explicitly closed. Each OPEN will implicitly close the previous file as the file variable is overwritten. The final file opened will remain open until the program terminates.

 

This program has needed to include a test to process only F-type VOC records. Alternatively, the program could use the query processor to build a list of F-type records and then process all records in this list:

 

EXECUTE "SELECT VOC WITH TYPE = 'F' TO 1"

LOOP

  READNEXT ID FROM 1 ELSE EXIT

  READ VOC.REC FROM @VOC, ID THEN

     OPEN ID TO TEST.F ELSE

        DISPLAY 'File ' : ID : ' cannot be opened'

     END

  END

REPEAT

 

Although this approach may look simpler and does not require the unwanted records to be read, it is actually less efficient than the first method as the query processor will need to read every record and the loop then re-reads the records of interest. In the previous example, use of the QMBasic SELECT actually only sets a pointer to the start of the file and the subsequent READNEXT reads each group when it needs to process the first record from the group, effectively reading the file only once. This exposes an interesting problem that is highlighted in the next example.

 

 

OPEN 'CLIENTS' TO CLI.F ELSE STOP 'Cannot open CLIENTS'

SELECT CLI.F TO 1

LOOP

  READNEXT CLI.ID FROM 1 ELSE EXIT

  READU CLI.REC FROM CLI.F, CLI.ID THEN

     RECORDLOCKU CLI.F, '0':CLI.ID

     WRITE CLI.REC TO CLI.F, '0':CLI.ID

     DELETE CLI.F, CLI.ID

     END

  END

REPEAT

 

The above program might be used to convert a CLIENTS file to add a zero on the front of each record id, perhaps to allow more clients on a system where the application requires fixed length ids. Because it is the READNEXT that actually traverses the file rather than the SELECT statement, new records written to higher numbered groups would be seen by a later READNEXT and get processed for a second time. For example, if the record for client number 1234 was in group 6 and the new version of this record with id 01234 hashed to group 10, it would appear in the list constructed when processing reaches group 10 and the record would be renamed once more to become 001234.

 

Although programs that might suffer from this problem are rare, we need to force completion of the record selection before entering the loop. One way to do this would be to use the query processor SELECT instead of the QMBasic equivalent:

EXECUTE 'SELECT CLIENTS TO 1'

 

Note the use of RECORDLOCKU to set an update lock on the record to be added to the file. Although this is probably strictly unnecessary in this example because the new record will not already exist, it does ensure compliance with the locking rules.