Data Collections

 

A data collection is an arbitrarily multi-dimensional set of name/value pairs. Whereas the elements of a dynamic array are referenced by their numeric position, elements of a collection are referenced by case-sensitive name. The value associated with each named element may be of any QM data type, including a further collection, thus allowing nesting to a depth limited only by available memory.

 

A collection can be created in three ways. The COLLECTION() function can be used to create an empty collection

CLIENT.DATA = COLLECTION()

or to make a copy of an existing collection

CLIENT.DATA = COLLECTION(OLD.CLIENT.DATA)

 

Note that a statement such as

CLIENT.DATA = OLD.CLIENT.DATA

does not copy the collection. Instead, it creates a new reference to the same collection. This is similar to copying file variables or variables that reference object-oriented programming objects.
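The difference between a reference and a copy can be sketched as follows. This is a minimal illustration using only the syntax shown above; the variable names ALIAS and CLIENT.COPY and the data values are invented for the example.

```qmbasic
OLD.CLIENT.DATA = COLLECTION()
OLD.CLIENT.DATA{'NAME'} = 'Smith'

ALIAS = OLD.CLIENT.DATA                    ;* second reference to the same collection
CLIENT.COPY = COLLECTION(OLD.CLIENT.DATA)  ;* separate copy of the collection

OLD.CLIENT.DATA{'NAME'} = 'Jones'          ;* update via the original variable

DISPLAY ALIAS{'NAME'}        ;* shares the collection, so sees the update
DISPLAY CLIENT.COPY{'NAME'}  ;* copy was taken before the update
```

Based on the behaviour described above, the first DISPLAY should show the updated name and the second should still show the name held when the copy was made.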

         

Alternatively, a character string containing a JSON (JavaScript Object Notation) representation of the data can be parsed into a collection using the JPARSE() function

CLIENT.DATA = JPARSE(JSON.STRING)

 

 

The elements of a collection are referenced using a syntax that is very similar to that used for dynamic arrays but with curly brackets instead of angle brackets and names instead of numeric values. Note that in all examples of syntax relating to data collections, the curly brackets are part of the syntax and do not indicate optional elements as they do elsewhere in the documentation.

CLIENT.NAME = CLIENT.DATA{'NAME'}

 

In exactly the same way as with dynamic arrays, the element reference can be a literal value (as above), a variable or an expression that derives the name.

ITEM = 'NAME'

CLIENT.NAME = CLIENT.DATA{ITEM}

 

Where an element of a collection is itself a collection, the syntax becomes

LOCATION = CLIENT.DATA{'ADDRESS', 'TOWN'}

or

LOCATION = CLIENT.DATA{'ADDRESS'}{'TOWN'}

The second of these two syntaxes is valid because CLIENT.DATA{'ADDRESS'} is itself a collection and hence can have a further collection element reference applied to it.

 

A third method of referencing nested collections provides a way to access data for which the dimensionality may not have been known when the program was compiled. In this syntax, the name (the element path) is formed from multiple parts separated by forward slash characters.

LOCATION = CLIENT.DATA{'ADDRESS/TOWN'}

Although this example shows this as a literal value for clarity, this syntax would more commonly be used with an indirect reference via a name variable or expression. The three syntaxes can be mixed in any combination.
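As a short sketch, the three reference styles can be compared against the same nested data. The JSON text, the town name and the PATH variable are invented for illustration, and the example assumes that JPARSE() turns a nested JSON object into a nested collection.

```qmbasic
* Build a small nested collection from a JSON string (sample data)
CLIENT.DATA = JPARSE('{"NAME":"Smith","ADDRESS":{"TOWN":"Leeds"}}')

* All three forms address the same element
DISPLAY CLIENT.DATA{'ADDRESS', 'TOWN'}
DISPLAY CLIENT.DATA{'ADDRESS'}{'TOWN'}
DISPLAY CLIENT.DATA{'ADDRESS/TOWN'}

* The element path form is most useful when the path is built at run time
PATH = 'ADDRESS' : '/' : 'TOWN'
DISPLAY CLIENT.DATA{PATH}
```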

 

Referencing a collection element that does not exist returns a null string in the same way as referencing a non-existent dynamic array element. However, a failed collection reference also sets the STATUS() value to ER$NOT.FOUND.
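A program that needs to distinguish a missing element from one that holds a null string might test both the returned value and STATUS(), as in this sketch. It assumes that the ER$ tokens are defined in the ERR.H include record in SYSCOM. The STATUS() value after a successful reference is not stated here, so the test checks the null string as well.

```qmbasic
* Assumption: ERR.H in SYSCOM defines the ER$ error tokens
$INCLUDE SYSCOM ERR.H

CLIENT.DATA = COLLECTION()
CLIENT.DATA{'NAME'} = 'Smith'

FAX = CLIENT.DATA{'FAX'}     ;* no such element, so a null string is returned
ERRNO = STATUS()             ;* capture the status before anything else changes it

IF FAX = '' AND ERRNO = ER$NOT.FOUND THEN
   DISPLAY 'No FAX element present'
END ELSE
   DISPLAY 'FAX = ' : FAX
END
```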

 

A collection may be an element of a dimensioned array

CLIENT.NAME = CLIENT.DATA(CLI.NO){'NAME'}

 

 

Enumerating Collection Elements

 

The names present in a collection may be enumerated using the ENUMERATE() function.

NAMES = ENUMERATE(CLIENT.DATA)

The names are returned as a field mark delimited dynamic array, sorted into ascending order. For a nested collection, only the level directly referenced by the argument to the function is enumerated.
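One way to walk the returned list is shown below. This is only a sketch: it assumes every top-level element of the sample collection holds a simple string, since an element that is itself a collection or an array could not be displayed directly.

```qmbasic
CLIENT.DATA = COLLECTION()
CLIENT.DATA{'NAME'} = 'Smith'
CLIENT.DATA{'PHONE'} = '01234 567890'

NAMES = ENUMERATE(CLIENT.DATA)       ;* field mark delimited list of names
NUM.NAMES = DCOUNT(NAMES, @FM)

FOR I = 1 TO NUM.NAMES
   NAME = NAMES<I>
   DISPLAY NAME : ' = ' : CLIENT.DATA{NAME}
NEXT I
```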

 

The presence of a single element can be tested using the ELEMENT.EXISTS() function.

 

 

Adding or Modifying Collection Elements

 

An element may be added or updated in a collection using the collection reference on the left of an assignment operator

CLIENT.DATA{'PHONE'} = PHONE.NO

or using the INS statement

INS PHONE.NO AS CLIENT.DATA{'PHONE'}

If the element name references intermediate levels of a nested data collection, any absent elements are automatically inserted.

 

Elements may be copied from one collection to another (or a different name in the same collection) using a statement such as

CLIENT.DATA{'PHONE'} = OTHER.CLIENT.DATA{'PHONE'}

 

 

Deleting Collection Elements

 

An element may be deleted from a collection using the DEL statement

DEL CLIENT.DATA{'PHONE'}

If the element name references intermediate levels of a nested data collection, all lower level items are also deleted.
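The automatic creation of intermediate levels and the cascading delete can be seen together in a sketch such as the following. The element names and values are sample data.

```qmbasic
CLIENT.DATA = COLLECTION()

* ADDRESS does not exist yet, so it is created as a nested collection
CLIENT.DATA{'ADDRESS', 'TOWN'} = 'Leeds'
CLIENT.DATA{'ADDRESS', 'POSTCODE'} = 'LS1 1AA'
CLIENT.DATA{'PHONE'} = '01234 567890'

* List the names at each level, replacing field marks for display
DISPLAY CHANGE(ENUMERATE(CLIENT.DATA), @FM, ', ')
DISPLAY CHANGE(ENUMERATE(CLIENT.DATA{'ADDRESS'}), @FM, ', ')

* Deleting ADDRESS also removes TOWN and POSTCODE beneath it
DEL CLIENT.DATA{'ADDRESS'}
DISPLAY CHANGE(ENUMERATE(CLIENT.DATA), @FM, ', ')
```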

 

 

Arrays in Collections

 

A data collection may contain single dimensional arrays which are automatically sized to fit their content. An empty array is created using the MAT() function

CLIENT.DATA{'CONTACTS'} = MAT()

Alternatively, a standard QMBasic array variable can be copied into a collection

CLIENT.DATA{'CONTACTS'} = MAT(CONTACT.NAMES)

In the second syntax, if the array is two dimensional, it will be restructured to become a single dimensional array by copying items row by row to the new array.

 

The MAT() function has an optional second argument that sets the maximum number of elements to copy. If omitted or zero, all elements are copied.

 

 

Once the array has been created, data values are updated by using the element number as the name of the item.

CLIENT.DATA{'CONTACTS', N} = CONTACT.NAME

Use of a negative value for the array index appends a new item to the end of the array, in much the same way as negative values in dynamic array assignment. An array within a collection cannot contain unused elements. Thus, if the CONTACTS array holds five items, a statement such as

CLIENT.DATA{'CONTACTS', 8} = NEW.NAME

is invalid because elements 6 and 7 would be undefined. However,

CLIENT.DATA{'CONTACTS', 6} = NEW.NAME

is valid as the newly created element immediately follows the last existing element.

 

 

An element can be deleted from an array using the DEL statement

DEL CLIENT.DATA{'CONTACTS', N}

or

DEL CLIENT.DATA{'CONTACTS'}{N}

Deleting an element renumbers all elements that follow the deleted item, again just like deleting a dynamic array element.

 

 

An element can be inserted into an array using the INS statement

INS NAME AS CLIENT.DATA{'CONTACTS', N}

or

INS NAME AS CLIENT.DATA{'CONTACTS'}{N}

Inserting an element renumbers all elements that follow the inserted item. In the same way as when adding an element to the end of an array, undefined elements are not permitted. The index value identifying the insertion position must be no greater than one more than the current number of elements. A negative value appends an item to the end of the array.

 

An empty array may be inserted as an element of a collection using the INS statement

INS MAT() AS CLIENT.DATA{'CONTACTS'}

or an existing QM array can be copied into a collection using

INS MAT(NAMES) AS CLIENT.DATA{'CONTACTS'}

 

Conversely, copying an array from a collection into a standard dimensioned array uses a modified form of the MAT statement

MAT NAMES = MAT CLIENT.DATA{'CONTACTS'}

 

 

The INMAT() function can be used to find the number of elements in an array within a collection.

SIZE = INMAT(CLIENT.DATA{'CONTACTS'})
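The array operations described in this section can be combined as in the sketch below, which uses only the statements shown above. The contact names are sample data, and the comments show the array content expected from the renumbering rules described.

```qmbasic
CLIENT.DATA = COLLECTION()
CLIENT.DATA{'CONTACTS'} = MAT()            ;* empty array

CLIENT.DATA{'CONTACTS', -1} = 'Alice'      ;* negative index appends
CLIENT.DATA{'CONTACTS', -1} = 'Carol'

INS 'Bob' AS CLIENT.DATA{'CONTACTS', 2}    ;* Alice, Bob, Carol
DEL CLIENT.DATA{'CONTACTS', 1}             ;* Bob, Carol

SIZE = INMAT(CLIENT.DATA{'CONTACTS'})
FOR I = 1 TO SIZE
   DISPLAY I : ': ' : CLIENT.DATA{'CONTACTS', I}
NEXT I
```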

 

 

A collection reference may contain up to two asterisks, each indicating that all elements of an array are to be returned as a dynamic array

PRODUCTS = ORDER{'DETAIL/*/PRODNO'}

SERIAL = ORDER{'DETAIL/*/SERIAL.NO/*'}

CONTACT.NAMES = CLIENT.DATA{'CONTACTS/*'}

In this syntax, the data level within the collection corresponding to the position of the asterisk must be an array and the entire reference must lead to an item that can be represented as a string. Where only one asterisk is present, the dynamic array contains a value for each array element. Where two asterisks are present, the dynamic array has a value for each element of the array referenced by the first asterisk and a subvalue for each element of the array referenced by the second asterisk.
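A single asterisk reference might be used as follows. The JSON text and product numbers are invented for the example, which assumes that JPARSE() converts a JSON array into an array within the collection.

```qmbasic
* Sample order with two detail lines
JSON.STRING = '{"DETAIL":[{"PRODNO":"A100"},{"PRODNO":"B200"}]}'
ORDER = JPARSE(JSON.STRING)

* One value per element of the DETAIL array
PRODUCTS = ORDER{'DETAIL/*/PRODNO'}
NUM.PRODUCTS = DCOUNT(PRODUCTS, @VM)

FOR I = 1 TO NUM.PRODUCTS
   DISPLAY PRODUCTS<1, I>
NEXT I
```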

 

 

Collections and JSON

 

JSON is a way to represent arbitrarily multi-dimensional data as a character string that can be stored in a text file or transmitted over a network. It is primarily intended as a communications format for web-based applications.

 

The JPARSE() function will parse a JSON string into a data collection. The JBUILD() function performs the opposite transformation, building a JSON string from a collection. It is important to understand the issues relating to data types that can occur when using these functions.

 

JSON supports numeric, string, Boolean and null data types as elements of objects and arrays.

 

Numeric data may be represented in several formats in JSON but JPARSE() will yield either an integer or floating point value that has no reference to the original data format. Using JBUILD() to reconstruct the JSON string may result in a different but valid representation of the same value.

 

Boolean values (True and False) will be correctly handled by QM through use of its internal Boolean data type. Parsing and rebuilding a JSON string will maintain the difference between True/False and their alternative numeric representation as 1/0.

 

JSON also supports the SQL concept of a null value (not the same as a null string). Support for this in QM is limited to setting and testing null values. JPARSE() and JBUILD() correctly maintain the null value, but other operations on it will usually result in a run time error.

 

The elements of a data collection may include data of a type that is not supported by JSON (e.g. a file variable). It is unlikely that these data types would be present in a collection that will be converted to JSON format but, if they are, the JBUILD() function will fail, returning a null string and a STATUS() value of ER$BUILD.ERROR.
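A round trip through JPARSE() and JBUILD() with a check for the failure case could look like this sketch. It assumes that JBUILD() can be called with the collection as its only argument and that the ER$ tokens are defined in ERR.H in SYSCOM. The JSON text is sample data.

```qmbasic
* Assumption: ERR.H in SYSCOM defines the ER$ error tokens
$INCLUDE SYSCOM ERR.H

JSON.STRING = '{"NAME":"Smith","ACTIVE":true,"FAX":null}'
CLIENT.DATA = JPARSE(JSON.STRING)

CLIENT.DATA{'PHONE'} = '01234 567890'   ;* modify the parsed data

* Assumption: JBUILD() takes the collection as its only argument
NEW.JSON = JBUILD(CLIENT.DATA)
ERRNO = STATUS()

IF NEW.JSON = '' AND ERRNO = ER$BUILD.ERROR THEN
   DISPLAY 'Collection could not be converted to JSON'
END ELSE
   DISPLAY NEW.JSON
END
```

The rebuilt string may differ in layout and numeric formatting from the original, as noted above, so applications should not compare JSON strings character by character.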

 

 

Collection Files

 

A collection that has been encoded to a JSON string using JBUILD() can be stored as a record in any QM file type. If the strings within the collection do not contain field, value or subvalue marks, the JSON string can also be stored as a field, value or subvalue within a data record.

 

QM also supports data collection files, a variant of hashed file that stores data as a collection without the need to convert it to a JSON string. A file of this type is created by including the COLLECTION option in the CREATE.FILE command. Because the data in a collection file is not a dynamic array (although the values of any strings within the collection could be), dictionary items that reference fields by number, other than the record id, are invalid. Instead, dictionaries may contain E-type (element) records, which have the same form as a D-type record except that field 2 holds the element path instead of a field number. The element path may include up to two asterisks to return a multivalued list of array elements as described above.

 

Dictionary I-type items can reference E-type data definitions and can also use the {name} constructs defined above. The third argument to the TRANS() function (the item to be returned) may be an E-type item defined in the dictionary of the file being accessed if it is a collection.

 

A collection file may have indices built on data defined by an E-type dictionary record in exactly the same way as those based on D-type items in other files. It is also possible to define a trigger function in a collection file in which case the record data passed in to the trigger function will be a collection instead of a string. Only record level encryption is supported for collection files.

 

The query processor can access collection files by using E-type dictionary items instead of D-type items. The ELEMENT "name" construct can be used to reference a collection element on the command line.

 

Collection files are fully supported by QMNet, data replication, transactions and record level encryption.

 

 

Linked Collections

 

A data collection stored in a collection file may contain links to other collections in the same file or a different file. The links are stored as a string variable that contains a reference to the linked item as

filename:id

The filename (but not the colon) can be omitted if the link is to a record in a default file (perhaps the same file) but this reduced syntax has implications for the application as described below. There is nothing about this string item that defines it as a link except for how it is used by the application.

 

Where a data collection containing a link has been read into a QMBasic variable, the linked item can be joined onto its parent by use of the EXPAND() function

OK = EXPAND(VAR{'link'})

where link is the element path of the string that defines the link.

 

If the link string uses the reduced syntax that has no filename, an extended form of the EXPAND() function must be used

OK = EXPAND(VAR{'link'}, FILEVAR)

where FILEVAR is a file variable that references the file containing the linked item. This optional function argument is ignored when using the full syntax of the link string.

 

The EXPAND() function returns True if successful.
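The following sketch shows how a link might be expanded after reading a record. The file name CLIENTS, the record id, and the element names ACCOUNT and BALANCE are all hypothetical. It assumes that a record in a collection file can be read with a normal READ statement and that, once expanded, the linked collection can be referenced through the element that held the link.

```qmbasic
* Hypothetical collection file and record id
OPEN 'CLIENTS' TO CLIENTS.F ELSE STOP 'Cannot open CLIENTS'
CLIENT.ID = '1001'

READ CLIENT.DATA FROM CLIENTS.F, CLIENT.ID ELSE STOP 'Record not found'

* ACCOUNT is assumed to hold a link string; the file variable is only
* used if the link omits the filename
IF EXPAND(CLIENT.DATA{'ACCOUNT'}, CLIENTS.F) THEN
   DISPLAY CLIENT.DATA{'ACCOUNT/BALANCE'}
END ELSE
   DISPLAY 'Link could not be expanded'
END
```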

 

 

 

The Collection Editor, CED

 

The collection editor allows the content of a data collection to be viewed or edited. It may be used as a QM command to edit a record in a collection file or as a QMBasic subroutine named !CED() to edit a collection passed in as an argument. See CED for details of this editor.

 

 

 

Orphaned Self-Referential Collections

 

Because an element of a data collection can be any QM data type, including a collection, it is possible to create a self-referential data collection. In its simplest form, this could be achieved with statements of the form:

A = COLLECTION()

A{"CONTENT"} = A

 

Although this is unlikely to have any practical use, it can be done by accident. If the program went on to overwrite variable A

A = 99

the only program-visible link to the collection is lost and the entire collection becomes orphaned. This can never be repaired as there is no longer any way to access the collection.

 

 

See also:

CED, !CED(), COLLECTION(), data collection files, DEL, E-type dictionary records, ENUMERATE(), EXPAND(), INS, JBUILD(), JPARSE(), MAT()