What is a Multivalue Database?

What is a Multivalue Database?

Top  Previous  Next

 

There are many different databases available but they all fall into a small number of basic types. One of these is the relational database such as Oracle or Access. A relational database holds data in the form of tables in just the same way that we could store information as tables written on paper.

 

Consider an order processing system. We need to hold information about the orders that each customer has placed. Keeping things very simple, at a minimum we might need a table such as that shown below.

 

Order no

Date

Customer

Product

Quantity

1001

12 Jan 05

1728

107

4

1002

12 Jan 05

3194

318

2

1003

13 Jan 05

7532

220

1

1004

13 Jan 05

1263

318

2

 

 

In this simple table, each row represents an order and each column holds data associated with that order.

 

Relational databases are built following a set of rules known as the Laws of Normalisation [E. Codd : "A Relational Model of Data for Large Shared Data Banks", Communications of the ACM, June 1970]. The process of transforming data to fit the rules of a relational database is called normalisation and the steps in this process are referred to as first normal form, second normal form, and so on.

 

The First Law of Normalisation states that we may not have repeating data. In practical terms this means that we cannot add extra columns to the right of the table to allow a customer to order more than one item at the same time.

 

Order no

Date

Customer

Product

Quantity

Product

Quantity

1001

12 Jan 05

1728

107

4

 

 

1002

12 Jan 05

3194

318

2

452

3

1003

13 Jan 05

7532

220

1

 

 

1004

13 Jan 05

1263

318

2

 

 

 

Clearly this restriction is not acceptable in the real world.

 

There are many reasons why the Laws of Normalisation do not allow this, mostly based on the way in which the data might be stored by the computer system. If we are to observe the First Law of Normalisation, we must reconstruct our data in some way that removes the additional columns. One way would be to split an order that has multiple item across several rows of our table.

 

Order no

Date

Customer

Product

Quantity

Lines

1001-1

12 Jan 05

1728

107

4

1

1002-1

12 Jan 05

3194

318

2

2

1002-2

12 Jan 05

3194

452

3

2

1003-1

13 Jan 05

7532

220

1

1

1004-1

13 Jan 05

1263

318

2

1

 

Although we can now store as many items in an order as we wish, things have become more complicated. Firstly, the details of a single order are now split across multiple rows of our table. Secondly, we have been forced to add an extra column so that we can know how many lines there are in the order. Also, we have duplicated some information, a step which actually breaks another of the Laws of Normalisation. To avoid this last complication, a typical implementation of this sort of data in a fully normalised system (e.g. Oracle or Access) would break the order into two separate tables, one containing the basic information about the order and the other containing the details of the items ordered.

 

Order no

Date

Customer

Lines

1001

12 Jan 05

1728

1

1002

12 Jan 05

3194

2

1003

13 Jan 05

7532

1

1004

13 Jan 05

1263

1

Detail Ref

Product

Quantity

1001-1

107

4

1002-1

318

2

1002-2

452

3

1003-1

220

1

1004-1

318

2

 

Things are becoming complex and this is supposed to be a trivial application!

 

Multivalue database products avoid this complication by removing the need to adhere to the First Law of Normalisation. We allow a single cell of our table to hold more than one value (hence "multivalue").

 

Order no

Date

Customer

Product

Quantity

1001

12 Jan 05

1728

107

4

1002

12 Jan 05

3194

318

452

2

3

1003

13 Jan 05

7532

220

1

1004

13 Jan 05

1263

318

2

 

If you have spent many years working with fully normalised databases, you are probably shaking your head and saying that we cannot do this. Yes, we can do it; it's just a different way to hold our data.

 

Think about the advantages: The entire order is all held as a single record; there is no redundant duplication of data; we do not need an item counter.

 

The end result of this is that our multivalue view of the world is typically much faster than its fully normalised counterpart though there will always be situations where this model is not ideal. In such cases, you can freely revert to using the fully normalised approach. Notice that fully normalised data can be stored in a multivalue database. The opposite tends not to be true.

 

The time has come to introduce some terminology. A typical application will have many tables, perhaps hundreds or even thousands though the multivalue model usually results in far fewer tables than in other data models. Each table is stored as a file. The rows of our table are known as records and the columns as fields (some users refer to these as attributes). The data stored in a field may be made up of multiple values.

 

Note how in our multivalued implementation of the above example, the values in the product and quantity columns are related together. For any particular order, the first product number belongs with the first quantity, the second product number belongs with the second quantity and so on. A typical realistic table may have several separate sets of fields that are linked in this way. The relationship between the values in different fields (e.g. product and quantity above) is referred to as an association.

 

By adopting this data model instead of using additional columns, the data model imposes no limit to the number of items that may be included in an order.

 

This extended form of the relational database model is at the heart of the QM database. You may also see it referenced as post-relational, nested table or NF2 (non-first normal form). They all mean the same thing.

 

In a multivalue database, the tables can gain a fourth dimension (subvalues). Continuing with the above example, perhaps we need to record the serial number of each item that we sell. Thus each value line in the table depicted above would have subvalues containing the serial numbers for each item supplied. Order 1002 might become

 

Order no

Date

Customer

Product

Quantity

Serial

1002

12 Jan 05

3194

318

 

452

2

 

3

21222

21223

41272

41723

41728

 

 

The model described above leads to a very flexible design in which a database record may have any number of fields (table columns). The entire record and the constituent fields are of variable length, there being no restriction applied by QM. A record may exist in the database with no data or with many megabytes of data.

 

Every record in a data file has to have a unique record id or primary key that identifies that record as distinct from all other records in the same table and can be used to retrieve the record. In the above example, the order number would serve this purpose. Although it may be useful to show the record id as a column in the tabular representation of the data, the record id is not part of the actual database record but is instead a handle by which it is accessed. Thus the record above has id 1002 and contains five fields representing the date, customer number, product numbers, quantities and serial numbers.

 

Internally, a record is stored as a simple character representation of the data. Because the fields are of variable length, it is necessary to mark where one field ends and the next begins. This is done by placing a special marker character called a field mark between the fields. A field may be divided into values by use of value mark characters and values may be further divided by use of subvalue mark characters.

 

Record 1002 depicted above would be stored as

13527FM3194FM318VM452FM2VM3FM21222SM21223VM41272SM41723SM41728

where the FM, VM and SM items represent the mark characters.The way to read a record structure such as that shown above is to dismantle it layer by layer. First find the field marks to separate out each field, then look for the values within the fields, and finally the subvalues within the values

 

The data model defines two additional mark characters. The text mark is typically used to mark points in text data where newlines should be inserted. This mark character is often inserted by programs manipulating data in memory rather than being stored in the database. The item mark is defined mainly for compatibility with other database systems. Its only reserved use within QM is to separate items in the DATA queue.

 

When the multivalue data model was originally invented, the upper half of the ASCII character set was not defined and the last five of the unused character values were adopted for the internal representation of the mark characters:

Item markchar(255)
Field markchar(254)
Value markchar(253)
Subvalue markchar(252)
Text markchar(251)

 

The memory representation of a record containing mark characters for use in QMBasic programs is known as a dynamic array and there are many specialised program operations for working on this data.

 

Fields, values within a field and subvalues within a value are numbered from one upwards. By convention the record key is sometimes referred to as field zero though it is not strictly part of the dynamic array and references to field zero are only recognised by QM in certain contexts.

 

 

The History of Multivalue Databases

 

The original multivalue database is usually attributed to Dick Pick (hence the frequently used term "Pick databases") back in 1968 though their origins can be tracked back further. The current D3 database from Rocket Software is a direct descendant of the original Pick product but there have been many other players along the way, some large, some small. Some of these are significant to the way in which QM works.

 

The Reality database, previously implemented on McDonnell Douglas systems but now owned by Northgate Information Solutions, closely follows the Pick style of operation. The long defunct Prime Information database from Prime Computer retained the same data model and general principles but made some fairly significant changes to the command and programming languages.

 

In the mid-1980's the various companies with multivalue products hit a problem. The world was standardising on the Unix operating system but these products did not run on Unix. As a result of this, McDonnell Douglas developed an "open systems" version of Reality (Reality X) and Prime Computer developed the PI/open database. At the same time, two start up companies appeared each with their own Unix based multivalue implementation, VMark (UniVerse) and Unidata (Unidata). These companies set out to capture users from the existing products as well as taking on new users. The history is long and complex but to bring it up to date in one step, D3, mvBase, UniVerse and Unidata are now all owned by Rocket Software.

 

The UniVerse and Unidata products (usually referred to collectively as U2) follow the Information style of implementation by default but have features that allow them to look more like the Pick style if required.

 

QM was originally developed in 1993 for use as an embedded database and was released as a product in its own right in 2001. Like the U2 products, it is an Information style database but has options to make it more like Pick for those who need it.