



DODS FAQs - Kinds of Caching within DODS
The purpose of this document is to explain how DODS handles caching.

General Notes


Cache Transformation 
Since DODS 5.1 final, the DO cache has been replaced by a DataStruct cache. Instead of whole DOs, only their original DataStructs are added to the new DataStruct cache.

Previously, a DO had only one data object (a DataStruct), and all changes were made to that object. A DataStruct object contains the column values of one table row. Now a DO holds two DataStruct references:

  • originalData

  • data

The originalData holds the original data as read from the database. It is never modified until commit, and it is this DataStruct object that is added to the DataStruct cache, if the cache exists.

The second, data, is created (by copying the first one) only when data is modified. If the second DataStruct exists, the DO's isDirty attribute is true. Even if, after some modifications, the new DataStruct holds exactly the same values as the original one, the DO is still dirty: there is no way back from isDirty=true to isDirty=false except by committing the transaction. When the transaction is committed, the new DataStruct takes the place of the original one and the data reference becomes null again, so isDirty becomes false again.

A newly created DO (created in memory, not read from the database) has only the DataStruct object data. Its originalData is null until commit().

The oid and version attributes have been moved from the DO to the DataStruct object.
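
The two-reference scheme can be illustrated with a minimal sketch. RowStruct and RowDO are hypothetical stand-ins invented for this example, not the classes DODS generates; the sketch only mirrors the rules described above (copy on first write, dirty until commit, new DOs start with data only).

```java
import java.util.Arrays;

/** Hypothetical stand-in for a DataStruct: one row's column values plus oid and version. */
final class RowStruct {
    final String oid;
    final int version;
    final Object[] columns;

    RowStruct(String oid, int version, Object[] columns) {
        this.oid = oid;
        this.version = version;
        this.columns = columns;
    }

    RowStruct copy() {
        return new RowStruct(oid, version, Arrays.copyOf(columns, columns.length));
    }
}

/** Hypothetical DO holding the two DataStruct references. */
final class RowDO {
    private RowStruct originalData; // as read from the database; null for a new DO
    private RowStruct data;         // working copy; exists only after a modification

    /** A DO read from the database starts with originalData only. */
    static RowDO loaded(RowStruct fromDatabase) {
        RowDO d = new RowDO();
        d.originalData = fromDatabase;
        return d;
    }

    /** A DO created in memory starts with data only. */
    static RowDO created(RowStruct fresh) {
        RowDO d = new RowDO();
        d.data = fresh;
        return d;
    }

    /** Dirty means the second DataStruct exists, whatever values it holds. */
    boolean isDirty() {
        return data != null;
    }

    void set(int column, Object value) {
        if (data == null) {
            data = originalData.copy(); // copy on first write; originalData stays untouched
        }
        data.columns[column] = value;
    }

    /** On commit the working copy replaces the original and the DO is clean again. */
    void commit() {
        if (data != null) {
            originalData = data;
            data = null;
        }
    }

    RowStruct originalData() {
        return originalData;
    }
}
```

In this sketch, setting a column to "b" and then back to "a" still leaves isDirty() true until commit() is called.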


Caching Modes and Levels 
DODS allows every table to have its own cache.

The possible cache types are:

1.None
No caching is available.

2.LRU
The size of the cache is limited by the maximum number of objects that can be stored in it. When the cache is full, objects in it are replaced by new objects according to the LRU (least recently used) algorithm: the object that was accessed the longest time ago (the one at the end of the LRU list) is removed, and the new object is put at the front of the list. If the maximum number of objects is set to 0, caching is disabled (the same as the None type); if it is set to a negative number, the cache is unbounded (it has no size limit).

3.Full (special case of LRU caching)
This is an unbounded cache. The entire table is queried and cached when the application starts. This is appropriate for tables of "static" data that are accessed frequently.
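
The size rules for the LRU type can be sketched as follows. SketchLruCache is a hypothetical class built on java.util.LinkedHashMap in access order; it is not the DODS cache implementation, and only the meaning of the limit (0 disables caching, a negative value means unbounded) is taken from the text above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache following the size rules described above. */
final class SketchLruCache<K, V> {
    private final int limit; // 0 = caching disabled, negative = unbounded
    private final LinkedHashMap<K, V> entries;

    SketchLruCache(final int limit) {
        this.limit = limit;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // evict the least recently used entry only when a positive limit is exceeded
                return limit > 0 && size() > limit;
            }
        };
    }

    void put(K key, V value) {
        if (limit == 0) {
            return; // behaves like the None type
        }
        entries.put(key, value);
    }

    /** Returns the cached value or null; a hit marks the entry as most recently used. */
    V get(K key) {
        return entries.get(key);
    }

    int size() {
        return entries.size();
    }
}
```

With a limit of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "b" is then the least recently used entry.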

DODS has two levels of caching:

1.Data Caching level
There is only one LRU cache: the cache of DataStruct objects. The keys of this cache are cache handles, which are Strings of the following form:

"<DataStruct_database_name>.<String_presentation_of_DataStruct_oid>"

and the cache values are, as mentioned before, DataStruct objects.

2.Query caching level
Besides the DataStruct object cache, up to three query caches can be used (simple, complex and multi-join). The multi-join cache has been included since DODS 6.0. All query caches are also LRU caches. The keys of these caches are Strings of the following form:

"<query_database_name>.<String_presentation_of_query>",

and the cache values are Query objects, which are instances of the org.enhydra.dods.cache.QueryCacheItem class.

The QueryCacheItem object stores one query and its necessary data:

  • Database of the query
  • List of oids of the DataStruct objects that are results of the query. This list can contain all query results, or just some of them.
  • Number of cached query results
  • Information whether all results are in result list or not
  • Information whether the query results are modified (the results are modified if inserts, updates or deletes have been performed)
  • Time needed for query execution
  • Array of conditions declared in WHERE part of the query (array of org.enhydra.dods.cache.Condition objects). This is needed only for simple queries.

Simple queries are the queries supported by the DataStruct cache: a simple query is one for which the cache mechanisms can determine whether a given DataStruct object is a query result. All other queries are complex queries.

The default maximum cache size for the DataStruct, simple query and complex query caches is 0 (no caching).
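
As a rough illustration of the two key formats and of the data kept per cached query, consider the sketch below. CacheKeySketch and QueryItemSketch are hypothetical names; the real class is org.enhydra.dods.cache.QueryCacheItem, whose actual fields and methods are not reproduced here. The millisecond unit for the execution time is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers that build cache keys in the two formats given above. */
final class CacheKeySketch {
    /** "<DataStruct_database_name>.<String_presentation_of_DataStruct_oid>" */
    static String dataStructHandle(String databaseName, Object oid) {
        return databaseName + "." + String.valueOf(oid);
    }

    /** "<query_database_name>.<String_presentation_of_query>" */
    static String queryKey(String databaseName, String queryText) {
        return databaseName + "." + queryText;
    }
}

/** Hypothetical holder for the per-query data listed above (conditions omitted). */
final class QueryItemSketch {
    final String database;                             // database of the query
    final List<String> resultOids = new ArrayList<>(); // all results, or only some of them
    boolean completeResult;                            // are all results in resultOids?
    boolean modified;                                  // set after inserts, updates or deletes
    long executionTimeMillis;                          // time needed to run the query (unit assumed)

    QueryItemSketch(String database) {
        this.database = database;
    }

    /** Number of cached query results. */
    int cachedResultCount() {
        return resultOids.size();
    }
}
```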


Select, insert, update and delete clauses in DataStruct caching level

Select clause

For a query by oid (a query whose "where" clause requests the DO with a specified oid), the DataStruct cache is checked first for a DataStruct object with the desired oid. If the DataStruct object is not found in the cache, the database is queried, and the retrieved DataStruct object is added to the DataStruct cache.

With full caching, a query by oid also checks the DataStruct cache first for a DataStruct object with the desired oid. If the DataStruct object is not found in the cache, the database is not queried (all rows of the table are in the cache, so the query has no result).

For all other queries, the database is queried immediately, and the query results are added to the DataStruct cache.
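
The query-by-oid path can be sketched like this. OidSelectSketch, its loader function and the databaseHits counter are assumptions made for the example; only the decision order (cache first, then the database unless full caching is on) comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical query-by-oid lookup at the DataStruct caching level. */
final class OidSelectSketch {
    private final Map<String, Object> dataStructCache = new HashMap<>();
    private final boolean fullCaching;
    private final Function<String, Object> loadFromDatabase; // oid -> row, or null if absent
    int databaseHits = 0; // counts database queries, for illustration only

    OidSelectSketch(boolean fullCaching, Function<String, Object> loadFromDatabase) {
        this.fullCaching = fullCaching;
        this.loadFromDatabase = loadFromDatabase;
    }

    /** Stands in for the initial load that full caching performs at startup. */
    void preload(String databaseName, String oid, Object row) {
        dataStructCache.put(databaseName + "." + oid, row);
    }

    Object findByOid(String databaseName, String oid) {
        String handle = databaseName + "." + oid;
        Object cached = dataStructCache.get(handle);
        if (cached != null) {
            return cached; // cache hit, no database access
        }
        if (fullCaching) {
            return null; // the whole table is cached, so a miss means there is no such row
        }
        databaseHits++;
        Object row = loadFromDatabase.apply(oid);
        if (row != null) {
            dataStructCache.put(handle, row); // remember the retrieved row
        }
        return row;
    }
}
```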

Insert clause

The data object is inserted into the database. After a successful commit, when its data is moved to the original DataStruct for the first time, that DataStruct is added to the DataStruct cache.

Update clause

The data object is updated in the database. If the commit was successful, when its data is moved to the original DataStruct for the first time, that DataStruct is added to the DataStruct cache (the old DataStruct object is removed from the DataStruct cache if it was there).

Delete clause

The data object is deleted from the database, and its original DataStruct object (originalData) is removed from the DataStruct cache, if it is there.
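
The point that the DataStruct cache changes only after a successful commit can be shown with a small sketch. TxCacheSketch is hypothetical, rows are reduced to Strings, and deferring the cache removal of deleted objects until commit is an assumption of this example; the text above does not state when the removal happens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical transaction that touches the DataStruct cache only on successful commit. */
final class TxCacheSketch {
    final Map<String, String> cache = new HashMap<>(); // cache handle -> committed row
    private final Map<String, String> pendingWrites = new LinkedHashMap<>();
    private final List<String> pendingDeletes = new ArrayList<>();

    /** Inserts and updates are only staged; the cache still holds the old state. */
    void insertOrUpdate(String handle, String row) {
        pendingWrites.put(handle, row);
    }

    /** Assumption: removal from the cache is also deferred until commit. */
    void delete(String handle) {
        pendingDeletes.add(handle);
    }

    void commit(boolean databaseCommitSucceeded) {
        if (databaseCommitSucceeded) {
            cache.putAll(pendingWrites);              // new original DataStructs replace old ones
            cache.keySet().removeAll(pendingDeletes); // deleted rows leave the cache
        }
        pendingWrites.clear();
        pendingDeletes.clear();
    }
}
```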


Select, insert, update and delete clauses in Query caching level

Select clause

For a query by oid (a query whose "where" clause requests the DO with a specified oid), the DataStruct cache is checked first for a DataStruct object with the desired oid. If the DataStruct object is not found in the cache, the database is queried, and the retrieved DataStruct object is added to the DataStruct cache. Queries by oid are not added to the query cache (they are trivial).

With full caching, a query by oid also checks the DataStruct cache first for a DataStruct object with the desired oid. If the DataStruct object is not found in the cache, the database is not queried (all rows of the table are in the cache, so the query has no result).

For non-oid queries with full caching, if the query is a simple query, its result can be retrieved from the DataStruct cache, so there is no need to retrieve results from the database. In any other case of full caching, the query is handled the same as any other query (as explained in the next paragraph).

For all other queries, it is checked whether the query is already in the query cache (simple, complex or multi-join). A Query object has an attribute called "orderRelevant", which is true if the query results must not be modified (no DO may be inserted into, updated in or deleted from the cached query results). The isOrderRelevant() method reports whether the results of the select can be modified or not.

If the query is in the cache and isOrderRelevant() returns false, the result oids are retrieved from the query cache. If the query is in the cache, isOrderRelevant() returns true, and the result oids are not modified, the result oids are also retrieved from the query cache. But if the query is in the cache, isOrderRelevant() returns true, and the result oids are modified, the result oids from the query cache are not used; the database is queried instead.
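
The three cases above reduce to one small decision. The method below is a hypothetical helper written for this explanation, not a DODS method; its three parameters correspond to "query is in the cache", the value of isOrderRelevant(), and "result oids are modified".

```java
/** Hypothetical helper that encodes the three cases described above. */
final class OrderRelevanceSketch {
    /** Returns true when the cached result oids may be used for this query. */
    static boolean canUseCachedOids(boolean inQueryCache, boolean orderRelevant, boolean modified) {
        if (!inQueryCache) {
            return false; // nothing cached for this query
        }
        if (!orderRelevant) {
            return true;  // modified results are acceptable when order does not matter
        }
        return !modified; // order matters: only unmodified cached results are usable
    }
}
```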

If the result is found in the query cache, each result oid is checked to see whether that object is in the DataStruct cache. Once the number of results that are not in the DataStruct cache has been counted, the time needed to run queries by oid against the database for all those missing oids is compared with the time needed to run the whole query.

If the time needed for the queries by oid is less than or equal to the query execution time, results are retrieved from the cache, and those that are not there are retrieved from the database (using queries by oid).

If that time is longer, or the query is not in the query cache, or the query supports joins with other tables, or the cached query results are modified and order is relevant for this query, the whole query is run against the database.
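
The time comparison can be sketched as follows. How DODS estimates the cost of a single query by oid is not stated here, so the avgOidQueryMillis parameter is an assumption of this example, as are the class and enum names.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical cost comparison between per-oid queries and re-running the whole query. */
final class FetchPlanSketch {
    enum Plan { CACHE_PLUS_OID_QUERIES, FULL_QUERY }

    static Plan choose(List<String> resultOids, Set<String> cachedOids,
                       long avgOidQueryMillis, long fullQueryMillis) {
        // count the results that are missing from the DataStruct cache
        long missing = resultOids.stream().filter(oid -> !cachedOids.contains(oid)).count();
        // assumed estimate: one query by oid per missing result
        long oidQueriesMillis = missing * avgOidQueryMillis;
        return oidQueriesMillis <= fullQueryMillis
                ? Plan.CACHE_PLUS_OID_QUERIES
                : Plan.FULL_QUERY;
    }
}
```

For example, with four result oids of which three are cached, an assumed 5 ms per query by oid and a recorded 20 ms for the whole query, the estimate is 5 ms and the cached results are used.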

If the results are retrieved from the database, the query and its necessary data are put in the appropriate query cache (simple, complex or multi-join).

If the query was already in the query cache but was executed again (because there were not enough result oids in the result list, or because the cached query was modified and isOrderRelevant() is true for the new query), the old query is replaced by the new one (which is not marked as modified).

Insert clause

The data object is inserted into the database. After a successful commit, when its data is moved to the original DataStruct for the first time, that DataStruct is added to the DataStruct cache.

All complex and multi-join queries of the table that belong to the database of the inserted DO are removed from the query caches.

For every simple query of the table (with the inserted DO's database) in the query cache, it is checked whether the inserted DO is a result of the query.

If the new DO is a query result, the query is marked as "modified" in the query cache.

If the query's cached results are complete (all are in the query cache), the oid of the inserted DO is added to the cached result list. If the cached results are not complete, the oid is not added to the list.
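
The insert rules can be sketched as below. All names are hypothetical, a row is reduced to a Map of column values, the "where" conditions are reduced to a Predicate, and the three lists are assumed to hold only queries for the inserted DO's table and database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical query cache maintenance after a committed insert. */
final class InsertMaintenanceSketch {
    static final class SimpleQuery {
        final Predicate<Map<String, Object>> where; // stands in for the cached conditions
        final List<String> resultOids = new ArrayList<>();
        final boolean complete;                      // are all results in resultOids?
        boolean modified;

        SimpleQuery(Predicate<Map<String, Object>> where, boolean complete) {
            this.where = where;
            this.complete = complete;
        }
    }

    static void onInsert(List<SimpleQuery> simpleQueries, List<?> complexQueries,
                         List<?> multiJoinQueries, String oid, Map<String, Object> row) {
        // the cache cannot tell whether the new row matches these, so they are dropped
        complexQueries.clear();
        multiJoinQueries.clear();
        for (SimpleQuery query : simpleQueries) {
            if (!query.where.test(row)) {
                continue; // the new DO is not a result of this query
            }
            query.modified = true;
            if (query.complete) {
                query.resultOids.add(oid); // a complete list stays complete
            }
        }
    }
}
```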

Update clause

The data object is updated in the database. If the commit was successful, when its data is moved to the original DataStruct for the first time, that DataStruct is added to the DataStruct cache (the old DataStruct object is removed from the DataStruct cache if it was there).

All complex and multi-join queries of the table that belong to the database of the updated DO are removed from the query caches.

For every simple query of the table (with the updated DO's database) in the query cache, it is checked whether the updated DO is a result of the query.

If yes, this query is marked as "modified" in the query cache, and the DO is included in query results only if it wasn't in the cache and the cached result list is complete.

If not, and the DO's oid exists in the query results, it is removed from there; because the results have changed, the query is marked as "modified" in the query cache.
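
The two branches for simple queries can be sketched as below. The names are hypothetical, and "it wasn't in the cache" is read here as "the oid was not already in the cached result list", which is an interpretation rather than something the text confirms. Removal of complex and multi-join queries is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical simple-query cache maintenance after a committed update. */
final class UpdateMaintenanceSketch {
    static final class CachedQuery {
        final Predicate<Map<String, Object>> where; // stands in for the cached conditions
        final List<String> resultOids = new ArrayList<>();
        final boolean complete;                      // are all results in resultOids?
        boolean modified;

        CachedQuery(Predicate<Map<String, Object>> where, boolean complete) {
            this.where = where;
            this.complete = complete;
        }
    }

    static void onUpdate(List<CachedQuery> simpleQueries, String oid, Map<String, Object> updatedRow) {
        for (CachedQuery query : simpleQueries) {
            if (query.where.test(updatedRow)) {
                // the updated DO is a result of this query
                query.modified = true;
                if (query.complete && !query.resultOids.contains(oid)) {
                    query.resultOids.add(oid);
                }
            } else if (query.resultOids.remove(oid)) {
                // the DO used to be a result and no longer is
                query.modified = true;
            }
        }
    }
}
```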

Delete clause

The DO is deleted from the database, and its original DataStruct object (originalData) is removed from the DataStruct cache, if it is there.

The query caches (simple, complex and multi-join) are then scanned; wherever this DO is found, it is removed from the query results and that query is marked as "modified".


SQL JOIN operations
The Query classes generated by DODS perform queries on a single table. There is no way to perform a JOIN operation using Query objects. Likewise, there is no way to use DO class caches to perform JOINs.

JOINs can be performed using the QueryBuilder class.
Alternatively, a VIEW can be created to perform the JOIN, and a DO/Query class pair created to represent the VIEW.

For all the latest information on DODS, please refer to http://dods.enhydra.org/