The purpose of this document is to explain
how DODS handles caching.
General Notes
- Pro: Caches are faster
In most cases,
queries that find their answers in the cache will run
faster than queries that must hit the database.
- Pro: Caches can save memory
An advantage
of using caches is that, for a given row in a table,
application memory will contain only a single DO
representing that row. This is because Query objects
always return the DO that is in the cache, even if the
Query object had to hit the database to put the DO in
the cache.
- Con: The "dirty write" problem
- The "dirty write" problem, cached
scenario
If thread 1 and thread 2 have both
run a query that returned the same DO from the
cache, they can both call the same setXxx() method
on that DO and overwrite each other's changes. In
applications where this scenario is possible, care
must be taken to control updates to DOs that are
retrieved from a cache.
- The "dirty write" problem, uncached
scenario
A variation of the "dirty write"
problem also exists when the DO is not cached. For a
DO class that is not cached, every query constructs
new DO objects to return, even if identical DO
objects have been returned before. It is therefore
possible to have two DO objects that represent the
same row. Calling setXxx() and save() on one of these
DOs leaves the other DO holding old data.
Fortunately, saving the first DO increments the
version column, which prevents save() from succeeding
on the other DO, whose version is now stale. In
applications where this scenario is possible, care
must be taken to prevent multiple threads from
updating the same row through multiple DOs.
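The version check described above is a form of optimistic locking. The following is a minimal in-memory sketch, assuming that an update succeeds only when the stored version still equals the version the DO was read with (as in "UPDATE ... WHERE oid = ? AND version = ?"). VersionedTable and RowDO are hypothetical names used for illustration, not DODS classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for one table; not the DODS API.
class VersionedTable {
    private final Map<Long, String> values = new HashMap<>();
    private final Map<Long, Integer> versions = new HashMap<>();

    void insert(long oid, String value) {
        values.put(oid, value);
        versions.put(oid, 0);
    }

    // Mimics "UPDATE ... SET value = ?, version = version + 1 WHERE oid = ? AND version = ?".
    // Returns false when the caller's version is stale.
    boolean update(long oid, String newValue, int expectedVersion) {
        Integer current = versions.get(oid);
        if (current == null || current != expectedVersion) {
            return false;
        }
        values.put(oid, newValue);
        versions.put(oid, current + 1);
        return true;
    }

    int version(long oid) { return versions.get(oid); }

    String value(long oid) { return values.get(oid); }
}

// Hypothetical uncached DO: every query builds a fresh copy of the row.
class RowDO {
    final long oid;
    String value;
    int version;

    RowDO(VersionedTable table, long oid) {
        this.oid = oid;
        this.value = table.value(oid);
        this.version = table.version(oid);
    }

    void save(VersionedTable table) {
        if (!table.update(oid, value, version)) {
            throw new IllegalStateException("stale DO: row " + oid + " was changed by another DO");
        }
        version++; // keep the in-memory version in step with the row
    }
}
```

With two RowDO objects read from the same row, the first save() succeeds and bumps the version; the second save() then fails instead of silently overwriting the first change.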
Cache Transformation
Since DODS 5.1 final, the DO cache has been replaced by a
DataStruct cache. Instead of whole DOs, only their
original DataStructs are added to the new DataStruct cache.
Previously, a DO held only one data object (a DataStruct),
and all modifications were made on this object. A
DataStruct object contains the column values of one table
row. Now, a DO holds two DataStruct references:
The first, originalData, holds the original data (as read
from the database). It is never modified until commit,
and this DataStruct object is added to the DataStruct
cache, if that cache exists.
The second, data, is created (by copying the first one)
only when the data is modified. If the second DataStruct
exists, the DO's isDirty attribute is set to true. Even
if, after some modifications, the new DataStruct holds
exactly the same values as the original one, the DO is
still dirty. So there is no way back from isDirty=true
to isDirty=false except by committing the transaction.
When the transaction is committed, the new DataStruct is
moved into the place of the original one. The new
DataStruct reference is null again, so isDirty becomes
false again.
A newly created DO (created in memory, not read from the
database) has only the DataStruct object data. The values
of the DataStruct object originalData are null before
commit().
The oid and version attributes have been moved from the
DO to the DataStruct object.
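The two-reference scheme can be sketched as copy-on-first-write. This is a minimal illustration, assuming a DataStruct can be reduced to a column map plus oid and version, and treating a new DO's originalData as absent (null) until commit. SimpleDataStruct and SketchDO are hypothetical names; version handling at commit and the actual cache insertion are omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplified DataStruct: the column values of one row plus oid and version.
class SimpleDataStruct {
    Long oid;
    int version;
    final Map<String, Object> columns = new HashMap<>();

    SimpleDataStruct copy() {
        SimpleDataStruct c = new SimpleDataStruct();
        c.oid = oid;
        c.version = version;
        c.columns.putAll(columns);
        return c;
    }
}

// Hypothetical DO holding the two DataStruct references described above.
class SketchDO {
    private SimpleDataStruct originalData; // as read from the database; never modified before commit
    private SimpleDataStruct data;         // created only on the first modification

    static SketchDO fromDatabase(SimpleDataStruct row) {
        SketchDO d = new SketchDO();
        d.originalData = row;
        return d;
    }

    // A new DO only has 'data'; there is no original row until commit.
    static SketchDO createNew() {
        SketchDO d = new SketchDO();
        d.data = new SimpleDataStruct();
        return d;
    }

    // Dirty as soon as the second DataStruct exists, even if its values equal the original.
    boolean isDirty() {
        return data != null;
    }

    void set(String column, Object value) {
        if (data == null) {
            data = originalData.copy(); // copy-on-first-write keeps originalData untouched
        }
        data.columns.put(column, value);
    }

    Object get(String column) {
        SimpleDataStruct current = (data != null) ? data : originalData;
        return current.columns.get(column);
    }

    // After a successful commit the modified copy replaces the original.
    void commit() {
        if (data != null) {
            originalData = data;
            data = null;
        }
    }

    SimpleDataStruct originalData() {
        return originalData;
    }
}
```

Setting a column back to its original value leaves the sketch dirty, matching the rule that only a commit clears isDirty.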
Caching Modes and Levels
DODS allows every table to have its own
cache.
The possible cache types are:
1. None
No caching is available.
2. LRU
The size of the cache is limited by the maximal
number of objects that can be stored in it. When the
cache is full, objects are replaced by new objects
according to the LRU (least recently used) algorithm:
the object that was used least recently (the object
that was accessed longest ago, which is at the end of
the LRU list) is removed from the list, and the new
object is put at the front of the LRU list. If the
maximal number of objects is set to 0, caching is
disabled (equivalent to the None type); if it is set to
a negative number, the cache is unbounded (it has no
size limit).
3. Full (special case of LRU caching)
This is an unbounded cache. The entire table is
queried and cached when the application starts. This is
appropriate for tables of "static" data that are
accessed frequently.
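The LRU rules above map naturally onto Java's LinkedHashMap in access order, which keeps entries ordered from least to most recently accessed and lets a subclass evict the eldest entry. The sketch below is an illustration under that assumption, not the DODS cache implementation; LruCache is a hypothetical name. A size of 0 disables caching and a negative size makes the cache unbounded, as described for the LRU type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache following the size rules above.
class LruCache<K, V> {
    private final int maxSize;
    private final LinkedHashMap<K, V> map;

    LruCache(int maxSize) {
        this.maxSize = maxSize;
        // accessOrder = true: iteration order runs from least to most recently accessed.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Positive limit: evict the least recently used entry once over the limit.
                // Negative limit: unbounded, never evict.
                return LruCache.this.maxSize > 0 && size() > LruCache.this.maxSize;
            }
        };
    }

    void put(K key, V value) {
        if (maxSize == 0) {
            return; // size 0 means caching is disabled (type None)
        }
        map.put(key, value);
    }

    V get(K key) {
        return map.get(key); // a hit moves the entry to the most recently used end
    }

    int size() {
        return map.size();
    }
}
```

A Full cache would correspond to an unbounded instance (negative size) that is filled with every row of the table at startup.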
DODS has two levels of caching:
1. Data caching level
There is only one LRU cache: the cache of DataStruct
objects. The keys of this cache are cache handles,
Strings of the following form:
"<DataStruct_database_name>.<String_presentation_of_DataStruct_oid>"
and the cache values are, as mentioned before, DataStruct
objects.
2. Query caching level
Besides the DataStruct object cache, three query caches
can be used (simple, complex and multi-join); the
multi-join cache has been included since DODS 6.0. All
query caches are also LRU caches. The keys of these
caches are Strings of the following form:
"<query_database_name>.<String_presentation_of_query>",
and the cache values are query objects of the
org.enhydra.dods.cache.QueryCacheItem class.
The QueryCacheItem object stores one query and its
necessary data:
- Database of the query
- List of oids of the DataStruct objects that are results
of the query. This list can contain all query results,
or just some of them.
- Number of cached query results
- Information whether all results are in the result list
or not
- Information whether the query results are modified
(the results are modified if inserts, updates or deletes
have been performed)
- Time needed for query execution
- Array of conditions declared in the WHERE part of the
query (array of org.enhydra.dods.cache.Condition
objects). This is needed only for simple queries.
Queries that are supported by the DataStruct cache are
simple queries. A simple query is a query for which the
cache mechanism can determine whether a given DataStruct
object is a query result or not. All other queries are
complex queries.
The default maximal cache size for the DataStruct cache
and for the simple and complex query caches is 0 (no
caching).
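To make the key formats and the notion of a simple query concrete, here is a sketch assuming that a simple query's WHERE part is a conjunction of "column = value" conditions, so the cache can test a row against it without the database. EqCondition and CachedQuery are hypothetical, simplified stand-ins for org.enhydra.dods.cache.Condition and QueryCacheItem; the real classes' fields and methods may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for one WHERE condition: "column = value".
class EqCondition {
    final String column;
    final Object value;

    EqCondition(String column, Object value) {
        this.column = column;
        this.value = value;
    }

    boolean matches(Map<String, Object> row) {
        return Objects.equals(row.get(column), value);
    }
}

// Hypothetical, simplified counterpart of the data kept by a QueryCacheItem.
class CachedQuery {
    final String database;
    final List<Long> resultOids = new ArrayList<>(); // all results or only some of them
    boolean complete;  // true if resultOids holds every result
    boolean modified;  // true after inserts, updates or deletes touched the results
    final long executionTimeMillis;
    final List<EqCondition> conditions; // used only for simple queries

    CachedQuery(String database, long executionTimeMillis, List<EqCondition> conditions) {
        this.database = database;
        this.executionTimeMillis = executionTimeMillis;
        this.conditions = conditions;
    }

    int cachedResultCount() {
        return resultOids.size();
    }

    // For a simple query, membership is decided from the row values alone.
    boolean isResult(Map<String, Object> row) {
        for (EqCondition c : conditions) {
            if (!c.matches(row)) {
                return false;
            }
        }
        return true;
    }

    // Key formats from the text: "<database>.<oid>" and "<database>.<query>".
    static String dataStructHandle(String database, Object oid) {
        return database + "." + oid;
    }

    static String queryHandle(String database, String query) {
        return database + "." + query;
    }
}
```

A query whose conditions cannot be evaluated this way (for example, one involving aggregates or other tables) would fall into the complex category.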
Select, insert, update and delete clauses at the
DataStruct caching level
Select clause
For a query by oid (a query whose "where" clause
requests the DO with a specified oid), the DataStruct
cache is checked first for a DataStruct object with the
desired oid. If the DataStruct object is not found in the
cache, the database is hit, and the retrieved DataStruct
object is added to the DataStruct cache.
With full caching, a query by oid also checks the
DataStruct cache first for a DataStruct object with the
desired oid. If the DataStruct object is not found in the
cache, the database is not hit: all rows of the table are
in the cache, so the query has no result.
For all other queries, the database is hit
immediately, and the query results are added to the
DataStruct cache.
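The select rules for a query by oid can be sketched as a cache-first lookup. This sketch assumes rows can be represented as plain Strings and models "hitting the database" as a caller-supplied LongFunction; OidLookup, preload and databaseHits are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical DataStruct-level lookup by oid.
class OidLookup {
    private final Map<String, String> cache = new HashMap<>();
    private final String database;
    private final boolean fullCaching;
    private final LongFunction<String> loader; // stands in for a by-oid database query
    int databaseHits = 0;

    OidLookup(String database, boolean fullCaching, LongFunction<String> loader) {
        this.database = database;
        this.fullCaching = fullCaching;
        this.loader = loader;
    }

    // Fills the cache, e.g. with the whole table when full caching starts up.
    void preload(long oid, String row) {
        cache.put(database + "." + oid, row);
    }

    String findByOid(long oid) {
        String handle = database + "." + oid;
        String cached = cache.get(handle);
        if (cached != null) {
            return cached; // cache hit: no database access
        }
        if (fullCaching) {
            return null; // whole table is cached, so a miss means the row does not exist
        }
        databaseHits++;
        String row = loader.apply(oid);
        if (row != null) {
            cache.put(handle, row); // the retrieved DataStruct is added to the cache
        }
        return row;
    }
}
```

The second lookup of the same oid is served from the cache, and under full caching a miss never reaches the database.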
Insert clause
The data object is inserted into the database, and after
a successful commit, when the data is moved to the
original DataStruct for the first time, it is added to
the DataStruct cache.
Update clause
The data object is updated in the database, and if the
commit was successful, when the data is moved to the
original DataStruct for the first time, it is added to
the DataStruct cache (the old DataStruct object is
removed from the DataStruct cache if it was there).
Delete clause
Deletes the data object from the database and removes
its original DataStruct object (originalData) from the
DataStruct cache (if it is there).
Select, insert, update and delete clauses at the
Query caching level
Select clause
For a query by oid (a query whose "where" clause
requests the DO with a specified oid), the DataStruct
cache is checked first for a DataStruct object with the
desired oid. If the DataStruct object is not found in the
cache, the database is hit, and the retrieved DataStruct
object is added to the DataStruct cache. Queries by oid
are not added to the query cache (they are trivial).
With full caching, a query by oid also checks the
DataStruct cache first for a DataStruct object with the
desired oid. If the DataStruct object is not found in the
cache, the database is not hit: all rows of the table are
in the cache, so the query has no result.
For non-oid queries with full caching, if the query is
a simple query, its results can be retrieved from the
DataStruct cache, so there is no need to retrieve them
from the database. In all other cases of full caching,
processing is the same as for any other query (explained
in the next paragraph).
For all other queries, it is checked whether the query
is already in a query cache (simple, complex or
multi-join). A Query object has an attribute called
"orderRelevant", which is true if the query results
must not be modified (no DO may be inserted into,
updated in, or deleted from the cached query results).
The method isOrderRelevant() is used to check whether
the results of the select can be modified or not.
If the query is in the cache and isOrderRelevant()
returns false, the result oids are retrieved from the
query cache. If the query is in the cache,
isOrderRelevant() returns true, and the result oids are
not modified, the result oids are also retrieved from
the query cache. But if the query is in the cache,
isOrderRelevant() returns true, and the result oids are
modified, the result oids from the query cache are not
used; instead, the database is hit.
If the result is found in the query cache, each result
oid is checked against the DataStruct cache. The number
of results that are not in the DataStruct cache is then
counted, and the time needed to perform queries by oid on
the database for all of those missing oids is compared
against the time needed to perform the whole query.
If the time needed to perform the queries by oid on the
database is less than or equal to the query execution
time, the results are retrieved from the cache, and those
that are not there are retrieved from the database (using
queries by oid).
If that time is longer, or the query is not in the
query cache, or the query involves joins with other
tables, or the cached query results are modified and the
query is order relevant, the query is performed on the
database.
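The decision rules above can be condensed into a single predicate. This sketch assumes that the time for the by-oid lookups is estimated as the number of missing oids multiplied by a per-oid cost; the text does not say how DODS actually estimates that time. QueryCachePolicy and its parameters are hypothetical.

```java
// Hypothetical decision function mirroring the query-cache rules above.
final class QueryCachePolicy {
    private QueryCachePolicy() {}

    /**
     * Returns true if the cached result oids should be used (fetching the rows that are
     * missing from the DataStruct cache by oid), false if the whole query must be run
     * against the database.
     */
    static boolean useCachedResults(boolean inQueryCache,
                                    boolean joinsOtherTables,
                                    boolean orderRelevant,
                                    boolean resultsModified,
                                    int missingFromDataStructCache,
                                    long timePerOidQueryMillis,
                                    long queryExecutionTimeMillis) {
        if (!inQueryCache || joinsOtherTables) {
            return false;
        }
        if (orderRelevant && resultsModified) {
            return false; // order-relevant query whose cached results were modified
        }
        // Assumed cost model: one by-oid query per missing row.
        long oidLookupTime = (long) missingFromDataStructCache * timePerOidQueryMillis;
        return oidLookupTime <= queryExecutionTimeMillis;
    }
}
```

For example, with 3 missing rows at 2 ms each and a 10 ms query, the cached oids are used; with 6 missing rows the query is re-run.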
If the results are retrieved from the database, the
query and its necessary data are put in the Query cache
(simple, complex or multi-join).
If the query was already in the query cache but was
executed again (because there were not enough result
oids in the result list, or because the old query was
modified and isOrderRelevant is true for the new query),
the old query is replaced by the new one (which is not
marked as modified).
Insert clause
The data object is inserted into the database, and after
a successful commit, when the data is moved to the
original DataStruct for the first time, it is added to
the DataStruct cache.
All complex and multi-join queries of the table that
belong to the database of the inserted DO are removed
from the query caches.
For every simple query of the table (for the inserted
DO's database) in the query cache, it is checked whether
the inserted DO is a query result or not.
If the new DO is a query result, the query is marked as
"modified" in the query cache.
If its cached results are complete (all results are in
the query cache), the oid of the inserted DO is added to
the cached result list. If the cached results are not
complete, the oid is not added to the list.
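The insert rules for the query caches can be sketched as one pass over the cached queries of the affected table. This sketch assumes the caller passes the list of cached queries for that table and represents the inserted row as a column map; QueryEntry and InsertHandling are hypothetical names, and the WHERE test of a simple query is modeled as a Predicate.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical query-cache entry; Kind mirrors the three query caches.
class QueryEntry {
    enum Kind { SIMPLE, COMPLEX, MULTI_JOIN }

    final Kind kind;
    final String database;
    final Predicate<Map<String, Object>> where; // evaluated only for SIMPLE queries
    final List<Long> resultOids = new ArrayList<>();
    final boolean complete; // true if resultOids holds every result
    boolean modified;

    QueryEntry(Kind kind, String database, Predicate<Map<String, Object>> where, boolean complete) {
        this.kind = kind;
        this.database = database;
        this.where = where;
        this.complete = complete;
    }
}

final class InsertHandling {
    private InsertHandling() {}

    // Applies the insert rules to the cached queries of the table the row was inserted into.
    static void onInsert(List<QueryEntry> tableQueries, String database, long oid,
                         Map<String, Object> row) {
        Iterator<QueryEntry> it = tableQueries.iterator();
        while (it.hasNext()) {
            QueryEntry q = it.next();
            if (!q.database.equals(database)) {
                continue; // only queries for the inserted DO's database are affected
            }
            if (q.kind != QueryEntry.Kind.SIMPLE) {
                it.remove(); // complex and multi-join queries are dropped from the cache
            } else if (q.where.test(row)) {
                q.modified = true;
                if (q.complete) {
                    q.resultOids.add(oid); // only complete result lists receive the new oid
                }
            }
        }
    }
}
```

Dropping complex and multi-join entries is the conservative choice: the cache cannot tell whether the new row belongs to their results, so it forgets them.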
Update clause
The data object is updated in the database, and if the
commit was successful, when the data is moved to the
original DataStruct for the first time, it is added to
the cache (the old DataStruct object is removed from the
DataStruct cache if it was there).
All complex and multi-join queries of the table that
belong to the database of the updated DO are removed from
the query caches.
For every simple query of the table (for the updated
DO's database) in the query cache, it is checked whether
the updated DO is a query result or not.
If it is, the query is marked as "modified" in the query
cache, and the DO is added to the query results only if
it was not already among them and the cached result list
is complete.
If it is not, and the DO's oid exists in the query
results, the oid is removed from them, and because of
this change of the results, the query is marked as
"modified" in the query cache.
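The update rules differ from the insert rules mainly in the "no longer a result" branch. This sketch makes the same assumptions as a per-table list of cached queries with a Predicate for the simple-query WHERE test; UpdQuery and UpdateHandling are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical query-cache entry; 'simple' is false for complex and multi-join queries.
class UpdQuery {
    final boolean simple;
    final String database;
    final Predicate<Map<String, Object>> where;
    final List<Long> resultOids = new ArrayList<>();
    final boolean complete;
    boolean modified;

    UpdQuery(boolean simple, String database, Predicate<Map<String, Object>> where, boolean complete) {
        this.simple = simple;
        this.database = database;
        this.where = where;
        this.complete = complete;
    }
}

final class UpdateHandling {
    private UpdateHandling() {}

    static void onUpdate(List<UpdQuery> tableQueries, String database, long oid,
                         Map<String, Object> newRow) {
        Iterator<UpdQuery> it = tableQueries.iterator();
        while (it.hasNext()) {
            UpdQuery q = it.next();
            if (!q.database.equals(database)) {
                continue;
            }
            if (!q.simple) {
                it.remove(); // complex and multi-join queries are dropped
                continue;
            }
            if (q.where.test(newRow)) {
                // Still (or newly) a result: mark modified; add only if absent and the list is complete.
                q.modified = true;
                if (q.complete && !q.resultOids.contains(oid)) {
                    q.resultOids.add(oid);
                }
            } else if (q.resultOids.remove(Long.valueOf(oid))) {
                // No longer a result. remove(Object), not remove(int index), drops the oid.
                q.modified = true;
            }
        }
    }
}
```

Note the explicit Long.valueOf: with a List of Long, passing a primitive to remove would select the index-based overload.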
Delete clause
Deletes the DO from the database and removes its original
DataStruct object (originalData) from the DataStruct
cache (if it is there).
Goes through the query caches (simple, complex and
multi-join) and, wherever this DO is found, removes it
from the query results and marks that query as
"modified".
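The delete rules touch both cache levels. This sketch assumes the DataStruct cache is a map keyed by "<database>.<oid>" handles and that the three query caches can be passed as lists of entries; DelQuery and DeleteHandling are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical query-cache entry holding only what deletion needs.
class DelQuery {
    final List<Long> resultOids = new ArrayList<>();
    boolean modified;
}

final class DeleteHandling {
    private DeleteHandling() {}

    static void onDelete(Map<String, Object> dataStructCache,
                         List<List<DelQuery>> queryCaches, // simple, complex, multi-join
                         String database, long oid) {
        // Remove the original DataStruct from the DataStruct cache, if present.
        dataStructCache.remove(database + "." + oid);
        // Wherever the DO appears in cached results, remove it and mark the query modified.
        for (List<DelQuery> cache : queryCaches) {
            for (DelQuery q : cache) {
                if (q.resultOids.remove(Long.valueOf(oid))) {
                    q.modified = true;
                }
            }
        }
    }
}
```

Unlike insert and update, deletion keeps complex and multi-join entries: removing an oid from a result list is always safe, so only the affected queries are marked.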
SQL JOIN operations
The Query classes generated by DODS perform queries on a
single table. There is no way to perform a JOIN
operation using Query objects. Likewise, there is no way
to use DO class caches to perform JOINs.
JOINs can be performed using the QueryBuilder
class. Alternatively, a VIEW can be created to perform
the JOIN, and a DO/Query class pair can be generated to
represent the VIEW.