[svn-r2482] Bringing "HDF5 Technical Notes" into development branch (from R1.2 branch)
271
doc/html/TechNotes/H4-H5Compat.html
Normal file
@@ -0,0 +1,271 @@
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Backward/Forward Compatibility</title>
</head>

<body>
<h1>Backward/Forward Compatibility</h1>

<p>HDF5 development must proceed in such a manner as to
satisfy the following conditions:

<ol type=A>
<li>HDF5 applications can produce data that HDF5
applications can read and write, and HDF4 applications can produce
data that HDF4 applications can read and write. The situation
that demands this condition is obvious.</li>

<li>HDF5 applications can produce data that HDF4 applications
can read, and HDF4 applications can subsequently modify the
file subject to certain constraints that depend on the
implementation. This condition covers the temporary
situation in which a consumer has neither been relinked with a new
HDF4 API built on top of the HDF5 API nor recompiled with the
HDF5 API.</li>

<li>HDF5 applications can read existing HDF4 files and subsequently
modify the file subject to certain constraints that depend on
the implementation. This condition covers the temporary
situation in which the producer has neither been relinked with a
new HDF4 API built on top of the HDF5 API nor recompiled with
the HDF5 API, and the permanent situation of HDF5 consumers
reading archived HDF4 files.</li>
</ol>

<p>There is at least one invariant: new object features introduced
in the HDF5 file format (like 2-d arrays of structs) might be
impossible to "translate" to a format that an old HDF4
application can understand, either because the HDF4 file format
or because the HDF4 API has no mechanism to describe the object.

<p>What follows is one possible implementation based on how
Condition B was solved in the AIO/PDB world. It also attempts
to satisfy these goals:

<ol type=1>
<li>The main HDF5 library contains as little extra baggage as
possible, either by relying on external programs to take care
of compatibility issues or by incorporating the logic of such
programs as optional modules in the HDF5 library. Conditions B
and C are handled by separate programs/modules.</li>

<li>No extra baggage means not only that the library proper is
small, but also that it can be implemented from the ground up
(rather than migrated from HDF4 source) with minimal regard for
HDF4, thus keeping the logic straightforward.</li>

<li>Compatibility issues are handled behind the scenes when
necessary (and possible) but can be carried out explicitly
during things like data migration.</li>
</ol>

<hr>
<h2>Wrappers</h2>

<p>The proposed implementation uses <i>wrappers</i> to handle
compatibility issues. A Format-X file is <i>wrapped</i> in a
Format-Y file by creating a Format-Y skeleton that replicates
the Format-X meta data. The Format-Y skeleton points to the raw
data stored in Format-X without moving the raw data. The
restriction is that the raw data storage methods in Format-Y
must be a superset of the raw data storage methods in Format-X
(otherwise the raw data must be copied to Format-Y). We're
assuming that the meta data is small with respect to the entire
file.
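
<p>The wrapping idea can be sketched in a few lines of Python. This
is only an illustrative toy, not the HDF4 or HDF5 data model: the
<code>Extent</code> type, the <code>read_through</code> helper, the
object names and the byte layout are all assumptions made for the
example. It shows two sets of meta data that point at the same raw
bytes without copying them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Location of one object's raw data inside the Format-X file (hypothetical)."""
    offset: int
    length: int


def read_through(wrapper: dict, raw_file: bytes, name: str) -> bytes:
    """Resolve an object name through a wrapper and return its raw bytes."""
    extent = wrapper[name]
    return raw_file[extent.offset:extent.offset + extent.length]


# A toy "Format-X file": 4 bytes of header followed by two raw data blocks.
raw_file = b"XHDR" + b"temperature" + b"pressure"

# Format-X meta data: the original wrapper around the raw data.
format_x = {"temp": Extent(4, 11), "pres": Extent(15, 8)}

# Format-Y skeleton: replicates the meta data (here under different names)
# and points at the same extents; no raw data is moved or copied.
format_y = {"/temp": format_x["temp"], "/pres": format_x["pres"]}

# Both views resolve to the same bytes.
same = read_through(format_y, raw_file, "/temp") == read_through(format_x, raw_file, "temp")
```

<p>Because the skeleton stores only offsets and lengths, its size
grows with the amount of meta data rather than with the amount of
raw data, which is why the approach assumes the meta data is small.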

<p>The wrapper can be a separate file that has pointers into the
first file or it can be contained within the first file. If
contained in a single file, the file can appear as a Format-Y
file or simultaneously a Format-Y and Format-X file.

<p>The Format-X meta data can be thought of as the original
wrapper around raw data and Format-Y is a second wrapper around
the same data. The wrappers are independent of one another;
modifying the meta data in one wrapper causes the other to
become out of date. Modification of raw data doesn't invalidate
either view as long as the meta data that describes its storage
isn't modified. For instance, an array element can change values
if storage is already allocated for the element, but if storage
isn't allocated then the meta data describing the storage must
change, invalidating all wrappers but one.

<p>It's perfectly legal to modify the meta data of one wrapper
without modifying the meta data in the other wrapper(s). The
illegal part is accessing the raw data through a wrapper which
is out of date.
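
<p>The invalidation rule can be illustrated with a small Python
sketch. The note does not say how an out-of-date wrapper is
detected, so the <code>layout_generation</code> counter below is
purely an assumption for the example, as are the class names
<code>RawStore</code>, <code>Wrapper</code> and
<code>StaleWrapperError</code>.

```python
class StaleWrapperError(Exception):
    """Raised when raw data is accessed through an out-of-date wrapper."""


class RawStore:
    """Raw data plus a counter that changes whenever the storage layout changes."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.layout_generation = 0  # hypothetical bookkeeping, not from the note


class Wrapper:
    """One view (set of meta data) over a shared RawStore."""

    def __init__(self, store):
        self.store = store
        self.allocated = len(store.data)
        self.seen_generation = store.layout_generation

    def _check(self):
        if self.seen_generation != self.store.layout_generation:
            raise StaleWrapperError("wrapper meta data is out of date")

    def write(self, index, value):
        """Overwrite an element, allocating storage first if necessary."""
        self._check()
        if index >= self.allocated:
            # Storage must grow, so the meta data describing it changes:
            # every other wrapper over this store becomes out of date.
            self.store.data.extend(bytes(index + 1 - self.allocated))
            self.store.layout_generation += 1
            self.allocated = index + 1
            self.seen_generation = self.store.layout_generation
        self.store.data[index] = value

    def read(self, index):
        self._check()
        return self.store.data[index]


store = RawStore(4)
hdf5_view, hdf4_view = Wrapper(store), Wrapper(store)
hdf5_view.write(2, 7)   # storage already allocated: both views stay valid
value_seen_by_hdf4 = hdf4_view.read(2)
hdf5_view.write(9, 1)   # allocates storage: only hdf5_view stays valid
```

<p>After the second write, reading through <code>hdf4_view</code>
raises <code>StaleWrapperError</code>, which corresponds to the
illegal access described above.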

<p>If raw data is wrapped by more than one internal wrapper
(<i>internal</i> means that the wrapper is in the same file as
the raw data) then access to that file must assume that
unreferenced parts of that file contain meta data for another
wrapper and cannot be reclaimed as free space.

<hr>
<h2>Implementation of Condition B</h2>

<p>Since this is a temporary situation which can't be
automatically detected by the HDF5 library, we must rely
on the application to notify the HDF5 library whether or not it
must satisfy Condition B. (Even if we don't rely on the
application, at some point someone is going to remove the
Condition B constraint from the library.) So the module that
handles Condition B is conditionally compiled and then enabled
on a per-file basis.

<p>If the application desires to produce an HDF4 file (determined
by arguments to <code>H5Fopen</code>), and the Condition B
module is compiled into the library, then <code>H5Fclose</code>
calls the module to traverse the HDF5 wrapper and generate an
additional internal or external HDF4 wrapper (wrapper specifics
are described below). If Condition B is implemented as a module
then it can benefit from the meta data already cached by the main
library.

<p>An internal HDF4 wrapper would be used if the HDF5 file is
writable and the user doesn't mind that the HDF5 file is
modified. An external wrapper would be used if the file isn't
writable or if the user wants the data file to be primarily HDF5
but a few applications need an HDF4 view of the data.

<p>Modifying through the HDF5 library an HDF5 file that has an
internal HDF4 wrapper should invalidate the HDF4 wrapper (and
optionally regenerate it when <code>H5Fclose</code> is
called). The HDF5 library must understand how wrappers work, but
not necessarily anything about the HDF4 file format.

<p>Modifying through the HDF5 library an HDF5 file that has an
external HDF4 wrapper will cause the HDF4 wrapper to become out
of date (though it may be regenerated during <code>H5Fclose</code>).
<b>Note: Perhaps the next release of the HDF4 library should
ensure that the HDF4 wrapper file has a more recent modification
time than the raw data file (the HDF5 file) to which it
points(?)</b>
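
<p>If such a modification-time rule were adopted, the check itself
would be simple. The following Python sketch is only one way it
could look; the function name <code>wrapper_is_current</code> and
the file names are made up for the example, and neither library is
known to perform this check.

```python
import os
import tempfile


def wrapper_is_current(wrapper_path, data_path):
    """True if the wrapper file was modified more recently than the data file."""
    return os.stat(wrapper_path).st_mtime_ns > os.stat(data_path).st_mtime_ns


with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "data.h5")       # hypothetical HDF5 data file
    wrapper = os.path.join(tmp, "data.hdf")   # hypothetical external HDF4 wrapper
    for path in (data, wrapper):
        open(path, "wb").close()

    # Wrapper written after the data file: the wrapper is usable.
    os.utime(data, (1000, 1000))
    os.utime(wrapper, (2000, 2000))
    fresh = wrapper_is_current(wrapper, data)

    # Data file modified later through HDF5: the wrapper is out of date.
    os.utime(data, (3000, 3000))
    stale = not wrapper_is_current(wrapper, data)
```

<p>A timestamp comparison only detects that the data file changed
after the wrapper was written; it cannot tell whether the change
touched meta data or only raw data, so it is conservative.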

<p>Modifying through the HDF4 library an HDF5 file that has an
internal or external HDF4 wrapper will cause the HDF5 wrapper to
become out of date. However, there is no way for the old HDF4
library to notify the HDF5 wrapper that it's out of date.
Therefore the HDF5 library must be able to detect when the HDF5
wrapper is out of date and be able to fix it. If the HDF4
wrapper is complete then the easy way is to ignore the original
HDF5 wrapper and generate a new one from the HDF4 wrapper. The
other approach is to compare the HDF4 and HDF5 wrappers and
assume that: if they differ, HDF4 is the right one; if HDF4 omits
data, it is because HDF4 is a partial wrapper (rather than
because HDF4 deleted the data); and if HDF4 has new data, the
new meta data is copied to the HDF5 wrapper. On the other hand,
perhaps we don't need to allow these situations (modifying an
HDF5 file with the old HDF4 library and then accessing it with
the HDF5 library is either disallowed or causes HDF5 objects
that can't be described by HDF4 to be lost).

<p>To convert an HDF5 file to an HDF4 file on demand, one simply
opens the file with the HDF4 flag and closes it. This is also
how AIO implemented backward compatibility with PDB in its file
format.

<hr>
<h2>Implementation of Condition C</h2>

<p>This condition must be satisfied for all time because there
will always be archived HDF4 files. If a pure HDF4 file (that
is, one without HDF5 meta data) is opened with an HDF5 library,
then <code>H5Fopen</code> builds an internal or external HDF5
wrapper and then accesses the raw data through that wrapper. If
the HDF5 library modifies the file then the HDF4 wrapper becomes
out of date. However, since the HDF5 library hasn't been
released, we can at least implement it to disable and/or reclaim
the HDF4 wrapper.

<p>If an external and temporary HDF5 wrapper is desired, the
wrapper is created through the cache like all other HDF5 files.
The data appears on disk only if a particular cached datum is
preempted. Instead of calling <code>H5Fclose</code> on the HDF5
wrapper file we call <code>H5Fabort</code>, which immediately
releases all file resources without updating the file, and then
we unlink the file from Unix.

<hr>
<h2>What do wrappers look like?</h2>

<p>External wrappers are quite obvious: they contain only things
from the format specs for the wrapper and nothing from the
format specs of the format which they wrap.

<p>An internal HDF4 wrapper is added to an HDF5 file in such a way
that the file appears to be both an HDF4 file and an HDF5
file. HDF4 requires an HDF4 file header at file offset zero. If
a user block is present then we just move the user block down a
bit (and truncate it) and insert the minimum HDF4 signature.
The HDF4 <code>dd</code> list and any other data it needs are
appended to the end of the file and the HDF5 signature uses the
logical file length field to determine the beginning of the
trailing part of the wrapper.

<p>
<center>
<table border width="60%">
<tr>
<td>HDF4 minimal file header. Its main job is to point to
the <code>dd</code> list at the end of the file.</td>
</tr>
<tr>
<td>User-defined block which is truncated by the size of the
HDF4 file header so that the HDF5 boot block file address
doesn't change.</td>
</tr>
<tr>
<td>The HDF5 boot block and data, unmodified by adding the
HDF4 wrapper.</td>
</tr>
<tr>
<td>The main part of the HDF4 wrapper. The <code>dd</code>
list will have entries for all parts of the file so
hdpack(?) doesn't (re)move anything.</td>
</tr>
</table>
</center>
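
<p>The table above can be turned into concrete byte ranges. The
Python sketch below is a rough illustration only: the function
name <code>internal_wrapper_layout</code> and all of the sizes are
invented for the example, and real HDF4 header and
<code>dd</code> list sizes are not taken from either format
specification.

```python
def internal_wrapper_layout(user_block_size, hdf4_header_size,
                            hdf5_size, hdf4_trailer_size):
    """Return (region, start, end) byte ranges for an HDF5 file that
    carries an internal HDF4 wrapper. All sizes are in bytes."""
    if hdf4_header_size > user_block_size:
        raise ValueError("user block too small to hold the HDF4 header")
    boot = user_block_size          # HDF5 boot block address is unchanged
    end_hdf5 = boot + hdf5_size     # end of HDF5 boot block and data
    return [
        ("hdf4_header", 0, hdf4_header_size),
        ("user_block", hdf4_header_size, boot),   # truncated user block
        ("hdf5", boot, end_hdf5),
        ("hdf4_wrapper", end_hdf5, end_hdf5 + hdf4_trailer_size),
    ]


# Hypothetical sizes: 512-byte user block, 16-byte HDF4 header,
# 4096 bytes of HDF5 boot block plus data, 200-byte trailing wrapper.
layout = internal_wrapper_layout(512, 16, 4096, 200)
```

<p>The regions are contiguous and the <code>hdf5</code> region
starts at the original user block size, which is the property the
truncation is meant to preserve.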

<p>When such a file is opened by the HDF5 library for
modification, the library shifts the user block back down to
address zero and fills with zeros, then truncates the file at the
end of the HDF5 data or adds the trailing HDF4 wrapper to the free
list. This prevents HDF4 applications from reading the file with
an out-of-date wrapper.

<p>If there is no user block then we have a problem. The HDF5
boot block must be moved to make room for the HDF4 file header.
But moving just the boot block causes problems because all file
addresses stored in the file are relative to the boot block
address. The only option is to shift the entire file contents
by 512 bytes to open up a user block (too bad we don't have
hooks into the Unix i-node stuff so we could shift the entire
file contents by the size of a file system page without ever
performing I/O on the file :-)
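
<p>Shifting the whole file by 512 bytes is plain I/O and could be
done along the following lines. This Python sketch is an
illustration, not library code: <code>shift_contents</code> and its
chunk size are assumptions, and it works on any seekable binary
file object.

```python
import io


def shift_contents(f, shift=512, chunk=64 * 1024):
    """Move every byte of a seekable binary file `shift` bytes toward the
    end, leaving a zero-filled hole at the front. Chunks are copied from
    the tail backward so no chunk is overwritten before it has been read."""
    f.seek(0, io.SEEK_END)
    pos = f.tell()
    while pos > 0:
        start = max(0, pos - chunk)
        f.seek(start)
        block = f.read(pos - start)
        f.seek(start + shift)
        f.write(block)
        pos = start
    # Zero the hole that will hold the new user block.
    f.seek(0)
    f.write(bytes(shift))


# Demonstration on an in-memory file with a deliberately small chunk size.
original = b"abcdef" * 100
buf = io.BytesIO(original)
shift_contents(buf, shift=512, chunk=100)
shifted = buf.getvalue()
```

<p>Because every stored address is relative to the boot block,
moving the boot block and everything after it by the same amount
leaves those addresses valid; the cost is rewriting the entire
file.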

<p>Is it possible to place an HDF5 wrapper in an HDF4 file? I
don't know enough about the HDF4 format, but I would suspect it
might be possible to open a hole at file address 512 (and
possibly before) by moving some things to the end of the file
to make room for the HDF5 signature. The remainder of the HDF5
wrapper goes at the end of the file and entries are added to the
HDF4 <code>dd</code> list to mark the location(s) of the HDF5
wrapper.

<hr>
<h2>Other Thoughts</h2>

<p>Conversion programs that copy an entire HDF4 file to a separate,
self-contained HDF5 file and vice versa might be useful.

<hr>
<address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Fri Oct 3 11:52:31 EST 1997 -->
<!-- hhmts start -->
Last modified: Wed Oct 8 12:34:42 EST 1997
<!-- hhmts end -->
</body>
</html>