[svn-r2482] Bringing "HDF5 Technical Notes" into development branch (from R1.2 branch)
This commit is contained in:
279
doc/html/TechNotes/ExternalFiles.html
Normal file
279
doc/html/TechNotes/ExternalFiles.html
Normal file
@@ -0,0 +1,279 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>External Files in HDF5</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center><h1>External Files in HDF5</h1></center>
|
||||
|
||||
<h3>Overview of Layers</h3>
|
||||
|
||||
<p>This table shows some of the layers of HDF5. Each layer calls
|
||||
functions at the same or lower layers and never functions at
|
||||
higher layers. An object identifier (OID) takes various forms
|
||||
at the various layers: at layer 0 an OID is an absolute physical
|
||||
file address; at layers 1 and 2 it's an absolute virtual file
|
||||
address. At layers 3 through 6 it's a relative address, and at
|
||||
layers 7 and above it's an object handle.
|
||||
|
||||
<p><center>
|
||||
<table border cellpadding=4 width="60%">
|
||||
<tr align=center>
|
||||
<td>Layer-7</td>
|
||||
<td>Groups</td>
|
||||
<td>Datasets</td>
|
||||
</tr>
|
||||
<tr align=center>
|
||||
<td>Layer-6</td>
|
||||
<td>Indirect Storage</td>
|
||||
<td>Symbol Tables</td>
|
||||
</tr>
|
||||
<tr align=center>
|
||||
<td>Layer-5</td>
|
||||
<td>B-trees</td>
|
||||
<td>Object Hdrs</td>
|
||||
<td>Heaps</td>
|
||||
</tr>
|
||||
<tr align=center>
|
||||
<td>Layer-4</td>
|
||||
<td>Caching</td>
|
||||
</tr>
|
||||
<tr align=center>
|
||||
<td>Layer-3</td>
|
||||
<td>H5F chunk I/O</td>
|
||||
</tr>
|
||||
<tr align=center>
|
||||
<td>Layer-2</td>
|
||||
<td>H5F low</td>
|
||||
</tr>
|
||||
<tr align=center>
|
||||
<td>Layer-1</td>
|
||||
<td>File Family</td>
|
||||
<td>Split Meta/Raw</td>
|
||||
</tr>
|
||||
<tr align=center>
|
||||
<td>Layer-0</td>
|
||||
<td>Section-2 I/O</td>
|
||||
<td>Standard I/O</td>
|
||||
<td>Malloc/Free</td>
|
||||
</tr>
|
||||
</table>
|
||||
</center>
|
||||
|
||||
<h3>Single Address Space</h3>
|
||||
|
||||
<p>The simplest form of hdf5 file is a single file containing only
|
||||
hdf5 data. The file begins with the boot block, which is
|
||||
followed until the end of the file by hdf5 data. The next most
|
||||
complicated file allows non-hdf5 data (user defined data or
|
||||
internal wrappers) to appear before the boot block and after the
|
||||
end of the hdf5 data. The hdf5 data is treated as a single
|
||||
linear address space in both cases.
|
||||
|
||||
<p>The next level of complexity comes when non-hdf5 data is
|
||||
interspersed with the hdf5 data. We handle that by including
|
||||
the non-hdf5 interspersed data in the hdf5 address space and
|
||||
simply not referencing it (eventually we might add those
|
||||
addresses to a "do-not-disturb" list using the same mechanism as
|
||||
the hdf5 free list, but it's not absolutely necessary). This is
|
||||
implemented except for the "do-not-disturb" list.
|
||||
|
||||
<p>The most complicated single address space hdf5 file is when we
|
||||
allow the address space to be split among multiple physical
|
||||
files. For instance, a >2GB file can be split into smaller
|
||||
chunks and transfered to a 32 bit machine, then accessed as a
|
||||
single logical hdf5 file. The library already supports >32 bit
|
||||
addresses, so at layer 1 we split a 64-bit address into a 32-bit
|
||||
file number and a 32-bit offset (the 64 and 32 are
|
||||
arbitrary). The rest of the library still operates with a linear
|
||||
address space.
|
||||
|
||||
<p>Another variation might be a family of two files where all the
|
||||
meta data is stored in one file and all the raw data is stored
|
||||
in another file to allow the HDF5 wrapper to be easily replaced
|
||||
with some other wrapper.
|
||||
|
||||
<p>The <code>H5Fcreate</code> and <code>H5Fopen</code> functions
|
||||
would need to be modified to pass file-type info down to layer 2
|
||||
so the correct drivers can be called and parameters passed to
|
||||
the drivers to initialize them.
|
||||
|
||||
<h4>Implementation</h4>
|
||||
|
||||
<p>I've implemented fixed-size family members. The entire hdf5
|
||||
file is partitioned into members where each member is the same
|
||||
size. The family scheme is used if one passes a name to
|
||||
<code>H5F_open</code> (which is called by <code>H5Fopen()</code>
|
||||
and <code>H5Fcreate</code>) that contains a
|
||||
<code>printf(3c)</code>-style integer format specifier.
|
||||
Currently, the default low-level file driver is used for all
|
||||
family members (H5F_LOW_DFLT, usually set to be Section 2 I/O or
|
||||
Section 3 stdio), but we'll probably eventually want to pass
|
||||
that as a parameter of the file access property list, which
|
||||
hasn't been implemented yet. When creating a family, a default
|
||||
family member size is used (defined at the top H5Ffamily.c,
|
||||
currently 64MB) but that also should be settable in the file
|
||||
access property list. When opening an existing family, the size
|
||||
of the first member is used to determine the member size
|
||||
(flushing/closing a family ensures that the first member is the
|
||||
correct size) but the other family members don't have to be that
|
||||
large (the local address space, however, is logically the same
|
||||
size for all members).
|
||||
|
||||
<p>I haven't implemented a split meta/raw family yet but am rather
|
||||
curious to see how it would perform. I was planning to use the
|
||||
`.h5' extension for the meta data file and `.raw' for the raw
|
||||
data file. The high-order bit in the address would determine
|
||||
whether the address refers to meta data or raw data. If the user
|
||||
passes a name that ends with `.raw' to <code>H5F_open</code>
|
||||
then we'll chose the split family and use the default low level
|
||||
driver for each of the two family members. Eventually we'll
|
||||
want to pass these kinds of things through the file access
|
||||
property list instead of relying on naming convention.
|
||||
|
||||
<h3>External Raw Data</h3>
|
||||
|
||||
<p>We also need the ability to point to raw data that isn't in the
|
||||
HDF5 linear address space. For instance, a dataset might be
|
||||
striped across several raw data files.
|
||||
|
||||
<p>Fortunately, the only two packages that need to be aware of
|
||||
this are the packages for reading/writing contiguous raw data
|
||||
and discontiguous raw data. Since contiguous raw data is a
|
||||
special case, I'll discuss how to implement external raw data in
|
||||
the discontiguous case.
|
||||
|
||||
<p>Discontiguous data is stored as a B-tree whose keys are the
|
||||
chunk indices and whose leaf nodes point to the raw data by
|
||||
storing a file address. So what we need is some way to name the
|
||||
external files, and a way to efficiently store the external file
|
||||
name for each chunk.
|
||||
|
||||
<p>I propose adding to the object header an <em>External File
|
||||
List</em> message that is a 1-origin array of file names.
|
||||
Then, in the B-tree, each key has an index into the External
|
||||
File List (or zero for the HDF5 file) for the file where the
|
||||
chunk can be found. The external file index is only used at
|
||||
the leaf nodes to get to the raw data (the entire B-tree is in
|
||||
the HDF5 file) but because of the way keys are copied among
|
||||
the B-tree nodes, it's much easier to store the index with
|
||||
every key.
|
||||
|
||||
<h3>Multiple HDF5 Files</h3>
|
||||
|
||||
<p>One might also want to combine two or more HDF5 files in a
|
||||
manner similar to mounting file systems in Unix. That is, the
|
||||
group structure and meta data from one file appear as though
|
||||
they exist in the first file. One opens File-A, and then
|
||||
<em>mounts</em> File-B at some point in File-A, the <em>mount
|
||||
point</em>, so that traversing into the mount point actually
|
||||
causes one to enter the root object of File-B. File-A and
|
||||
File-B are each complete HDF5 files and can be accessed
|
||||
individually without mounting them.
|
||||
|
||||
<p>We need a couple additional pieces of machinery to make this
|
||||
work. First, an haddr_t type (a file address) doesn't contain
|
||||
any info about which HDF5 file's address space the address
|
||||
belongs to. But since haddr_t is an opaque type except at
|
||||
layers 2 and below, it should be quite easy to add a pointer to
|
||||
the HDF5 file. This would also remove the H5F_t argument from
|
||||
most of the low-level functions since it would be part of the
|
||||
OID.
|
||||
|
||||
<p>The other thing we need is a table of mount points and some
|
||||
functions that understand them. We would add the following
|
||||
table to each H5F_t struct:
|
||||
|
||||
<p><code><pre>
|
||||
struct H5F_mount_t {
|
||||
H5F_t *parent; /* Parent HDF5 file if any */
|
||||
struct {
|
||||
H5F_t *f; /* File which is mounted */
|
||||
haddr_t where; /* Address of mount point */
|
||||
} *mount; /* Array sorted by mount point */
|
||||
intn nmounts; /* Number of mounted files */
|
||||
intn alloc; /* Size of mount table */
|
||||
}
|
||||
</pre></code>
|
||||
|
||||
<p>The <code>H5Fmount</code> function takes the ID of an open
|
||||
file or group, the name of a to-be-mounted file, the name of the mount
|
||||
point, and a file access property list (like <code>H5Fopen</code>).
|
||||
It opens the new file and adds a record to the parent's mount
|
||||
table. The <code>H5Funmount</code> function takes the parent
|
||||
file or group ID and the name of the mount point and disassociates
|
||||
the mounted file from the mount point. It does not close the
|
||||
mounted file. The <code>H5Fclose</code>
|
||||
function closes/unmounts files recursively.
|
||||
|
||||
<p>The <code>H5G_iname</code> function which translates a name to
|
||||
a file address (<code>haddr_t</code>) looks at the mount table
|
||||
at each step in the translation and switches files where
|
||||
appropriate. All name-to-address translations occur through
|
||||
this function.
|
||||
|
||||
<h3>How Long?</h3>
|
||||
|
||||
<p>I'm expecting to be able to implement the two new flavors of
|
||||
single linear address space in about two days. It took two hours
|
||||
to implement the malloc/free file driver at level zero and I
|
||||
don't expect this to be much more work.
|
||||
|
||||
<p>I'm expecting three days to implement the external raw data for
|
||||
discontiguous arrays. Adding the file index to the B-tree is
|
||||
quite trivial; adding the external file list message shouldn't
|
||||
be too hard since the object header message class from wich this
|
||||
message derives is fully implemented; and changing
|
||||
<code>H5F_istore_read</code> should be trivial. Most of the
|
||||
time will be spent designing a way to cache Unix file
|
||||
descriptors efficiently since the total number open files
|
||||
allowed per process could be much smaller than the total number
|
||||
of HDF5 files and external raw data files.
|
||||
|
||||
<p>I'm expecting four days to implement being able to mount one
|
||||
HDF5 file on another. I was originally planning a lot more, but
|
||||
making <code>haddr_t</code> opaque turned out to be much easier
|
||||
than I planned (I did it last Fri). Most of the work will
|
||||
probably be removing the redundant H5F_t arguments for lots of
|
||||
functions.
|
||||
|
||||
<h3>Conclusion</h3>
|
||||
|
||||
<p>The external raw data could be implemented as a single linear
|
||||
address space, but doing so would require one to allocate large
|
||||
enough file addresses throughout the file (>32bits) before the
|
||||
file was created. It would make mixing an HDF5 file family with
|
||||
external raw data, or external HDF5 wrapper around an HDF4 file
|
||||
a more difficult process. So I consider the implementation of
|
||||
external raw data files as a single HDF5 linear address space a
|
||||
kludge.
|
||||
|
||||
<p>The ability to mount one HDF5 file on another might not be a
|
||||
very important feature especially since each HDF5 file must be a
|
||||
complete file by itself. It's not possible to stripe an array
|
||||
over multiple HDF5 files because the B-tree wouldn't be complete
|
||||
in any one file, so the only choice is to stripe the array
|
||||
across multiple raw data files and store the B-tree in the HDF5
|
||||
file. On the other hand, it might be useful if one file
|
||||
contains some public data which can be mounted by other files
|
||||
(e.g., a mesh topology shared among collaborators and mounted by
|
||||
files that contain other fields defined on the mesh). Of course
|
||||
the applications can open the two files separately, but it might
|
||||
be more portable if we support it in the library.
|
||||
|
||||
<p>So we're looking at about two weeks to implement all three
|
||||
versions. I didn't get a chance to do any of them in AIO
|
||||
although we had long-term plans for the first two with a
|
||||
possibility of the third. They'll be much easier to implement in
|
||||
HDF5 than AIO since I've been keeping these in mind from the
|
||||
start.
|
||||
|
||||
<hr>
|
||||
<address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
|
||||
<!-- Created: Sat Nov 8 18:08:52 EST 1997 -->
|
||||
<!-- hhmts start -->
|
||||
Last modified: Tue Sep 8 14:43:32 EDT 1998
|
||||
<!-- hhmts end -->
|
||||
</body>
|
||||
</html>
|
||||
Reference in New Issue
Block a user