[svn-r2208] Big.html --> BigDataSmMach.html
Coding.html --> NamingScheme.html
CodeReview.html
ExternalFiles.html
compat.html --> H4-H5Compat.html
heap.txt --> HeapMgmt.html
IOPipe.html
Lib_Maint.html --> LibMaint.html
MemoryManagement.html
move.html --> MoveDStruct.html
ObjectHeader.txt
storage.html --> RawDStorage.html
symtab --> SymbolTables.html
Version.html
Above files moved from doc/html/ to doc/html/TechNotes/ for inclusion in the new "HDF5 Technical Notes" document. Filenames changed as indicated.
274
doc/html/TechNotes/RawDStorage.html
Normal file
@@ -0,0 +1,274 @@
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Raw Data Storage in HDF5</title>
</head>

<body>
<h1>Raw Data Storage in HDF5</h1>

<p>This document describes the various ways that raw data is
stored in an HDF5 file and the object header messages which
contain the parameters for the storage.

<p>Raw data storage has three components: the mapping from some
logical multi-dimensional element space to the linear address
space of a file, compression of the raw data on disk, and
striping of raw data across multiple files. These components
are orthogonal.

<p>Some goals of the storage mechanism are to be able to
efficiently store data which is:

<dl>
<dt>Small
<dd>Small pieces of raw data can be treated as metadata and
stored in the object header. This will be achieved by storing
the raw data in the object header with message 0x0006.
Compression and striping are not supported in this case.

<dt>Complete Large
<dd>The library should be able to store large arrays
contiguously in the file provided the user knows the final
array size a priori. The array can then be read/written in a
single I/O request. This is accomplished by describing the
storage with object header message 0x0005. Compression and
striping are not supported in this case.

<dt>Sparse Large
<dd>A large sparse raw data array should be stored in a manner
that is space-efficient but one in which any element can still
be accessed in a reasonable amount of time. Implementation
details are below.

<dt>Dynamic Size
<dd>One often doesn't have prior knowledge of the size of an
array. It would be nice to allow arrays to grow dynamically in
any dimension. It might also be nice to allow the array to
grow in the negative dimension directions if convenient to
implement. Implementation details are below.

<dt>Subslab Access
<dd>Some multi-dimensional arrays are almost always accessed by
subslabs. For instance, a 2-d array of pixels might always be
accessed as smaller 1k-by-1k 2-d arrays always aligned on 1k
index values. We should be able to store the array in such a
way that striding through the entire array is not necessary.
Subslab access might also be useful with compression
algorithms where each storage slab can be compressed
independently of the others. Implementation details are below.

<dt>Compressed
<dd>Various compression algorithms can be applied to the entire
array. We're not planning to support separate algorithms (or a
single algorithm with separate parameters) for each chunk
although it would be possible to implement that in a manner
similar to the way striping across files is
implemented.

<dt>Striped Across Files
<dd>The array access functions should support arrays stored
discontiguously across a set of files.
</dl>
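
<p>The Subslab Access goal can be illustrated with a small sketch.
Assuming subslabs are aligned on multiples of a fixed size in each
dimension (1k in the pixel example), the origin of the subslab that
holds a given element is found by rounding each coordinate down to
that multiple. The names <code>align_down</code> and
<code>subslab_origin</code> are hypothetical helpers written for this
note and are not part of the HDF5 library.

```c
#include <assert.h>
#include <stddef.h>

/* Round `coord` down to the nearest multiple of `size`.
 * Hypothetical helper for illustration only. */
size_t
align_down(size_t coord, size_t size)
{
    assert(size > 0);
    return (coord / size) * size;
}

/* Compute the origin of the aligned subslab that contains the
 * element at `coord`. `slab_size[i]` is the subslab extent in
 * dimension i (an assumption of this sketch). */
void
subslab_origin(unsigned ndims, const size_t coord[],
               const size_t slab_size[], size_t origin[])
{
    unsigned i;

    for (i = 0; i < ndims; i++)
        origin[i] = align_down(coord[i], slab_size[i]);
}
```

<p>With 1024-by-1024 subslabs, the element at (1500, 3000) falls in
the subslab whose origin is (1024, 2048), so only that subslab needs
to be read rather than striding through the whole array.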

<h1>Implementation of Indexed Storage</h1>

<p>The Sparse Large, Dynamic Size, and Subslab Access methods
share so much code that they can be described with a single
message. The new Indexed Storage Message (<code>0x0008</code>)
will replace the old Chunked Object (<code>0x0009</code>) and
Sparse Object (<code>0x000A</code>) Messages.

<p>
<center>
<table border cellpadding=4 width="60%">
<caption align=bottom>
<b>The Format of the Indexed Storage Message</b>
</caption>
<tr align=center>
<th width="25%">byte</th>
<th width="25%">byte</th>
<th width="25%">byte</th>
<th width="25%">byte</th>
</tr>

<tr align=center>
<td colspan=4><br>Address of B-tree<br><br></td>
</tr>
<tr align=center>
<td>Number of Dimensions</td>
<td>Reserved</td>
<td>Reserved</td>
<td>Reserved</td>
</tr>
<tr align=center>
<td colspan=4>Reserved (4 bytes)</td>
</tr>
<tr align=center>
<td colspan=4>Alignment for Dimension 0 (4 bytes)</td>
</tr>
<tr align=center>
<td colspan=4>Alignment for Dimension 1 (4 bytes)</td>
</tr>
<tr align=center>
<td colspan=4>...</td>
</tr>
<tr align=center>
<td colspan=4>Alignment for Dimension N (4 bytes)</td>
</tr>
</table>
</center>

<p>The alignment fields indicate the alignment in logical space to
use when allocating new storage areas on disk. For instance,
writing every other element of a 100-element one-dimensional
array (using one HDF5 partial write operation per element)
with unit storage alignment would result in 50
single-element, discontiguous storage segments. However, using
an alignment of 25 would result in only four
segments. The size of the message varies with the number of
dimensions.
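
<p>The arithmetic in the example above can be checked with a short
sketch. It assumes that a storage segment covers <code>align</code>
consecutive elements starting at a multiple of <code>align</code>,
and that a segment is allocated the first time any element in it is
written. <code>count_segments</code> is a hypothetical helper, not a
library function.

```c
#include <assert.h>
#include <stddef.h>

/* Count the aligned storage segments that get allocated when every
 * `stride`-th element of an `nelmts`-element 1-d array is written.
 * Hypothetical helper for illustration only. */
size_t
count_segments(size_t nelmts, size_t stride, size_t align)
{
    size_t nsegs = 0;
    size_t last_seg = 0;
    int    have_last = 0;
    size_t i;

    assert(stride > 0 && align > 0);
    for (i = 0; i < nelmts; i += stride) {
        size_t seg = i / align;     /* segment holding element i */

        /* Elements are visited in increasing order, so a new
         * segment shows up as a change in the segment index. */
        if (!have_last || seg != last_seg) {
            nsegs++;
            last_seg = seg;
            have_last = 1;
        }
    }
    return nsegs;
}
```

<p>Under these assumptions <code>count_segments(100, 2, 1)</code>
gives 50 and <code>count_segments(100, 2, 25)</code> gives 4,
matching the figures quoted in the text.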

<p>A B-tree is used to point to the discontiguous portions of
storage which has been allocated for the object. All keys of a
particular B-tree are the same size and are a function of the
number of dimensions. It is therefore not possible to change the
dimensionality of an indexed storage array after its B-tree is
created.

<p>
<center>
<table border cellpadding=4 width="60%">
<caption align=bottom>
<b>The Format of a B-Tree Key</b>
</caption>
<tr align=center>
<th width="25%">byte</th>
<th width="25%">byte</th>
<th width="25%">byte</th>
<th width="25%">byte</th>
</tr>

<tr align=center>
<td colspan=4>External File Number or Zero (4 bytes)</td>
</tr>
<tr align=center>
<td colspan=4>Chunk Offset in Dimension 0 (4 bytes)</td>
</tr>
<tr align=center>
<td colspan=4>Chunk Offset in Dimension 1 (4 bytes)</td>
</tr>
<tr align=center>
<td colspan=4>...</td>
</tr>
<tr align=center>
<td colspan=4>Chunk Offset in Dimension N (4 bytes)</td>
</tr>
</table>
</center>

<p>The keys within a B-tree are ordered by chunk offset.
Dimension 0 is compared first; if the offsets in dimension 0 are
equal, then dimension 1 is used, and so on. The External File
Number field contains a 1-origin offset into the External File
List message which contains the name of the external file in
which that chunk is stored.
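
<p>A minimal sketch of the key layout and ordering follows. It
assumes an in-memory key with a fixed upper bound on dimensions
(<code>MAX_NDIMS</code>, an invented limit) and compares chunk
offsets only, as the text describes; the external file number does
not take part in the ordering here. The type and function names are
hypothetical and are not the library's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NDIMS 32    /* assumed limit for this sketch */

/* In-memory form of a B-tree key (hypothetical type). */
typedef struct chunk_key_t {
    uint32_t file_no;               /* external file number, or zero */
    uint32_t offset[MAX_NDIMS];     /* chunk offset per dimension    */
} chunk_key_t;

/* On-disk key size: one 4-byte file number plus one 4-byte chunk
 * offset per dimension, following the key format table. */
size_t
chunk_key_disk_size(unsigned ndims)
{
    return 4 * ((size_t)ndims + 1);
}

/* Compare two keys by chunk offset, dimension 0 first.
 * Returns <0, 0, or >0 in the manner of strcmp(). */
int
chunk_key_cmp(unsigned ndims, const chunk_key_t *a, const chunk_key_t *b)
{
    unsigned i;

    assert(ndims <= MAX_NDIMS);
    for (i = 0; i < ndims; i++) {
        if (a->offset[i] < b->offset[i])
            return -1;
        if (a->offset[i] > b->offset[i])
            return 1;
    }
    return 0;
}
```

<p>Because the key size depends only on the number of dimensions,
every key in one B-tree has the same size, which is why the
dimensionality cannot change once the B-tree exists.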

<h1>Implementation of Striping</h1>

<p>The indexed storage will support arbitrary striping at the
chunk level; each chunk can be stored in any file. This is
accomplished by using the External File Number field of an
indexed storage B-tree key as a 1-origin offset into an External
File List Message (0x0009) which takes the form:

<p>
<center>
<table border cellpadding=4 width="60%">
<caption align=bottom>
<b>The Format of the External File List Message</b>
</caption>
<tr align=center>
<th width="25%">byte</th>
<th width="25%">byte</th>
<th width="25%">byte</th>
<th width="25%">byte</th>
</tr>

<tr align=center>
<td colspan=4><br>Name Heap Address<br><br></td>
</tr>
<tr align=center>
<td colspan=4>Number of Slots Allocated (4 bytes)</td>
</tr>
<tr align=center>
<td colspan=4>Number of File Names (4 bytes)</td>
</tr>
<tr align=center>
<td colspan=4>Byte Offset of Name 1 in Heap (4 bytes)</td>
</tr>
<tr align=center>
<td colspan=4>Byte Offset of Name 2 in Heap (4 bytes)</td>
</tr>
<tr align=center>
<td colspan=4>...</td>
</tr>
<tr align=center>
<td colspan=4><br>Unused Slot(s)<br><br></td>
</tr>
</table>
</center>

<p>Each indexed storage array that has all or part of its data
stored in external files will contain a single external file
list message. The size of the message is determined when the
message is created, but it may be possible to enlarge the
message on demand by moving it. At this time, it's not possible
for multiple arrays to share a single external file list
message.

<dl>
<dt><code>
H5O_efl_t *H5O_efl_new (H5G_entry_t *object, intn
nslots_hint, intn heap_size_hint)
</code>
<dd>Adds a new, empty external file list message to an object
header and returns a pointer to that message. The message
acts as a cache for file descriptors of external files that
are open.

<p><dt><code>
intn H5O_efl_index (H5O_efl_t *efl, const char *filename)
</code>
<dd>Gets the external file index number for a particular file name.
If the name isn't in the external file list then it's added to
the H5O_efl_t struct and immediately written to the object
header to which the external file list message belongs. Name
comparison is textual. Each name should be relative to the
directory which contains the HDF5 file.

<p><dt><code>
H5F_low_t *H5O_efl_open (H5O_efl_t *efl, intn index, uintn mode)
</code>
<dd>Gets a low-level file descriptor for an external file. The
external file list caches file descriptors because we might
have many more external files than there are file descriptors
available to this process. The caller should not close this file.

<p><dt><code>
herr_t H5O_efl_release (H5O_efl_t *efl)
</code>
<dd>Releases an external file list, closes all files
associated with that list, and if the list has been modified
since the call to <code>H5O_efl_new</code> flushes the message
to disk.
</dl>
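
<p>The lookup-or-add behavior described for
<code>H5O_efl_index</code> can be sketched as follows. This is not
the library's implementation: <code>efl_sketch_t</code>,
<code>efl_sketch_index</code>, and the fixed capacity
<code>EFL_MAX_SLOTS</code> are invented for illustration, the names
are kept in memory rather than in a name heap, and nothing is
written to an object header. The sketch only shows textual name
comparison and 1-origin index numbers.

```c
#include <assert.h>
#include <string.h>

#define EFL_MAX_SLOTS 8     /* assumed fixed capacity for the sketch */

/* Simplified in-memory external file list (hypothetical type). */
typedef struct efl_sketch_t {
    int         nused;                  /* number of names stored   */
    const char *name[EFL_MAX_SLOTS];    /* names relative to the    */
                                        /* HDF5 file's directory    */
} efl_sketch_t;

/* Return the 1-origin index of `filename`, adding it to the list
 * if it is not already present. Returns -1 if all slots are used.
 * The caller must keep the string alive while the list is in use. */
int
efl_sketch_index(efl_sketch_t *efl, const char *filename)
{
    int i;

    assert(efl != NULL && filename != NULL);
    for (i = 0; i < efl->nused; i++) {
        if (strcmp(efl->name[i], filename) == 0)
            return i + 1;               /* textual match, 1-origin */
    }
    if (efl->nused >= EFL_MAX_SLOTS)
        return -1;                      /* no unused slot left */
    efl->name[efl->nused++] = filename;
    return efl->nused;
}
```

<p>Looking up <code>"a.raw"</code>, then <code>"b.raw"</code>, then
<code>"a.raw"</code> again in an empty list yields 1, 2, and 1. An
index that starts at 1 leaves zero free as a distinct value in the
External File Number field of a B-tree key.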

<hr>
<address><a href="mailto:robb@arborea.spizella.com">Robb Matzke</a></address>
<!-- Created: Fri Oct 3 09:52:32 EST 1997 -->
<!-- hhmts start -->
Last modified: Tue Nov 25 12:36:50 EST 1997
<!-- hhmts end -->
</body>
</html>