[svn-r2482] Bringing "HDF5 Technical Notes" into development branch (from R1.2 branch)
114
doc/html/TechNotes/IOPipe.html
Normal file
@@ -0,0 +1,114 @@
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>The Raw Data I/O Pipeline</title>
</head>

<body>
<h1>The Raw Data I/O Pipeline</h1>

<p>The HDF5 raw data pipeline is a complicated beast that handles
all aspects of raw data storage and transfer of that data
between the file and the application. Data can be stored
contiguously (internal or external), in variable size external
segments, or regularly chunked; it can be sparse, extendible,
and/or compressible. Data transfers must be able to convert from
one data space to another, convert from one number type to
another, and perform partial I/O operations. Furthermore,
applications will expect their common usage of the pipeline to
perform well.

<p>To accomplish these goals, the pipeline has been designed in a
modular way so no single subroutine is overly complicated and so
functionality can be inserted easily at the appropriate
locations in the pipeline. A general pipeline was developed and
then certain paths through the pipeline were optimized for
performance.

<p>We describe only the file-to-memory side of the pipeline since
the memory-to-file side is a mirror image. We also assume that a
proper hyperslab of a simple data space is being read from the
file into a proper hyperslab of a simple data space in memory,
and that the data type is a compound type which may require
various number conversions on its members.

<img alt="Figure 1" src="pipe1.gif">

<p>The diagrams should be read from the top down. Line A
in the figure above shows that <code>H5Dread()</code> copies
data from a hyperslab of a file dataset to a hyperslab of an
application buffer by calling <code>H5D_read()</code>, which
in turn calls, in a loop,
<code>H5S_simp_fgath()</code>, <code>H5T_conv_struct()</code>,
and <code>H5S_simp_mscat()</code>. A temporary buffer, TCONV, is
loaded with data points from the file, then data type conversion
is performed on the temporary buffer, and finally data points
are scattered out to application memory. Thus, data type
conversion is an in-place operation and data space conversion
consists of two steps. An additional temporary buffer, BKG, is
large enough to hold <em>N</em> instances of the destination
data type, where <em>N</em> is the number of data points
that can be held by the TCONV buffer (which is large enough to
hold either source or destination data points).
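
<p>The gather, convert, scatter loop described above can be
sketched in a few lines of C. This is only an illustration under
simplifying assumptions, not the library's code: the hyperslabs
are one-dimensional, the source type is a 16-bit integer, the
destination type is a 32-bit integer, and the names
<code>slab_t</code>, <code>gather()</code>,
<code>convert()</code>, <code>scatter()</code>, and
<code>read_slab()</code> are hypothetical stand-ins for the
internal functions named in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 1-D hyperslab: `count` elements, `stride` apart, from `start`. */
typedef struct { size_t start, stride, count; } slab_t;

/* Gather: pack n source elements from the file hyperslab into TCONV. */
static void gather(const int16_t *file, slab_t fs, size_t first, size_t n,
                   unsigned char *tconv)
{
    for (size_t i = 0; i < n; i++)
        memcpy(tconv + i * sizeof(int16_t),
               &file[fs.start + (first + i) * fs.stride], sizeof(int16_t));
}

/* Convert in place, int16 -> int32. The destination type is wider, so walk
 * backwards to avoid overwriting source elements that are still unread. */
static void convert(unsigned char *tconv, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        int16_t s;
        int32_t d;
        memcpy(&s, tconv + i * sizeof s, sizeof s);
        d = s;
        memcpy(tconv + i * sizeof d, &d, sizeof d);
    }
}

/* Scatter: unpack n converted elements from TCONV into the memory hyperslab. */
static void scatter(const unsigned char *tconv, size_t n, int32_t *mem,
                    slab_t ms, size_t first)
{
    for (size_t i = 0; i < n; i++)
        memcpy(&mem[ms.start + (first + i) * ms.stride],
               tconv + i * sizeof(int32_t), sizeof(int32_t));
}

/* Drive the pipeline; returns the number of loop iterations. */
static size_t read_slab(const int16_t *file, slab_t fs, int32_t *mem,
                        slab_t ms, unsigned char *tconv, size_t tconv_size)
{
    /* TCONV must hold N elements of the larger of the two types. */
    size_t n_max = tconv_size / sizeof(int32_t);
    size_t iterations = 0;

    assert(n_max > 0 && fs.count == ms.count);
    for (size_t done = 0; done < fs.count; iterations++) {
        size_t n = fs.count - done < n_max ? fs.count - done : n_max;
        gather(file, fs, done, n, tconv);
        convert(tconv, n);
        scatter(tconv, n, mem, ms, done);
        done += n;
    }
    return iterations;
}
```

<p>With an 8-byte TCONV buffer, <em>N</em> is 2 in this sketch, so
a five-element hyperslab is moved in three passes of 2, 2, and 1
data points.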

<p>The application sets an upper limit for the size of the TCONV
buffer and optionally supplies a buffer. If no buffer is
supplied, one is allocated with <code>malloc()</code> when the
pipeline is executed (and only if it is needed) and freed when
the pipeline exits. The size of the BKG buffer depends on the
size of the TCONV buffer; if the application supplies a BKG
buffer, it should be at least as large as the TCONV buffer. The
default size for these buffers is one megabyte, but a buffer
might not be used to full capacity if its size is not an integer
multiple of the data point size (the larger of the source and
destination sizes for TCONV, but only the destination size for
the BKG buffer).
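
<p>The capacity rule above is plain integer arithmetic, which the
following sketch spells out. The function names are hypothetical,
and we assume here that "one megabyte" means 1024 * 1024 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Number of data points that fit in a TCONV buffer of buf_size bytes:
 * each slot must be able to hold either a source or a destination element. */
static size_t tconv_capacity(size_t buf_size, size_t src_size, size_t dst_size)
{
    size_t slot = src_size > dst_size ? src_size : dst_size;
    assert(slot > 0);
    return buf_size / slot;
}

/* Number of data points that fit in a BKG buffer: destination elements only. */
static size_t bkg_capacity(size_t buf_size, size_t dst_size)
{
    assert(dst_size > 0);
    return buf_size / dst_size;
}

/* Bytes left unused at the end of a buffer holding n elements of elmt_size. */
static size_t unused_bytes(size_t buf_size, size_t n, size_t elmt_size)
{
    return buf_size - n * elmt_size;
}
```

<p>For example, with a 12-byte source type and a 24-byte
destination type, a 1048576-byte TCONV buffer holds 43690 data
points and leaves 16 bytes unused.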

<p>Occasionally the destination data points will be partially
initialized and the <code>H5Dread()</code> operation should not
clobber those values. For instance, the destination type might
be a struct with members <code>a</code> and <code>b</code> where
<code>a</code> is already initialized and we're reading
<code>b</code> from the file. An extra line, G, is added to the
pipeline to provide the type conversion functions with the
existing data.

<img alt="Figure 2" src="pipe2.gif">
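
<p>A minimal sketch of the struct example, assuming contiguous
data on both sides and a file that stores only member
<code>b</code> as a <code>float</code>. The type
<code>dst_t</code> and the functions below are hypothetical. For
brevity the conversion writes into a separate buffer rather than
converting in place as the real pipeline does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical destination type: `a` is already set, `b` comes from the file. */
typedef struct { int32_t a; double b; } dst_t;

enum { BATCH = 4 };   /* N: data points per pass, chosen arbitrarily here */

/* Conversion step: build each destination element from the background copy
 * (the existing values) and overwrite only the member stored in the file. */
static void convert_with_bkg(const float *src, const dst_t *bkg,
                             dst_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = bkg[i];            /* keeps member a */
        out[i].b = (double)src[i];  /* member b comes from the file */
    }
}

/* Read member b for n contiguous elements without clobbering member a. */
static void read_member_b(const float *file_b, dst_t *mem, size_t n)
{
    dst_t bkg[BATCH], tconv[BATCH];

    for (size_t done = 0; done < n; ) {
        size_t k = n - done < BATCH ? n - done : BATCH;
        memcpy(bkg, mem + done, k * sizeof *bkg);       /* line G: existing data */
        convert_with_bkg(file_b + done, bkg, tconv, k);
        memcpy(mem + done, tconv, k * sizeof *tconv);   /* scatter to memory */
        done += k;
    }
}
```

<p>Without the copy into <code>bkg</code>, the conversion step
would have no value to put in member <code>a</code> and the
scatter would overwrite whatever the application had stored there.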

<p>Quite often no data type conversion will be necessary. In such
cases a temporary buffer for data type conversion is not needed
and data space conversion can happen in a single step. In fact,
when the source and destination data are both contiguous (they
aren't in the picture) the loop degenerates to a single
iteration.

<img alt="Figure 3" src="pipe3.gif">
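
<p>The no-conversion path can be sketched as a direct copy between
two hyperslabs. Again this is an illustration with hypothetical
names (<code>slab_t</code>, <code>copy_slab()</code>) and
one-dimensional selections, with an in-memory byte array standing
in for the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 1-D hyperslab: `count` elements, `stride` apart, from `start`. */
typedef struct { size_t start, stride, count; } slab_t;

/* Copy a file hyperslab straight into a memory hyperslab (no TCONV buffer).
 * Returns the number of memcpy() calls issued. */
static size_t copy_slab(const unsigned char *file, slab_t fs,
                        unsigned char *mem, slab_t ms, size_t elmt_size)
{
    assert(fs.count == ms.count);
    if (fs.count == 0)
        return 0;

    /* Both sides contiguous: the loop degenerates to a single transfer. */
    if (fs.stride == 1 && ms.stride == 1) {
        memcpy(mem + ms.start * elmt_size, file + fs.start * elmt_size,
               fs.count * elmt_size);
        return 1;
    }

    /* Otherwise move one data point at a time. */
    for (size_t i = 0; i < fs.count; i++)
        memcpy(mem + (ms.start + i * ms.stride) * elmt_size,
               file + (fs.start + i * fs.stride) * elmt_size, elmt_size);
    return fs.count;
}
```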

<p>So far we've looked only at internal contiguous storage, but by
replacing Line B in Figures 1 and 2 and Line A in Figure 3 with
Figure 4 the pipeline is able to handle regularly chunked
objects. Line B of Figure 4 is executed once for each chunk
that contains data to be read, and the chunk address is found by
looking up a multi-dimensional key in a chunk B-tree that has
one entry per chunk.

<img alt="Figure 4" src="pipe4.gif">
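
<p>The per-chunk lookup can be illustrated as follows. This sketch
makes several assumptions that are not in the text: the dataset
is two-dimensional, the key is the chunk's starting coordinate,
and a sorted array searched with <code>bsearch()</code> stands in
for the chunk B-tree. All names and addresses are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { RANK = 2 };

/* One entry per chunk: the key is the chunk's starting coordinate. */
typedef struct {
    size_t key[RANK];
    long   addr;       /* file address of the chunk (made-up values) */
} chunk_entry_t;

/* Lexicographic comparison of multi-dimensional keys. */
static int key_cmp(const void *pa, const void *pb)
{
    const chunk_entry_t *a = pa, *b = pb;
    for (int d = 0; d < RANK; d++) {
        if (a->key[d] < b->key[d]) return -1;
        if (a->key[d] > b->key[d]) return 1;
    }
    return 0;
}

/* Look up a chunk address; -1 means the chunk was never written (sparse data). */
static long chunk_addr(const chunk_entry_t *index, size_t n,
                       const size_t key[RANK])
{
    chunk_entry_t probe = { { key[0], key[1] }, 0 };
    const chunk_entry_t *hit = bsearch(&probe, index, n, sizeof *index, key_cmp);
    return hit ? hit->addr : -1;
}

/* Visit every chunk overlapping the selection [start, start + count) and
 * collect the addresses of those that exist. Returns how many were found. */
static size_t chunks_for_selection(const chunk_entry_t *index, size_t n,
                                   const size_t chunk_dim[RANK],
                                   const size_t start[RANK],
                                   const size_t count[RANK],
                                   long *addrs, size_t max_addrs)
{
    size_t found = 0;
    size_t key[RANK];
    size_t end0 = start[0] + count[0], end1 = start[1] + count[1];

    if (count[0] == 0 || count[1] == 0)
        return 0;

    /* Round the selection start down to a chunk boundary in each dimension. */
    for (key[0] = start[0] / chunk_dim[0] * chunk_dim[0]; key[0] < end0;
         key[0] += chunk_dim[0])
        for (key[1] = start[1] / chunk_dim[1] * chunk_dim[1]; key[1] < end1;
             key[1] += chunk_dim[1]) {
            long addr = chunk_addr(index, n, key);
            if (addr >= 0 && found < max_addrs)
                addrs[found++] = addr;
        }
    return found;
}
```

<p>Each address returned corresponds to one execution of Line B of
Figure 4. A chunk that has no entry in the index is simply
skipped, which is one way the sparse case could be handled.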

<p>If a single chunk is requested and the destination buffer is
the same size/shape as the chunk, then the CHUNK buffer is
bypassed and the destination buffer is used instead, as shown in
Figure 5.

<img alt="Figure 5" src="pipe5.gif">

<hr>
<address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Tue Mar 17 11:13:35 EST 1998 -->
<!-- hhmts start -->
Last modified: Wed Mar 18 10:38:30 EST 1998
<!-- hhmts end -->
</body>
</html>