[svn-r2482] Bringing "HDF5 Technical Notes" into development branch (from R1.2 branch)

2000-08-25 12:42:24 -05:00
parent 3ff571ab58
commit 2e8cd59163
35 changed files with 4108 additions and 0 deletions
--- a/doc/html/TechNotes/IOPipe.html
+++ b/doc/html/TechNotes/IOPipe.html
@@ -0,0 +1,114 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+  <head>
+    <title>The Raw Data I/O Pipeline</title>
+  </head>
+
+  <body>
+    <h1>The Raw Data I/O Pipeline</h1>
+
+    <p>The HDF5 raw data pipeline is a complicated beast that handles
+      all aspects of raw data storage and transfer of that data
+      between the file and the application.  Data can be stored
+      contiguously (internal or external), in variable size external
+      segments, or regularly chunked; it can be sparse, extendible,
+      and/or compressible. Data transfers must be able to convert from
+      one data space to another, convert from one number type to
+      another, and perform partial I/O operations. Furthermore,
+      applications will expect their common usage of the pipeline to
+      perform well.
+
+    <p>To accomplish these goals, the pipeline has been designed in a
+      modular way so no single subroutine is overly complicated and so
+      functionality can be inserted easily at the appropriate
+      locations in the pipeline.  A general pipeline was developed and
+      then certain paths through the pipeline were optimized for
+      performance.
+
+    <p>We describe only the file-to-memory side of the pipeline since
+      the memory-to-file side is a mirror image. We also assume that a
+      proper hyperslab of a simple data space is being read from the
+      file into a proper hyperslab of a simple data space in memory,
+      and that the data type is a compound type which may require
+      various number conversions on its members.
+
+      <img alt="Figure 1" src="pipe1.gif">
+
+    <p>The diagrams should be read from the top down. Line A
+      in the figure above shows that <code>H5Dread()</code> copies
+      data from a hyperslab of a file dataset to a hyperslab of an
+      application buffer by calling <code>H5D_read()</code>, which
+      in turn calls, in a loop,
+      <code>H5S_simp_fgath()</code>, <code>H5T_conv_struct()</code>,
+      and <code>H5S_simp_mscat()</code>. A temporary buffer, TCONV, is
+      loaded with data points from the file, then data type conversion
+      is performed on the temporary buffer, and finally data points
+      are scattered out to application memory. Thus, data type
+      conversion is an in-place operation and data space conversion
+      consists of two steps. An additional temporary buffer, BKG, is
+      large enough to hold <em>N</em> instances of the destination
+      data type, where <em>N</em> is the number of data points that
+      fit in the TCONV buffer (which must be large enough to hold
+      either source or destination data points).
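+
+    <p>The gather, convert, scatter loop described above can be
+      sketched in plain C without any HDF5 calls. This is only an
+      illustration under simplifying assumptions: the selections are
+      one-dimensional strided slabs, the source type is
+      <code>float</code> and the destination type is
+      <code>double</code>. The names <code>slab1d</code>,
+      <code>gather_f32()</code>, <code>convert_f32_to_f64()</code>,
+      <code>scatter_f64()</code>, and <code>read_f32_as_f64()</code>
+      are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 1-D strided selection; a stand-in for a hyperslab. */
typedef struct {
    size_t start;   /* index of the first selected element   */
    size_t stride;  /* distance between selected elements    */
    size_t count;   /* number of selected elements           */
} slab1d;

/* Gather n selected source elements, packed, into the TCONV buffer. */
static void gather_f32(const float *file, const slab1d *fs,
                       size_t first, size_t n, unsigned char *tconv)
{
    for (size_t i = 0; i < n; i++)
        memcpy(tconv + i * sizeof(float),
               &file[fs->start + (first + i) * fs->stride], sizeof(float));
}

/* Convert in place. The destination type is wider than the source, so
 * walk backward to avoid overwriting source values not yet read. */
static void convert_f32_to_f64(unsigned char *tconv, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        float  s;
        double d;
        memcpy(&s, tconv + i * sizeof(float), sizeof s);
        d = (double)s;
        memcpy(tconv + i * sizeof(double), &d, sizeof d);
    }
}

/* Scatter n converted elements from TCONV out to application memory. */
static void scatter_f64(const unsigned char *tconv, size_t first, size_t n,
                        double *mem, const slab1d *ms)
{
    for (size_t i = 0; i < n; i++)
        memcpy(&mem[ms->start + (first + i) * ms->stride],
               tconv + i * sizeof(double), sizeof(double));
}

/* Run the loop; returns the number of passes through the pipeline. */
static size_t read_f32_as_f64(const float *file, const slab1d *fs,
                              double *mem, const slab1d *ms,
                              unsigned char *tconv, size_t tconv_size)
{
    assert(fs->count == ms->count);
    /* TCONV must hold the larger of the two types, here double. */
    size_t per_pass = tconv_size / sizeof(double);
    assert(per_pass > 0);

    size_t passes = 0;
    for (size_t first = 0; first < fs->count; first += per_pass) {
        size_t n = fs->count - first;
        if (n > per_pass)
            n = per_pass;
        gather_f32(file, fs, first, n, tconv);
        convert_f32_to_f64(tconv, n);
        scatter_f64(tconv, first, n, mem, ms);
        passes++;
    }
    return passes;
}
```

+    <p>With a 16-byte TCONV buffer and five selected points, this
+      sketch makes three passes, moving two, two, and one data points
+      respectively.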
+
+    <p>The application sets an upper limit for the size of the TCONV
+      buffer and optionally supplies a buffer. If no buffer is
+      supplied, one is allocated with <code>malloc()</code> when the
+      pipeline executes (if it is needed) and freed when the pipeline
+      exits.  The size of the BKG buffer depends on the size of the
+      TCONV buffer; if the application supplies a BKG buffer, it
+      should be at least as large as the TCONV buffer.  The default
+      size for these buffers is one megabyte, but a buffer might not
+      be used to full capacity if its size is not an integer multiple
+      of the data point size (the larger of the source and
+      destination sizes for TCONV, the destination size for BKG).
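+
+    <p>The capacity rule above is simple integer arithmetic. The
+      following sketch shows one way to compute it; the helper names
+      <code>tconv_capacity()</code>, <code>tconv_unused()</code>, and
+      <code>bkg_bytes_used()</code> are hypothetical, and "one
+      megabyte" is assumed here to mean 1048576 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Number of data points that fit in a TCONV buffer of buf_size bytes.
 * Each slot must hold either a source or a destination data point,
 * so the larger of the two sizes determines the capacity. */
static size_t tconv_capacity(size_t buf_size, size_t src_size, size_t dst_size)
{
    assert(src_size > 0 && dst_size > 0);
    size_t elmt = src_size > dst_size ? src_size : dst_size;
    return buf_size / elmt;
}

/* Bytes of the TCONV buffer left unused because buf_size is not an
 * integer multiple of the larger data point size. */
static size_t tconv_unused(size_t buf_size, size_t src_size, size_t dst_size)
{
    size_t elmt = src_size > dst_size ? src_size : dst_size;
    return buf_size - tconv_capacity(buf_size, src_size, dst_size) * elmt;
}

/* Bytes of the BKG buffer actually used: N destination-sized points. */
static size_t bkg_bytes_used(size_t npoints, size_t dst_size)
{
    return npoints * dst_size;
}
```

+    <p>For example, with a 1048576-byte buffer, a 12-byte source
+      type, and a 24-byte destination type, 43690 data points fit per
+      pass and 16 bytes of the TCONV buffer go unused.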
+
+
+
+    <p>Occasionally the destination data points will be partially
+      initialized and the <code>H5Dread()</code> operation should not
+      clobber those values.  For instance, the destination type might
+      be a struct with members <code>a</code> and <code>b</code> where
+      <code>a</code> is already initialized and we're reading
+      <code>b</code> from the file.  An extra line, G, is added to the
+      pipeline to provide the type conversion functions with the
+      existing data.
+
+      <img alt="Figure 2" src="pipe2.gif">
+
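+    <p>The role of the BKG buffer can be illustrated with a small
+      sketch. It assumes a destination struct with members
+      <code>a</code> and <code>b</code>, a file that stores only
+      <code>b</code> as a <code>float</code>, and contiguous data on
+      both sides so that gather and scatter reduce to
+      <code>memcpy()</code>. The names <code>dst_t</code>,
+      <code>convert_with_bkg()</code>, and
+      <code>read_member_b()</code> are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical destination type: a is already set, b comes from the file. */
typedef struct {
    int    a;
    double b;
} dst_t;

/* In-place conversion of n floats in tconv into n dst_t values. Each
 * result starts from the existing destination value held in bkg, so
 * member a survives. Walk backward because dst_t is wider than float. */
static void convert_with_bkg(unsigned char *tconv, const unsigned char *bkg,
                             size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        float src;
        dst_t d;
        memcpy(&src, tconv + i * sizeof(float), sizeof src);
        memcpy(&d, bkg + i * sizeof(dst_t), sizeof d);   /* existing data */
        d.b = (double)src;                               /* new member    */
        memcpy(tconv + i * sizeof(dst_t), &d, sizeof d);
    }
}

/* One pass of the pipeline; tconv and bkg must each hold n dst_t values. */
static void read_member_b(const float *file_b, dst_t *mem, size_t n,
                          unsigned char *tconv, unsigned char *bkg)
{
    assert(n > 0);
    memcpy(tconv, file_b, n * sizeof(float));  /* gather from the file     */
    memcpy(bkg, mem, n * sizeof(dst_t));       /* gather existing values   */
    convert_with_bkg(tconv, bkg, n);           /* convert, merging with BKG */
    memcpy(mem, tconv, n * sizeof(dst_t));     /* scatter to memory        */
}
```

+    <p>After <code>read_member_b()</code> returns, every element of
+      <code>mem</code> has its original <code>a</code> and a
+      <code>b</code> taken from the file.
+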
+    <p>Quite often no data type conversion will be necessary.  In
+      such cases a temporary buffer for data type conversion is not
+      needed and data space conversion can happen in a single step.
+      In fact, when the source and destination data are both
+      contiguous (which is not the case in the figure below), the
+      loop degenerates to a single iteration.
+
+
+      <img alt="Figure 3" src="pipe3.gif">
+
+    <p>So far we've looked only at internal contiguous storage, but by
+      replacing Line B in Figures 1 and 2 and Line A in Figure 3 with
+      Figure 4, the pipeline is able to handle regularly chunked
+      objects. Line B of Figure 4 is executed once for each chunk
+      that contains data to be read, and the chunk address is found
+      by looking up a multi-dimensional key in a chunk B-tree, which
+      has one entry per chunk.
+
+      <img alt="Figure 4" src="pipe4.gif">
+
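+    <p>As a rough illustration of the per-chunk loop, the sketch
+      below computes a multi-dimensional key for the chunk holding a
+      given element and counts how many chunks a hyperslab touches.
+      It assumes the key is the coordinate of the chunk's first
+      element; the text above does not specify the key layout, and
+      the names <code>chunk_key()</code> and
+      <code>chunks_touched()</code> are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Key of the chunk containing element coord[]: assumed here to be the
 * coordinate of that chunk's first element in each dimension. */
static void chunk_key(size_t rank, const size_t *coord,
                      const size_t *chunk_dims, size_t *key)
{
    for (size_t i = 0; i < rank; i++) {
        assert(chunk_dims[i] > 0);
        key[i] = (coord[i] / chunk_dims[i]) * chunk_dims[i];
    }
}

/* Number of chunks overlapped by the contiguous hyperslab that starts
 * at start[] and spans count[] elements in each dimension. This is
 * how many times the per-chunk line would execute. */
static size_t chunks_touched(size_t rank, const size_t *start,
                             const size_t *count, const size_t *chunk_dims)
{
    size_t total = 1;
    for (size_t i = 0; i < rank; i++) {
        assert(chunk_dims[i] > 0 && count[i] > 0);
        size_t first = start[i] / chunk_dims[i];
        size_t last  = (start[i] + count[i] - 1) / chunk_dims[i];
        total *= last - first + 1;
    }
    return total;
}
```

+    <p>With 4 x 10 chunks, element (5, 17) falls in the chunk keyed
+      (4, 10), and a 5 x 20 hyperslab starting at (2, 5) touches six
+      chunks.
+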
+    <p>If a single chunk is requested and the destination buffer is
+      the same size/shape as the chunk, then the CHUNK buffer is
+      bypassed and the destination buffer is used instead as shown in
+      Figure 5.
+
+      <img alt="Figure 5" src="pipe5.gif">
+
+    <hr>
+    <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
+<!-- Created: Tue Mar 17 11:13:35 EST 1998 -->
+<!-- hhmts start -->
+Last modified: Wed Mar 18 10:38:30 EST 1998
+<!-- hhmts end -->
+  </body>
+</html>