Welcome to VFD SWMR
Thank you for volunteering to test VFD SWMR.
SWMR, which stands for Single Writer/Multiple Reader, is a feature of the HDF5 library that lets a process write data to an HDF5 file while one or more processes read the file. Use cases range from monitoring data collection and/or steering experiments in progress to financial applications.
The following diagram illustrates how SWMR works.
VFD SWMR is designed to be a more flexible, more modular, better-performing replacement for the existing SWMR feature.
- VFD SWMR allows HDF5 objects (groups, datasets, attributes) to be created and destroyed in the course of a reader-writer session. Creating objects is not possible using the existing SWMR feature.
- It compartmentalizes much of the SWMR functionality in a virtual-file driver (VFD), thus easing The HDF Group's software-maintenance burden.
- And it makes guarantees for the maximum time from write to availability of data for read, provided that the reading and writing systems and their interconnections can keep up with the data flow.
For details on how VFD SWMR is implemented, see [TBD: LINK to RFC].
Quick start
Follow these instructions to download, configure, and build the VFD SWMR project in a jiffy. Then install the HDF5 library and utilities built by the VFD SWMR project.
Download
The latest source code for VFD SWMR is on the multi
branch of the VFD SWMR
repository.
Clone the repository in a new directory, then switch to the VFD SWMR branch:
% git clone https://bitbucket.hdfgroup.org/scm/~dyoung/vchoi_fork.git swmr
% cd swmr
% git checkout multi
Build
Set up for autotools:
% sh ./autogen.sh
Create a build directory, change to that directory, and run the configure script:
% mkdir -p ../build/swmr
% cd ../build/swmr
% ../../swmr/configure
Build the project:
% make
Test
We recommend that you run the full HDF5 test suite to make sure that VFD SWMR works correctly on your system. To test the library and utilities, run
% make check
If the tests don't pass, please let the developers know!
Sample programs
Extensible datasets
For an example of a program that uses VFD SWMR to write/read many
extensible datasets, have a look at test/vfd_swmr_bigset_writer.c, the
"bigset" test. We compile two binaries from that source file, one that
operates in write mode, and a second that operates in read mode.
In write mode, "bigset" creates an HDF5 file containing one or more datasets that are extensible in either one dimension or two. Then it runs for several steps, increasing the size of each dataset in each dimension once every step. The dimensions (-d), the number of datasets (-s), the step increase in dataset size (-r and -c), and the number of steps (-n) are configurable using command-line options; use the -h option to get a usage message. Each dataset is written with a predictable pattern.
In read mode, "bigset" reads each dataset from an HDF5 file created by a "bigset" writer and verifies the patterns. It takes the same command-line parameters as the "bigset" writer. The reader and writer may run concurrently; the reader "polls" the content until it is just shy of complete, given the number of steps expected.
To run a bigset test, I open a couple of terminal windows, one for the
reader and one for the writer. I change to the test directory under
my build directory, and I run the writer in one window:
% ./vfd_swmr_bigset_writer -n 50 -d 2
and in the other window, I run the reader:
% ./vfd_swmr_bigset_reader -n 50 -d 2 -W
The writer will wait for a signal before it quits. You may tap Control-C to make it quit.
The reader and writer programs support several command-line options:
- -h: show program usage.
- -W: stop the program from waiting for a signal before it quits.
- -q: suppress the progress messages that the programs write to the standard error stream.
- -V: create a virtual dataset with content in three source datasets in the same HDF5 file. Only available when the writer creates a dataset extensible in one dimension (-d 1).
- -M: like -V, the writer creates the virtual dataset on three source datasets, but each source dataset is in a different HDF5 file.
The VFD SWMR demos
The VFD SWMR demos are in a separate repository.
Before you build the demos, you will need to install the HDF5 library
and utilities built from the VFD SWMR branch in your home directory
somewhere. In the ./configure step, use the command-line option
--prefix=$HOME/path/for/library to set the directory you prefer.
In the demo Makefiles, update the H5CC variable with the path to
the h5cc installed from the VFD SWMR branch. Then you should be
able to make and make clean the demos.
Under gaussians/, two programs are built, wgaussians and
rgaussians. If you start both from the same directory in different
terminals, you should see the "bouncing 2-D Gaussian distributions"
in the rgaussians terminal.
The creation-deletion (credel) demo is also run in two terminals.
The two command lines are given in credel/README.md. You need
to use the h5ls installed from the VFD SWMR branch, since only
that version has the --poll option.
Developer tips
Configuring VFD SWMR
File-creation properties
To use VFD SWMR, creating your HDF5 file with paged allocation strategy is mandatory. This call enables the paged allocation strategy:
ret = H5Pset_file_space_strategy(fcpl, H5F_FSPACE_STRATEGY_PAGE, false, 1);
Allocated storage that is smaller than the page size will not overlap a page boundary, and allocated storage that is one page or greater in size will start on a page boundary. VFD SWMR relies on that allocation strategy.
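The allocation rule above can be restated as a small predicate. This is only an illustrative sketch: alloc_obeys_paging() is a hypothetical helper, not part of the HDF5 API, and it assumes that an allocation is described by a byte offset and a byte count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not an HDF5 call): returns true if an allocation
 * of `size` bytes at file offset `addr` obeys the paged allocation rule
 * for pages of `page_size` bytes.
 */
bool
alloc_obeys_paging(uint64_t addr, uint64_t size, uint64_t page_size)
{
    assert(page_size > 0);
    assert(size > 0);

    if (size >= page_size) {
        /* One page or larger: must start on a page boundary. */
        return addr % page_size == 0;
    }

    /* Smaller than a page: the first and last bytes share one page. */
    return addr / page_size == (addr + size - 1) / page_size;
}
```

With 4096-byte pages, 96 bytes at offset 4000 fit inside the first page, while 97 bytes at the same offset would spill into the second page and break the rule.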
File-access properties
In this section we dissect vfd_swmr_create_fapl(), a helper routine in
the VFD SWMR tests, to show how to configure your application to use VFD
SWMR.
hid_t
vfd_swmr_create_fapl(bool writer, bool only_meta_pages, bool use_vfd_swmr)
{
    H5F_vfd_swmr_config_t config;
    hid_t fapl;
h5_fileaccess() is also a helper routine for the tests. In your
program, you can replace the h5_fileaccess() call with a call to
H5Pcreate(H5P_FILE_ACCESS).
    /* Create file access property list */
    if((fapl = h5_fileaccess()) < 0) {
        warnx("h5_fileaccess");
        return badhid;
    }
VFD SWMR has only been tested with the latest file format. It may malfunction with older formats; we just don't know. We force the latest version here.
    /* FOR NOW: set to use latest format, the "old" parameter is not used */
    if(H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST) < 0) {
        warnx("H5Pset_libver_bounds");
        return badhid;
    }
    /*
     * Set up to open the file with VFD SWMR configured.
     */
VFD SWMR requires metadata reads and writes to go through the page buffer. Note that the default page size is 4096 bytes. The H5Pset_page_buffer_size() call below sets the total page buffer size to 4096 bytes, so we have effectively created a one-page page buffer! That is adequate for testing, but it may not be best for your application.
If only_meta_pages is true, then the entire page buffer is
dedicated to metadata. That's fine for VFD SWMR.
Note well: when VFD SWMR is enabled, the meta-/raw-data pages proportion
set by H5Pset_page_buffer_size() does not actually control the
pages reserved for raw data. All pages are dedicated to buffering
metadata.
    /* Enable page buffering */
    if(H5Pset_page_buffer_size(fapl, 4096, only_meta_pages ? 100 : 0, 0) < 0) {
        warnx("H5Pset_page_buffer_size");
        return badhid;
    }
Add VFD SWMR-specific configuration to the file-access property list
(fapl) using an H5Pset_vfd_swmr_config() call.
When VFD SWMR is enabled, changes to the HDF5 metadata accumulate in RAM until a configurable unit of time known as a tick has passed. At the end of each tick, a snapshot of the metadata at the end of the tick is "published"---that is, made visible to the readers.
The length of a tick is configurable in units of 100 milliseconds
using the tick_len parameter. Below, tick_len is set to 4 to
select a tick length of 400ms.
A snapshot does not persist forever, but it expires after a number
of ticks, given by the maximum lag, has passed. Below, max_lag
is set to 7 to select a maximum lag of 7 ticks. After a snapshot
has expired, the writer may overwrite it.
When a reader first enters the API, it starts to use, or "selects," the metadata in the newest snapshot, and on every subsequent API entry, if a tick has passed since the last selection, and if new snapshots are available, then the reader selects the latest.
If a reader spends longer than max_lag - 1 ticks (2400ms with
the example configuration) inside the HDF5 API, then its snapshot
may expire, resulting in undefined behavior. When a snapshot
expires while the reader is using it, we say that the writer has
"overrun" the reader. The writer cannot currently detect overruns.
Frequently the reader will detect an overrun and force the program
to exit with a diagnostic assertion failure.
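The timing arithmetic in the preceding paragraphs can be written out as follows. This is a minimal sketch: tick_length_ms() and reader_budget_ms() are hypothetical helper names, not library calls, and the integer types are an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Length of one tick in milliseconds; tick_len counts units of 100 ms. */
uint32_t
tick_length_ms(uint32_t tick_len)
{
    return tick_len * 100u;
}

/* Longest time, in milliseconds, that a reader can spend inside the
 * HDF5 API before its snapshot may expire: (max_lag - 1) ticks.
 */
uint32_t
reader_budget_ms(uint32_t tick_len, uint32_t max_lag)
{
    assert(max_lag >= 1);
    return (max_lag - 1u) * tick_length_ms(tick_len);
}
```

With the example configuration, tick_length_ms(4) gives 400 and reader_budget_ms(4, 7) gives 2400, matching the figures quoted above.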
The application tells VFD SWMR whether to configure for
reading or writing a file by setting the writer parameter to
true for writing or false for reading.
VFD SWMR snapshots are stored in a "shadow file" that is shared
between writer and readers. On a POSIX system, the shadow file
may be placed on any local filesystem that the reader and writer
share. The md_file_path parameter tells where to put the shadow
file.
The md_pages_reserved parameter tells how many pages to reserve
at the beginning of the shadow file for the shadow-file header
and the shadow index. The header has an entire page to itself.
The remaining md_pages_reserved - 1 pages are reserved for the
shadow index. If the index grows larger than its initial
allocation, then it will move to a new location in the shadow file,
and the initial allocation will be reclaimed. md_pages_reserved
must be at least 2.
The version parameter tells what version of VFD SWMR configuration
the parameter struct config contains. For now, it should be
initialized to H5F__CURR_VFD_SWMR_CONFIG_VERSION.
    memset(&config, 0, sizeof(config));
    config.version = H5F__CURR_VFD_SWMR_CONFIG_VERSION;
    config.tick_len = 4;
    config.max_lag = 7;
    config.writer = writer;
    config.md_pages_reserved = 128;
    HDstrcpy(config.md_file_path, "./my_md_file");
    /* Enable VFD SWMR configuration */
    if(use_vfd_swmr && H5Pset_vfd_swmr_config(fapl, &config) < 0) {
        warnx("H5Pset_vfd_swmr_config");
        return badhid;
    }
    return fapl;
}
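To get a feel for what md_pages_reserved buys, the split between the shadow-file header and the shadow index can be computed directly. The sketch below uses hypothetical helper names, and it assumes that the page size is passed in by the caller; whether the shadow file uses the same page size as the HDF5 file is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Pages left for the shadow index after the header takes its own page. */
uint32_t
shadow_index_pages(uint32_t md_pages_reserved)
{
    assert(md_pages_reserved >= 2); /* header page + at least one index page */
    return md_pages_reserved - 1u;
}

/* Bytes initially available to the shadow index for a given page size. */
uint64_t
shadow_index_bytes(uint32_t md_pages_reserved, uint64_t page_size)
{
    return (uint64_t)shadow_index_pages(md_pages_reserved) * page_size;
}
```

With md_pages_reserved set to 128 as above and an assumed page size of 4096 bytes, 127 pages (520192 bytes) are initially reserved for the index.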
Using virtual datasets (VDS)
An application may want to use VFD SWMR to create, read, or write a virtual dataset. Unfortunately, VDS does not work properly with VFD SWMR at this time. In this section, we describe some workarounds that can be used with great care to make VDS and VFD SWMR cooperate.
A virtual dataset, when it is read or written, will open files on
an application's behalf in order to access the source datasets
inside. If a virtual dataset resides on file v.h5, and one of
its source datasets resides on a second file, s1.h5, then the
virtual dataset will try to open s1.h5 using the same file-access
properties as v.h5. Thus, if v.h5 is open with VFD SWMR with
shadow file v.shadow, then the virtual dataset will try to open
s1.h5 with the same shadow file, which will fail.
Suppose that v.h5 is not open with VFD SWMR, but it was opened
with default file-access properties. Then the virtual dataset will
open the source dataset on s1.h5 with default file-access
properties, too. This default virtual-dataset behavior is not
helpful to the application that wants to use VFD SWMR to read or
write source datasets.
To use VFD SWMR with VDS, an application should pre-open each file using its preferred file-access properties, including independent shadow filenames for each source file. As long as the virtual dataset remains in use, the application should leave each of the pre-opened files open. In this way the library, when it tries to open the source files, will always find them already open and re-use the already-open files with the file-access properties established on first open.
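One way to keep the shadow filenames independent is to derive each from the name of the file it shadows. The helper below is a sketch of one possible naming convention (v.h5 becomes v.shadow); shadow_path_for() is hypothetical and not part of the library, and the caller is assumed to copy the result into the md_file_path of the configuration used to pre-open that file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical naming convention: replace a trailing ".h5" with
 * ".shadow", or append ".shadow" if the name has no ".h5" suffix.
 * Returns 0 on success, -1 if the result does not fit in `out`.
 */
int
shadow_path_for(const char *h5_path, char *out, size_t out_len)
{
    size_t len;
    int    n;

    assert(h5_path != NULL && out != NULL);

    len = strlen(h5_path);
    if (len >= 3 && strcmp(h5_path + len - 3, ".h5") == 0)
        len -= 3; /* drop the ".h5" suffix */

    n = snprintf(out, out_len, "%.*s.shadow", (int)len, h5_path);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Because each source file gets a distinct name, s1.h5 and v.h5 never end up contending for the same shadow file.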
Pushing HDF5 content to reader visibility
With VFD SWMR, ordinarily it should not be necessary to call
H5Fflush(). In fact, when VFD SWMR is active, calling H5Fflush()
may slow down your program considerably because the call will not
return until after max_lag ticks have passed.
A writer can make its last changes to an HDF5 file visible to all
readers immediately using the new call, H5Fvfd_swmr_end_tick().
A writer should use H5Fvfd_swmr_end_tick() carefully: by calling
it more frequently than once a tick, a writer may corrupt a reader's
view of the HDF5 file.
When VFD SWMR is enabled, raw data is not cached in the page buffer. On each tick, the content of chunk caches and other unwritten raw data is flushed directly to the HDF5 file, so that raw data is always available before the HDF5 structural metadata that describes it.
Reading up-to-date content
The HDF Group (THG) expects that in one class of VFD SWMR application, instruments on a particle accelerator will continuously generate 2-dimensional data frames and add them to HDF5 datasets while an experiment is ongoing. The datasets will be written to an HDF5 file opened in VFD SWMR mode. Experimenters will monitor a real-time display of the datasets while the experiment takes place. A second program, possibly running on a second computer, will generate the display. The second program will open the HDF5 file in VFD SWMR mode, too.
THG developed a demonstration program for this class of application, and we have some advice based on that experience.
The writer typically will increase a dataset's dimensions by a
frame, using H5Dset_extent(), before it writes the data of that
frame with H5Dwrite(). It's possible that a snapshot of the HDF5
file will propagate to the reader between the H5Dset_extent()
call and the H5Dwrite(). Values that H5Dread() returns for the last frame
at that juncture will not reflect the actual experimental data.
Instead, the reader will see arbitrary values or the fill value.
To display those values would be distracting and misleading to
the experimenter.
On the reader, a strategy for displaying the most current, bona fide application
data is to read the dimensions of the frames dataset, d, compute
the number n of full frames contained in d, and read the
next-to-last frame, n - 2. THG uses a variant of this strategy
in its gaussians demo.
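The index arithmetic for the reader strategy is simple enough to sketch. The helpers below are hypothetical and are not taken from the gaussians demo; they assume that frames are stacked along one dataset dimension, so that `extent` is the current size of that dimension and `rows_per_frame` is the number of rows one frame occupies.

```c
#include <assert.h>
#include <stdint.h>

/* Number n of complete frames in a dimension of size `extent`. */
uint64_t
full_frames(uint64_t extent, uint64_t rows_per_frame)
{
    assert(rows_per_frame > 0);
    return extent / rows_per_frame;
}

/* Zero-based index of the next-to-last full frame, n - 2, or -1 if
 * fewer than two full frames are available yet.
 */
int64_t
frame_to_display(uint64_t extent, uint64_t rows_per_frame)
{
    uint64_t n = full_frames(extent, rows_per_frame);

    return n >= 2 ? (int64_t)(n - 2) : -1;
}
```

A reader would call frame_to_display() after fetching the current dimensions and skip the display update when it returns -1.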
On the writer, a strategy for protecting against snapshots between
the H5Dset_extent() and H5Dwrite() calls is to suspend VFD
SWMR's clock across both of the calls. The
H5Fvfd_swmr_disable_end_of_tick() call takes a file identifier
and stops new snapshots from being taken on the given file until
H5Fvfd_swmr_enable_end_of_tick() is called on the same file.
Known issues
Variable-length data
A VFD SWMR reader cannot reliably read back a variable-length dataset written by VFD SWMR. For example, a variable-length string created and written as follows
hid_t dset, space, type;
/* A variable-length string is passed as a pointer to a char pointer. */
const char *data = "content";
type = H5Tcopy(H5T_C_S1);
H5Tset_size(type, H5T_VARIABLE);
space = H5Screate(H5S_SCALAR);
dset = H5Dcreate2(..., "string", type, space, H5P_DEFAULT, H5P_DEFAULT,
    H5P_DEFAULT);
H5Dwrite(dset, type, space, space, H5P_DEFAULT, &data);
and read back like this,
char *data;
herr_t ret;
ret = H5Dread(..., ..., H5S_ALL, H5S_ALL, H5P_DEFAULT, &data);
may produce either an error return from H5Dread (ret < 0) or
a NULL pointer (data == NULL).
Planned improvements to the HDF5 global heap may alleviate this problem. There is no schedule for those improvements.
Improvements to VFD SWMR may also alleviate the problem.
Microsoft Windows
VFD SWMR is not officially supported on Microsoft Windows at this time. The feature should in theory work on Windows and NTFS; however, it has not been tested there because the existing VFD SWMR tests rely on shell scripts. Note that Windows file shares are not supported because, as with NFS and similar network filesystems, there is no write-ordering guarantee.
Supported filesystems
A VFD SWMR writer and readers share a couple of files, the HDF5 (.h5)
file and the shadow file. VFD SWMR relies on writes to the files to
take effect in the order described in the POSIX documentation for
read(2) and write(2) system calls. If the VFD SWMR readers and the
writer run on the same POSIX host, this ordering should take effect,
regardless of the underlying filesystem.
If the VFD SWMR reader and the writer run on different hosts, then the write-ordering rules depend on the shared filesystem. VFD SWMR is not generally expected to work with NFS at this time. GPFS is reputed to order writes according to POSIX convention, so we expect VFD SWMR to work with GPFS. (Caveat: we are still looking for an authoritative description of GPFS I/O semantics.)
The HDF Group plans to add support for NFS to VFD SWMR in the future.
File-opening order
If an application tries to open a file in VFD SWMR reader mode, and the
file is not already open by a VFD SWMR writer, then the application will
sleep in the H5Fopen() call until either the writer opens the same
file (using the same shadow file) or the reader times out after several
seconds.
Reporting bugs
VFD SWMR is still under construction, so I think that you will find some bugs. Please do not hesitate to report them.
To contact the VFD SWMR developers, email vfdswmr@hdfgroup.org.