[svn-r1539]
Datatypes.html
VLdatatypes.html (temporary doc file)
RM_H5T.html
RM_H5D.html
Adding variable-length datatype information to the user docs.
VLdatatypes.html is a temporary file; it will be removed once
all of the information is incorporated into Datatypes.html
and the RM.
201
doc/html/VLdatatypes.html
Normal file
@@ -0,0 +1,201 @@
<html>
<head>
<title>Variable-length Datatypes</title>
</head>

<body bgcolor="#FFFFFF">
<pre>


                 VARIABLE-LENGTH DATATYPES IN HDF5
                      (a temporary document)


Variable-length Datatype Overview And Justification:
----------------------------------------------------
    Variable-length (VL) datatypes are sequences of an existing datatype
(atomic, VL or compound) which are not fixed in length from one dataset
location to another. In essence they are similar to C character strings: a
sequence of a type which is pointed to by a particular type of "pointer".
Their implementation is closer to FORTRAN strings, however, because the
"pointer" includes an explicit length instead of using a particular value to
terminate the sequence.
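    The difference between the two conventions can be sketched in plain C.
The vl_seq_t struct and both function names below are hypothetical stand-ins
defined only for this illustration; they are not taken from the HDF5 headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VL "pointer": an explicit length plus a
 * pointer to the first element. Defined here for illustration only. */
typedef struct {
    size_t len; /* number of elements in the sequence */
    void  *p;   /* first element, or NULL when len == 0 */
} vl_seq_t;

/* C-string style: scan for the terminating value (0) to find the length. */
size_t terminated_length(const int *seq)
{
    size_t n = 0;
    assert(seq != NULL);
    while (seq[n] != 0)
        n++;
    return n;
}

/* VL style: the length is stored next to the pointer, so any element
 * value, including 0, may appear inside the sequence. */
size_t vl_length(const vl_seq_t *seq)
{
    assert(seq != NULL);
    return seq->len;
}
```

    With an explicit length, a sequence such as {7, 0, 9} keeps all three
elements, whereas a terminator-based scan of the same data stops at the
first 0.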

    VL datatypes are useful to the scientific community in many different
ways, some of which are listed below:
  - Ragged Arrays: Multi-dimensional ragged arrays can be implemented with
      the last (fastest changing) dimension being ragged by using a
      VL datatype as the type of the element stored (or as a field in a
      compound datatype).
  - Fractal Arrays: If a compound datatype has a VL field of another compound
      type with VL fields (a "nested" VL datatype), this can be used to
      implement ragged arrays of ragged arrays, to whatever nesting depth
      the user requires.
  - Polygon Lists: A common storage requirement is to efficiently store arrays
      of polygons with different numbers of vertices. A VL datatype whose
      base type is a vertex describes such an array succinctly: each
      element holds exactly as many vertices as its polygon has.
  - Character Strings: Perhaps the most common use of VL datatypes will be to
      store C-like VL character strings in dataset elements or as attributes
      of objects.
  - Indices: An array of VL object references could be used as an index to
      all the objects in a file which contain a particular sequence of
      dataset values. Such an array might look like the following:
          Value1: Object1, Object3, Object9
          Value2: Object0, Object12, Object14, Object21, Object22
          Value3: Object2
          Value4: <none>
          Value5: Object1, Object10, Object12
              .
              .
  - Object Tracking: An array of VL dataset region references can be used as
      a method of tracking objects or features appearing in a sequence of
      datasets. Such an array might look like:
          Feature1: Dataset1:Region, Dataset3:Region, Dataset9:Region
          Feature2: Dataset0:Region, Dataset12:Region, Dataset14:Region,
                    Dataset21:Region, Dataset22:Region
          Feature3: Dataset2:Region
          Feature4: <none>
          Feature5: Dataset1:Region, Dataset10:Region, Dataset12:Region
              .
              .
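    As a rough illustration of the ragged-array and polygon-list cases, the
sketch below models each polygon as one VL element in memory. The vertex_t
and vl_seq_t types and both function names are assumptions made for this
example; the code does not call the HDF5 library.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y; } vertex_t;         /* hypothetical base type */
typedef struct { size_t len; void *p; } vl_seq_t; /* stand-in VL element */

/* Count the vertices stored in an array of npoly polygons, where each
 * polygon is one VL element holding len vertices. */
size_t total_vertices(const vl_seq_t *polys, size_t npoly)
{
    size_t i, total = 0;
    assert(polys != NULL || npoly == 0);
    for (i = 0; i < npoly; i++)
        total += polys[i].len;
    return total;
}

/* Return vertex j of one polygon, or NULL if j is out of range. */
const vertex_t *polygon_vertex(const vl_seq_t *poly, size_t j)
{
    assert(poly != NULL);
    if (j >= poly->len)
        return NULL;
    return (const vertex_t *)poly->p + j;
}
```

    A triangle, a quadrilateral and an empty entry then occupy three array
elements of the same size, while the vertex data itself takes only as much
memory as the seven vertices need.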


Variable-length Datatype Memory Management:
-------------------------------------------
    Because each element of a dataset with a VL datatype may hold a sequence
of a different length, the memory for the VL datatype must be dynamically
allocated. Currently there are two methods of managing the memory for VL
datatypes: the standard C malloc/free memory allocation routines or a method
of calling user-defined memory management routines to allocate or free memory.
Since the memory allocated when reading (or writing) may be complicated to
release, an HDF5 routine is provided to traverse a memory buffer and free the
VL datatype information without leaking memory.


Why Variable-length Datatypes Can't Be Divided:
-----------------------------------------------
    VL datatypes are designed so that they cannot be subdivided by the library
with selections, etc. This design was chosen because it is difficult to
specify selections on each VL element of a dataset through a selection API
that is easy to understand. Also, the selection APIs work on dataspaces, not
on datatypes. At some point we may want to create a way for dataspaces to
have VL components, and we would then need to allow selections of those VL
regions, but that is beyond the scope of this document.


What Happens If The Library Runs Out Of Memory While Reading?:
--------------------------------------------------------------
    It is possible for a call to H5Dread to fail while reading in VL datatype
information if the memory required exceeds that which is available. In this
case, the H5Dread call will fail gracefully and any VL data which has been
allocated prior to the memory shortage will be returned to the system via the
memory management routines detailed below. It may be possible to design a
"partial read" API function at a later date, if demand for such a function
warrants it.
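    The "fail gracefully" behaviour described above amounts to an
all-or-nothing fill of the buffer. The sketch below shows one way such a
rollback could look; it is not the library's implementation. The budget_t
allocator is a hypothetical device for simulating a memory shortage, and
vl_seq_t is a local stand-in for a VL element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { size_t len; void *p; } vl_seq_t; /* stand-in VL element */

/* Hypothetical allocator state: refuses requests past a byte limit. */
typedef struct { size_t used, limit; } budget_t;

static void *budget_alloc(size_t size, budget_t *b)
{
    void *mem;
    if (size > b->limit - b->used)
        return NULL;            /* simulated out-of-memory condition */
    mem = malloc(size);
    if (mem != NULL)
        b->used += size;
    return mem;
}

static void budget_free(void *mem, size_t size, budget_t *b)
{
    free(mem);
    b->used -= size;
}

/* Allocate sequences of lens[0..n-1] elements into out[0..n-1].
 * Returns 0 on success. On allocation failure, frees every sequence
 * allocated so far, clears those elements, and returns -1. */
int read_all_or_nothing(vl_seq_t *out, const size_t *lens, size_t n,
                        size_t elem_size, budget_t *b)
{
    size_t i;
    assert(out != NULL && lens != NULL && b != NULL);
    for (i = 0; i < n; i++) {
        size_t bytes = lens[i] * elem_size;
        out[i].len = lens[i];
        out[i].p = bytes ? budget_alloc(bytes, b) : NULL;
        if (bytes && out[i].p == NULL) {
            out[i].len = 0;
            while (i-- > 0) {   /* roll back elements i-1 down to 0 */
                budget_free(out[i].p, out[i].len * elem_size, b);
                out[i].len = 0;
                out[i].p = NULL;
            }
            return -1;
        }
    }
    return 0;
}
```

    After a failed call the caller holds no partially filled elements and
has nothing to free.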


Strings as Variable-length Datatypes:
-------------------------------------
    Since character strings are a special case of VL data that is implemented
in many different ways on different machines and programming languages, they
are handled somewhat differently from other VL datatypes in HDF5.
    HDF5 has native VL strings for each language API, which are stored the
same way on disk, but are exported through each language API in a natural way
for that language. When retrieving VL strings from a dataset, users may choose
to have them stored in memory as a native VL string or in HDF5's hvl_t struct
for VL datatypes.
    VL strings may be created in one of two ways: by creating a VL datatype
with a base type of H5T_NATIVE_ASCII, H5T_NATIVE_UNICODE, etc., or by creating
a string datatype and setting its length to H5T_STRING_VARIABLE. The second
method is used to access native VL strings in memory. The library will convert
between the two types, but they are stored on disk using different datatypes
and have different memory representations.
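    To make the two memory representations concrete, the sketch below
converts between a native C string (NUL-terminated) and a length-plus-pointer
sequence. It only illustrates the two layouts; it is not the conversion code
used by the library, and vl_seq_t and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct { size_t len; void *p; } vl_seq_t; /* stand-in VL element */

/* Copy a NUL-terminated C string into a length-plus-pointer sequence.
 * The terminator is not stored. Returns 0 on success, -1 on failure. */
int vl_from_cstring(const char *s, vl_seq_t *out)
{
    size_t n;
    assert(s != NULL && out != NULL);
    n = strlen(s);
    out->len = 0;
    out->p = NULL;
    if (n > 0) {
        out->p = malloc(n);
        if (out->p == NULL)
            return -1;
        memcpy(out->p, s, n);
        out->len = n;
    }
    return 0;
}

/* Build a newly allocated NUL-terminated copy of a sequence of chars.
 * Returns NULL on allocation failure; the caller frees the result. */
char *vl_to_cstring(const vl_seq_t *seq)
{
    char *s;
    assert(seq != NULL);
    s = malloc(seq->len + 1);
    if (s == NULL)
        return NULL;
    if (seq->len > 0)
        memcpy(s, seq->p, seq->len);
    s[seq->len] = '\0';
    return s;
}
```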
    Multi-byte character representations, such as UNICODE or "wide" characters
in C/C++, will need the appropriate character and string datatypes created
so that they can be described properly through the datatype API. Additional
conversions between these types and the current ASCII characters will also be
required.
    Variable-width character strings (which might be compressed data or some
other encoding) are not currently handled by this design. We will evaluate
how to implement them based on user feedback.


Variable-length Datatype API:
-----------------------------
Creation:
    VL datatypes are created with the H5Tvlen_create() function as follows:
        hid_t H5Tvlen_create(hid_t base_type_id);
    The base datatype will be the datatype that the sequence is composed of:
characters for character strings, vertex coordinates for polygon lists, etc.
The base type specified for the VL datatype can be any HDF5 datatype,
including another VL datatype, a compound datatype or an atomic datatype.


Query base type of VL datatype:
    It may be necessary to know the base type of a VL datatype before memory
is allocated, etc. The base type is queried with the H5Tget_super()
function, described in the H5T documentation.


Query minimum memory required for VL information:
    In order to predict the memory usage that H5Dread may need to allocate to
store VL data while reading the data, the H5Dget_vlen_buf_size() function is
provided, as follows:
        herr_t H5Dget_vlen_buf_size(hid_t dataset_id, hid_t type_id,
                                    hid_t space_id, hsize_t *size)
    (This function is not implemented in Release 1.2.)
    This routine checks the number of bytes required to store the VL data from
the dataset, using the space_id for the selection in the dataset on disk and
the type_id for the memory representation of the VL data in memory. On return,
*size is set to the number of bytes required to store the VL data in memory.
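    The quantity being computed can be illustrated without the library: for
a non-nested VL type it is the sum, over the selected elements, of the
sequence length times the size of the base type in memory. The function name
and the selection mask below are assumptions made for this sketch; in the
real call the selection comes from space_id and the element size from
type_id.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for the sequence data of the selected VL elements.
 *   lens      - sequence length of each of the n elements
 *   selected  - nonzero where an element is in the selection,
 *               or NULL to select every element
 *   elem_size - size in bytes of one base-type element in memory
 * Nested VL base types are not handled by this sketch. */
size_t vl_selected_bytes(const size_t *lens, const unsigned char *selected,
                         size_t n, size_t elem_size)
{
    size_t i, total = 0;
    assert(lens != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (selected == NULL || selected[i])
            total += lens[i] * elem_size;
    return total;
}
```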


Specifying how to manage memory for the VL datatype:
    The memory management method is determined by dataset transfer properties
passed to H5Dread/H5Dwrite in the dataset transfer property list. There
are currently two different memory managers defined:
        H5T_VLEN_STDC_MEM_MANAGE - The standard C malloc/free pair is used to
            allocate and free VL sequences. This is the default method.
        H5T_VLEN_USER_MEM_MANAGE - Calls to user-provided memory management
            routines are generated during I/O and memory reclamation.
    These methods can be chosen with the H5Pset_vlen_mem_manage_type()
call, as follows:
        herr_t H5Pset_vlen_mem_manage_type(hid_t plist_id, H5T_vlen_mem_t type)
    When user-defined memory management is chosen, allocation and free routines
must also be provided with the H5Pset_vlen_mem_manager() API call, as
follows:
        herr_t H5Pset_vlen_mem_manager(hid_t plist_id,
                H5MM_allocate_t alloc, void *alloc_info,
                H5MM_free_t free, void *free_info)
    The prototypes for the allocation and free routines look like:
        typedef void *(*H5MM_allocate_t)(size_t size, void *info);
        typedef void (*H5MM_free_t)(void *mem, void *free_info);
    The alloc_info and free_info parameters can be used to pass along any
required information to the user's memory management routines.
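    A pair of user-defined routines matching these prototypes might look
like the sketch below. The two typedefs are copied from the text so that the
example compiles without the HDF5 headers; the vl_mem_stats_t struct and the
counting_alloc, counting_free and outstanding_blocks names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Callback types copied from the prototypes in the text. */
typedef void *(*H5MM_allocate_t)(size_t size, void *info);
typedef void (*H5MM_free_t)(void *mem, void *free_info);

/* Hypothetical bookkeeping passed through the info pointers. */
typedef struct {
    size_t allocs; /* successful allocations */
    size_t frees;  /* non-NULL blocks released */
} vl_mem_stats_t;

/* Allocation routine: malloc plus a counter update. */
void *counting_alloc(size_t size, void *info)
{
    vl_mem_stats_t *stats = info;
    void *mem = malloc(size);
    if (mem != NULL && stats != NULL)
        stats->allocs++;
    return mem;
}

/* Free routine: counter update plus free. */
void counting_free(void *mem, void *free_info)
{
    vl_mem_stats_t *stats = free_info;
    if (mem != NULL && stats != NULL)
        stats->frees++;
    free(mem);
}

/* Number of blocks allocated through the callbacks and not yet freed. */
size_t outstanding_blocks(const vl_mem_stats_t *stats)
{
    assert(stats != NULL && stats->frees <= stats->allocs);
    return stats->allocs - stats->frees;
}
```

    Going by the prototype shown above, such routines would be registered
with a call along the lines of H5Pset_vlen_mem_manager(plist_id,
counting_alloc, &stats, counting_free, &stats), with the same stats object
passed as both alloc_info and free_info.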


Recovering memory from VL buffers read in:
    The complex memory buffers created for a VL datatype may be reclaimed with
the H5Dvlen_reclaim() function call, as follows:
        herr_t H5Dvlen_reclaim(hid_t type_id, hid_t space_id, hid_t plist_id,
                               void *buf);
    The type_id must be the datatype stored in the buffer, space_id describes
the selection for the memory buffer to free the VL datatypes within, plist_id
is the dataset transfer property list which was used for the I/O transfer to
create the buffer, and buf is the pointer to the buffer to free the VL memory
within. The VL structures (hvl_t's) in the user's buffer are modified to zero
out the VL information after it has been freed.
    If "nested" VL datatypes were used to create the buffer, this routine frees
them from the "bottom" up, releasing all the memory without creating memory
leaks.
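    The "bottom up" order can be sketched as a short recursive traversal.
This is an illustration under simplifying assumptions, not the library's
code: vl_seq_t is a local stand-in for hvl_t, the nesting is described by a
plain depth count rather than a datatype, every element is treated as
selected, and memory is assumed to come from malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { size_t len; void *p; } vl_seq_t; /* stand-in VL element */

/* Free the n VL elements in buf. depth is the number of VL levels below
 * this one: with depth 0 each element points at plain data, with
 * depth > 0 each element points at an array of vl_seq_t. Inner
 * sequences are freed before the array that holds them, and every
 * element is zeroed once its memory has been released. */
void vl_reclaim(vl_seq_t *buf, size_t n, unsigned depth)
{
    size_t i;
    assert(buf != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (depth > 0 && buf[i].p != NULL)
            vl_reclaim(buf[i].p, buf[i].len, depth - 1);
        free(buf[i].p);
        buf[i].len = 0;
        buf[i].p = NULL;
    }
}
```

    Freeing the outer array first would lose the only pointers to the inner
sequences, which is why the recursion descends before it calls free.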


Code Examples:
--------------
    For sample VL datatype code, see the tests in test/tvltypes.c in the
HDF5 distribution.

</pre>
</body>
</html>