
ReadStringFromBuffer


Module: FoundationLite

Reads a string value in a specified binary format from a byte buffer.

Name            Direction  Type                  Range            Description
inBuffer        Input      ByteBuffer                             Source data
inOffset        Input      Integer               0 -              Read start position (in bytes)
inFormat        Input      StringBinaryFormat                     Binary serialization format responsible for handling string length
inTextEncoding  Input      StringEncodingFormat                   Binary character encoding format
inMaxLength     Input      Integer               0 - 536870911    Maximum length of resulting string in code units
outValue        Output     String                                 Read value
outOffset       Output     Integer                                Position just after the read data

Description

This filter is intended for reading (deserializing) a text value out of raw binary data stored in a ByteBuffer.

Data is read starting at the position provided by the inOffset input (in bytes). This position is then advanced by the size of the read data and returned on the outOffset output. This output can be connected to the inOffset input of another byte buffer reading filter when reading consecutive fields of a data structure. When the data to be read extends beyond the end of the input buffer, an IoError is raised.
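The offset chaining described above can be sketched in Python as follows. This is not the filter's implementation; `read_u16_le` and `read_ascii_raw` are hypothetical helpers that each return a `(value, new_offset)` pair, where the returned offset plays the role of outOffset and is fed into the next call like inOffset.

```python
import struct


def read_u16_le(buf: bytes, offset: int) -> tuple[int, int]:
    """Hypothetical helper: read a little-endian uint16, return (value, new_offset)."""
    end = offset + 2
    if offset < 0 or end > len(buf):
        # Analogous to the filter raising an error when a read spans past the end.
        raise IOError("Reading beyond the end of the byte buffer.")
    (value,) = struct.unpack_from("<H", buf, offset)
    return value, end


def read_ascii_raw(buf: bytes, offset: int, length: int) -> tuple[str, int]:
    """Hypothetical helper: read exactly `length` ASCII bytes, return (text, new_offset)."""
    end = offset + length
    if offset < 0 or end > len(buf):
        raise IOError("Reading beyond the end of the byte buffer.")
    return buf[offset:end].decode("ascii"), end


# A made-up record: uint16 length, ASCII name, uint16 flags.
record = b"\x05\x00hello\x2a\x00"
count, off = read_u16_le(record, 0)         # off is now 2
name, off = read_ascii_raw(record, off, count)  # off is now 7
flags, off = read_u16_le(record, off)       # off is now 9 (end of record)
```

Each returned offset is passed straight into the next read, which mirrors connecting outOffset of one filter to inOffset of the next.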

There are two main aspects of the binary representation of text data. The first is the character encoding, which determines how characters from different alphabets and languages are mapped to numeric codes and how those codes are stored. The second is how the length of the resulting string is marked. String length always refers to the character-encoded text representation (in the resulting binary data), and the unit in which it is measured depends on the selected character encoding.

The inTextEncoding input selects the character encoding:

  • ASCII - basic ASCII encoding with a single byte per character; only characters in the lower half of the byte range (0...127) are allowed. Length in this encoding is measured in bytes.
  • UTF8 - Unicode UTF-8 encoding (without BOM), in which a character occupies a variable number of bytes (1 to 4). Length in this encoding is measured in bytes.
  • UTF16LE - Unicode UTF-16 Little-Endian encoding, with two bytes (16 bits) per code unit; characters outside the Basic Multilingual Plane are stored as surrogate pairs of two code units. Length in this encoding is measured in 16-bit code units (two bytes per unit).
  • UTF16BE - Unicode UTF-16 Big-Endian encoding, with two bytes (16 bits) per code unit; characters outside the Basic Multilingual Plane are stored as surrogate pairs of two code units. Length in this encoding is measured in 16-bit code units (two bytes per unit).
  • ANSI - the default encoding of the local system (the ANSI variant in Microsoft Windows) is used. The actual encoding depends on the regional settings of the local system. A character usually occupies a single byte, but in some regions characters may have variable length (multi-byte strings). Length in this encoding is measured in bytes.
  • OEM - the default encoding of the local system, as with ANSI, but using the OEM variant from Microsoft Windows.
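To illustrate how the length unit depends on the encoding, the following Python sketch encodes the same text and reports its length in each encoding's units. The `ENCODINGS` table mapping option names to Python codecs is an assumption made for this example; ANSI and OEM are omitted because their actual code page depends on the local system.

```python
# Hypothetical mapping: inTextEncoding option -> (Python codec, bytes per length unit).
ENCODINGS = {
    "ASCII": ("ascii", 1),
    "UTF8": ("utf-8", 1),
    "UTF16LE": ("utf-16-le", 2),
    "UTF16BE": ("utf-16-be", 2),
}


def length_in_units(text: str, encoding: str) -> int:
    """Length of `text` measured in the units of the given encoding."""
    codec, unit_size = ENCODINGS[encoding]
    encoded = text.encode(codec)  # utf-16-le/be codecs do not emit a BOM
    return len(encoded) // unit_size


# 'z' (1 byte in UTF-8), 'é' (2 bytes), '€' (3 bytes), and an emoji outside the BMP (4 bytes).
sample = "z\u00e9\u20ac\U0001F600"
for name in ("UTF8", "UTF16LE", "UTF16BE"):
    print(name, length_in_units(sample, name))
# prints: UTF8 10, UTF16LE 5, UTF16BE 5 (one per line)
```

The sample has 4 Unicode code points, yet it is 10 units long in UTF-8 and 5 units long in UTF-16, because the emoji needs a surrogate pair. Encoding it as ASCII would fail, since it contains characters above 127.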

Avoid the ANSI and OEM encodings whenever possible and use one of the Unicode encodings instead. These options are provided only for compatibility with older systems and data formats, and they are not supported on Linux.

The inFormat input selects the general string representation:

  • VariableLength_Raw - a variable-length string field without any prefix or suffix; only the raw text in its character encoding is stored in the buffer.
  • VariableLength_NullTerminated - a variable-length string field terminated by a null character (the size of the null terminator matches the code unit size of the selected character encoding).
  • VariableLength_8BitPrefix - a variable-length string field preceded by an 8-bit prefix specifying the length of the encoded string. The string length must not exceed 255.
  • VariableLength_16BitLEPrefix - a variable-length string field preceded by a 16-bit little-endian prefix specifying the length of the encoded string. The string length must not exceed 65535.
  • VariableLength_32BitLEPrefix - a variable-length string field preceded by a 32-bit little-endian prefix specifying the length of the encoded string.
  • VariableLength_16BitBEPrefix - a variable-length string field preceded by a 16-bit big-endian prefix specifying the length of the encoded string. The string length must not exceed 65535.
  • VariableLength_32BitBEPrefix - a variable-length string field preceded by a 32-bit big-endian prefix specifying the length of the encoded string.
  • VariableLength_LEB128Prefix - a variable-length string field preceded by a variable-length prefix stored using LEB128 coding (sometimes also referred to as a 7-bit encoded integer).
  • FixedLength_Raw - the data field always has a fixed length, specified by the inMaxLength input (in units determined by the character encoding); when the actual text is shorter than this length, it is padded with null codes.
  • FixedLength_NullTerminated - the data field always has a fixed length, specified by the inMaxLength input (in units determined by the character encoding); the text is padded with null codes, and at least one null code must always follow the text.
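The length-prefixed formats above can be sketched in Python as follows. This is an illustrative reading of the format descriptions, not the filter's code: `PREFIX_STRUCTS`, `read_uleb128`, `read_length_prefix` and `read_prefixed_string` are hypothetical names, the LEB128 prefix is assumed to be unsigned, and the 5-byte cap on it is an assumption chosen to fit a 32-bit length.

```python
import struct

# Hypothetical table: inFormat option -> struct format of its fixed-size length prefix.
PREFIX_STRUCTS = {
    "VariableLength_8BitPrefix": "<B",
    "VariableLength_16BitLEPrefix": "<H",
    "VariableLength_16BitBEPrefix": ">H",
    "VariableLength_32BitLEPrefix": "<I",
    "VariableLength_32BitBEPrefix": ">I",
}


def read_uleb128(buf: bytes, offset: int) -> tuple[int, int]:
    """Unsigned LEB128: 7 payload bits per byte, least significant group first;
    the high bit is set on every byte except the last. Returns (value, new_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise IOError("Invalid LEB128 encoded value.")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, offset
        shift += 7
        if shift >= 35:  # assumption: at most 5 bytes, enough for a 32-bit length
            raise IOError("Invalid LEB128 encoded value.")


def read_length_prefix(buf: bytes, offset: int, fmt: str) -> tuple[int, int]:
    """Return (length in code units, offset of the first text byte)."""
    if fmt == "VariableLength_LEB128Prefix":
        return read_uleb128(buf, offset)
    spec = PREFIX_STRUCTS[fmt]
    size = struct.calcsize(spec)
    if offset + size > len(buf):
        raise IOError("Reading beyond the end of the byte buffer.")
    (length,) = struct.unpack_from(spec, buf, offset)
    return length, offset + size


def read_prefixed_string(buf: bytes, offset: int, fmt: str,
                         codec: str, unit_size: int) -> tuple[str, int]:
    """Read the prefix, then exactly that many code units of encoded text."""
    length, offset = read_length_prefix(buf, offset, fmt)
    end = offset + length * unit_size
    if end > len(buf):
        raise IOError("Reading beyond the end of the byte buffer.")
    return buf[offset:end].decode(codec), end
```

For example, the value 300 is stored in LEB128 as the two bytes `0xAC 0x02`, and a UTF-16LE string "hi" with an LEB128 prefix occupies 5 bytes: a prefix of 2 (code units) followed by 4 bytes of text.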

The inMaxLength input specifies the maximum allowed length of the source string, measured in code units of the selected character encoding (before decoding) and including any null terminator. If the actual text length is greater (for example, when no null terminator is found within this range), an IoError is raised. For fixed-length formats, this input also specifies the length of the data field. In that case the padding at the end of the text is removed automatically, so the resulting text can be shorter than the value of the inMaxLength input.
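The role of inMaxLength for null-terminated and fixed-length fields can be sketched in Python as below. The function names are hypothetical, and treating a null followed by further text inside a fixed-length field as an error is an assumption based on the "Null character in middle of string data" entry in the error list, not confirmed behavior.

```python
def read_null_terminated(buf: bytes, offset: int, codec: str, unit_size: int,
                         max_length: int) -> tuple[str, int]:
    """VariableLength_NullTerminated: scan at most max_length code units
    (terminator included); return (text, offset just past the terminator)."""
    null = b"\x00" * unit_size
    for i in range(max_length):
        start = offset + i * unit_size
        end = start + unit_size
        if end > len(buf):
            raise IOError("Reading beyond the end of the byte buffer.")
        if buf[start:end] == null:
            return buf[offset:start].decode(codec), end
    raise IOError("No null terminator within required range.")


def read_fixed_length(buf: bytes, offset: int, codec: str, unit_size: int,
                      max_length: int, null_terminated: bool) -> tuple[str, int]:
    """FixedLength_Raw / FixedLength_NullTerminated: the field always occupies
    max_length code units; trailing null padding is stripped from the text."""
    end = offset + max_length * unit_size
    if end > len(buf):
        raise IOError("Reading beyond the end of the byte buffer.")
    null = b"\x00" * unit_size
    units = [buf[i:i + unit_size] for i in range(offset, end, unit_size)]
    n = len(units)
    while n > 0 and units[n - 1] == null:
        n -= 1  # drop trailing padding
    if null_terminated and n == len(units):
        raise IOError("No null terminator within required range.")
    if null in units[:n]:
        # Assumption: a null followed by more text is reported as an error.
        raise IOError("Null character in middle of string data.")
    return b"".join(units[:n]).decode(codec), end
```

In the fixed-length case the returned offset always advances by the full field size, regardless of how much padding was stripped.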

When using the VariableLength_Raw format, this filter always reads exactly the number of code units specified on the inMaxLength input. In this mode the filter cannot determine the length of the source string from the data itself, so the length must be known in advance and passed on the inMaxLength input.

When the source string contains one or more characters that are invalid in the specified character encoding, an IoError is raised.
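Strict validation of the decoded text can be approximated in Python by decoding with the default strict error handling and converting decoding failures into an I/O error. `decode_strict` is a hypothetical helper; Python's codecs stand in for the filter's own validation and may not match it in every edge case.

```python
def decode_strict(data: bytes, codec: str) -> str:
    """Decode bytes; any invalid sequence becomes an IOError, analogous to
    the filter reporting invalid characters in the source string."""
    try:
        return data.decode(codec)  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        raise IOError(f"Invalid format of string: {exc.reason}") from exc
```

With this helper, a non-ASCII byte such as `0x80` under ASCII, a broken UTF-8 sequence such as `0xC3 0x28`, or an unpaired UTF-16 surrogate all raise an error instead of producing replacement characters.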

Errors

This filter can throw an exception to report an error. Read how to deal with errors in Error Handling.

List of possible exceptions:

Error type   Description
DomainError  ANSI and OEM encodings are not supported on Linux.
DomainError  Not supported string binary format.
DomainError  Not supported text encoding format.
IoError      Empty byte buffer at input of ReadStringFromBuffer.
IoError      Invalid format of string data. No null terminator within required range.
IoError      Invalid format of string. Corrupted surrogate pair in UTF-16 string data.
IoError      Invalid format of string. Invalid LEB128 encoded value.
IoError      Invalid format of string. Invalid multi-byte character in string data.
             The ANSI string encoding is selected and the read string contains one or more characters that are invalid in the encoding of the current system regional settings.
IoError      Invalid format of string. Invalid UTF-8 sequence in string data.
IoError      Invalid format of string. Non ASCII character in string data.
IoError      Invalid format of string. Null character in middle of string data.
IoError      Invalid format of string. Prefixed string length is too large.
IoError      Invalid format of string. Reading beyond the end of the byte buffer.
             The dynamic length of the string stored at the specified offset extends beyond the end of the byte buffer.
IoError      Invalid format of string. Resulting string length is too large.
IoError      Reading beyond the end of the byte buffer.
             The source data range specified by the inOffset input and the size of the string binary format extends beyond the end of the byte buffer.
IoError      Source string length exceeds inMaxLength value.
             The dynamic length of the string stored at the specified offset is larger than the value on the inMaxLength input.
SystemError  Requested text encoding is not available in local system.

Complexity Level

This filter is available on Basic Complexity Level.

Filter Group

This filter is a member of the ReadFromBuffer filter group.

See Also

  • WriteStringToBuffer – Converts string value into specified binary representation and writes it to a byte buffer.