ReadStringFromBuffer
Module: FoundationLite
Reads a string value in the specified binary format from a byte buffer.
Name | Type | Range | Description |
---|---|---|---|
inBuffer | ByteBuffer | | Source data |
inOffset | Integer | 0 - | Read start position |
inFormat | StringBinaryFormat | | Binary serialization format responsible for handling string length |
inTextEncoding | StringEncodingFormat | | Binary character encoding format |
inMaxLength | Integer | 0 - 536870911 | Maximum length of resulting string in code units |
outValue | String | | Read value |
outOffset | Integer | | Resulting position behind read data |
Description
This filter is intended for reading (deserializing) a text value out of raw binary data in a ByteBuffer.
Data is read starting at the position provided by the inOffset input (in bytes). This position is then advanced by the size of the read data and returned on the outOffset output. This output can be connected to the inOffset input of another byte buffer reading filter when reading consecutive fields of data structures. When the data to be read spans beyond the end of the input buffer, an IoError is raised.
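The offset chaining described above can be illustrated outside the filter environment. The following is a minimal Python sketch, not the filter's implementation: the helper names `read_u16_le` and `read_ascii_8bit_prefix` and the record layout are hypothetical, and Python's built-in `IOError` stands in for the filter's IoError.

```python
import struct


def read_u16_le(buffer: bytes, offset: int) -> tuple[int, int]:
    """Read a 16-bit little-endian unsigned integer; return (value, new_offset)."""
    end = offset + 2
    if offset < 0 or end > len(buffer):
        raise IOError("Reading beyond the end of the byte buffer.")
    (value,) = struct.unpack_from("<H", buffer, offset)
    return value, end


def read_ascii_8bit_prefix(buffer: bytes, offset: int) -> tuple[str, int]:
    """Read an ASCII string stored with an 8-bit length prefix."""
    if offset < 0 or offset + 1 > len(buffer):
        raise IOError("Reading beyond the end of the byte buffer.")
    length = buffer[offset]
    start = offset + 1
    end = start + length
    if end > len(buffer):
        raise IOError("Reading beyond the end of the byte buffer.")
    return buffer[start:end].decode("ascii"), end


# Hypothetical record: a 16-bit id followed by two length-prefixed strings.
record = struct.pack("<H", 513) + b"\x05hello" + b"\x02mm"

# Each call returns the offset just behind the data it consumed,
# which becomes the start offset of the next read.
ident, offset = read_u16_le(record, 0)
name, offset = read_ascii_8bit_prefix(record, offset)
unit, offset = read_ascii_8bit_prefix(record, offset)
```

After the last read, `offset` equals the buffer length, which is a convenient check that the whole record was consumed.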
There are two main aspects of the binary representation of text data. The first is the character encoding, which determines how characters from different alphabets and languages are mapped to numeric codes and how those codes are stored. The second is how the length of the string is marked. The length of a string is always measured on the encoded text (that is, on the resulting binary data), and the unit of measurement depends on the selected character encoding.
The inTextEncoding input selects the required character encoding:
- ASCII - basic ASCII encoding with a single byte per character; only characters in the lower half (0...127) are allowed. Length in this encoding is measured in bytes.
- UTF8 - Unicode UTF-8 encoding (without BOM), with a variable number of bytes per character. Length in this encoding is measured in bytes.
- UTF16LE - Unicode UTF-16 Little-Endian encoding, two bytes (16 bits) per character, or four bytes for characters stored as surrogate pairs. Length in this encoding is measured in 16-bit code units (two bytes per unit).
- UTF16BE - Unicode UTF-16 Big-Endian encoding, two bytes (16 bits) per character, or four bytes for characters stored as surrogate pairs. Length in this encoding is measured in 16-bit code units (two bytes per unit).
- ANSI - the default encoding of the local system (the ANSI variant in Microsoft Windows) is used. The actual encoding depends on the regional settings of the local system. A character usually occupies a single byte, but for some regions characters of variable length are possible (multi-byte strings). Length in this encoding is measured in bytes.
- OEM - the default encoding of the local system, similar to ANSI, but the OEM variant in Microsoft Windows is used.
It is strongly advised to avoid the ANSI and OEM encodings whenever possible and to use one of the Unicode encodings instead. Those options are provided only for compatibility with older systems and data formats.
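To make the difference between length units concrete, here is a small Python sketch that measures the same text in bytes (ASCII, UTF-8) and in 16-bit code units (UTF-16). The helper `encoded_length` is a hypothetical name used only for illustration, and Python codec names stand in for the filter's encoding options.

```python
def encoded_length(text: str, encoding: str) -> int:
    """Length of the encoded text in the unit used for that encoding."""
    data = text.encode(encoding)
    # UTF-16 lengths are counted in 16-bit code units; the others in bytes.
    unit_size = 2 if encoding in ("utf-16-le", "utf-16-be") else 1
    return len(data) // unit_size


# 'a', 'ñ' (U+00F1), and an emoji outside the Basic Multilingual Plane.
sample = "a\u00f1\U0001F600"

utf8_len = encoded_length(sample, "utf-8")       # 1 + 2 + 4 = 7 bytes
utf16_len = encoded_length(sample, "utf-16-le")  # 1 + 1 + 2 = 4 code units
```

The text has three characters, yet its length is 7 in UTF-8 and 4 in UTF-16, which is why a length limit has to be chosen with the target encoding in mind.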
The inFormat input selects the general string representation:
- VariableLength_Raw - variable length of string field without any additional prefixes and suffixes (just raw text in its character encoding is stored in the buffer).
- VariableLength_NullTerminated - variable length of string field with null termination character at its end (size of a null terminator is appropriate to the specified character encoding).
- VariableLength_8BitPrefix - variable length of string field with 8-bit prefix specifying the length of the encoded string. Resulting string length must not be longer than 255.
- VariableLength_16BitLEPrefix - variable length of string field with 16-bit little endian prefix specifying the length of the encoded string. Resulting string length must not be longer than 65535.
- VariableLength_32BitLEPrefix - variable length of string field with 32-bit little endian prefix specifying the length of the encoded string.
- VariableLength_16BitBEPrefix - variable length of string field with 16-bit big endian prefix specifying the length of the encoded string. Resulting string length must not be longer than 65535.
- VariableLength_32BitBEPrefix - variable length of string field with 32-bit big endian prefix specifying the length of the encoded string.
- VariableLength_LEB128Prefix - variable length of string field with variable length prefix stored using LEB128 coding (sometimes also called 7-bit variable-length integer coding).
- FixedLength_Raw - the data field always has a fixed length, specified by the inMaxLength input (in units determined by the character encoding). When the actual text is shorter than the specified length, it is padded with null codes.
- FixedLength_NullTerminated - the data field always has a fixed length, specified by the inMaxLength input (in units determined by the character encoding). When the actual text is shorter than the specified length, it is padded with null codes. At least one null code must always be present at the end of the text.
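As an illustration of the prefixed formats listed above, the following Python sketch decodes the length prefix for each variant. It is a sketch under stated assumptions, not the filter's code: the helper names are hypothetical, the LEB128 prefix is assumed to be the unsigned variant, and Python's `IOError` stands in for the filter's IoError.

```python
import struct

# struct format codes for the fixed-size prefixes ("<" little endian, ">" big endian).
PREFIX_CODES = {
    "VariableLength_8BitPrefix": "<B",
    "VariableLength_16BitLEPrefix": "<H",
    "VariableLength_32BitLEPrefix": "<I",
    "VariableLength_16BitBEPrefix": ">H",
    "VariableLength_32BitBEPrefix": ">I",
}


def read_leb128(buffer: bytes, offset: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, new_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buffer):
            raise IOError("Invalid LEB128 encoded value.")
        byte = buffer[offset]
        offset += 1
        value |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if (byte & 0x80) == 0:            # high bit clear marks the last byte
            return value, offset
        shift += 7


def read_length_prefix(buffer: bytes, offset: int, fmt: str) -> tuple[int, int]:
    """Return (string_length, offset_of_first_text_unit) for a prefixed format."""
    if fmt == "VariableLength_LEB128Prefix":
        return read_leb128(buffer, offset)
    code = PREFIX_CODES[fmt]
    size = struct.calcsize(code)
    if offset < 0 or offset + size > len(buffer):
        raise IOError("Reading beyond the end of the byte buffer.")
    (length,) = struct.unpack_from(code, buffer, offset)
    return length, offset + size
```

The same two bytes `01 02` give a length of 513 with the little-endian 16-bit prefix and 258 with the big-endian one, so the prefix format has to match the writer exactly.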
The inMaxLength input specifies the maximum allowed length of the source string (before converting from the requested character encoding and including the null terminator, if any). When the actual text length is greater than that (e.g. when a null terminator is not found within the proper range), an IoError is raised. This input also specifies the length of the data field when using fixed-length formats. When working with fixed-length formats, the padding at the end of the text is automatically removed, so the resulting text can be shorter than the value specified on the inMaxLength input.
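The role of the maximum length can be sketched in Python for two cases: searching for a null terminator within the allowed range, and reading a fixed-length field with its padding stripped. Both helpers are hypothetical illustrations with assumed parameter names (`max_length`, `unit_size`), and Python's `IOError` stands in for the filter's IoError.

```python
def read_null_terminated(buffer: bytes, offset: int, max_length: int,
                         encoding: str = "utf-8", unit_size: int = 1) -> tuple[str, int]:
    """Read text up to a null terminator found within max_length code units."""
    null = b"\x00" * unit_size
    for i in range(max_length):
        start = offset + i * unit_size
        unit = buffer[start:start + unit_size]
        if len(unit) < unit_size:
            raise IOError("Reading beyond the end of the byte buffer.")
        if unit == null:
            # The returned offset points just behind the terminator.
            return buffer[offset:start].decode(encoding), start + unit_size
    raise IOError("No null terminator within required range.")


def read_fixed_length(buffer: bytes, offset: int, max_length: int,
                      encoding: str = "utf-8", unit_size: int = 1) -> tuple[str, int]:
    """Read a field of exactly max_length code units and drop trailing null padding."""
    end = offset + max_length * unit_size
    if offset < 0 or end > len(buffer):
        raise IOError("Reading beyond the end of the byte buffer.")
    field = buffer[offset:end]
    null = b"\x00" * unit_size
    # The field length is a multiple of unit_size, so stripping one unit
    # at a time from the end keeps the remaining data aligned.
    while field.endswith(null):
        field = field[:-unit_size]
    # The offset always advances by the full field size, padding included.
    return field.decode(encoding), end
```

For example, `read_fixed_length(b"hi\x00\x00\x00\x00XY", 0, 6)` yields the two-character text `"hi"` while the returned offset is 6, matching the note that the result can be shorter than the field.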
When using the VariableLength_Raw format, this filter always reads exactly the number of source text code units specified on the inMaxLength input. Because in this mode the filter cannot determine the length of the source string, this value must be determined explicitly and passed on the inMaxLength input.
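A typical situation for the raw format is a layout in which the text length is known from elsewhere, for example from a fixed protocol definition. The Python sketch below assumes such a hypothetical layout (a 4-byte tag followed by a 3-unit text); `read_raw` is an illustrative helper, not part of the filter API.

```python
def read_raw(buffer: bytes, offset: int, length_units: int,
             encoding: str = "utf-8", unit_size: int = 1) -> tuple[str, int]:
    """Read exactly length_units code units; no prefix or terminator is expected."""
    end = offset + length_units * unit_size
    if offset < 0 or end > len(buffer):
        raise IOError("Reading beyond the end of the byte buffer.")
    return buffer[offset:end].decode(encoding), end


# Hypothetical layout: 4-byte tag, then a raw 3-character code, then other data.
packet = b"HEADabcdef"
code, next_offset = read_raw(packet, 4, 3)
```

Nothing in the buffer marks where the text ends, so a wrong length silently shifts every following field.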
When the source string contains one or more characters that are invalid in the specified character encoding, an IoError is raised.
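Strict validation of this kind can be reproduced in Python, whose decoders reject invalid input by default. The sketch below is only an analogy: `decode_strict` is a hypothetical helper, and it maps Python's `UnicodeDecodeError` to `IOError` to mirror the behaviour described above.

```python
def decode_strict(data: bytes, encoding: str) -> str:
    """Decode bytes, turning any invalid sequence into an IOError."""
    try:
        return data.decode(encoding, errors="strict")
    except UnicodeDecodeError as exc:
        raise IOError(f"Invalid format of string: {exc.reason}") from exc


def is_valid(data: bytes, encoding: str) -> bool:
    """Convenience wrapper reporting whether the data decodes cleanly."""
    try:
        decode_strict(data, encoding)
        return True
    except IOError:
        return False


checks = {
    "non-ASCII byte in ASCII text": is_valid(b"caf\xe9", "ascii"),
    "broken UTF-8 sequence": is_valid(b"\xc3\x28", "utf-8"),
    "unpaired UTF-16 surrogate": is_valid(b"\x00\xd8\x41\x00", "utf-16-le"),
    "well-formed UTF-8": is_valid("caf\u00e9".encode("utf-8"), "utf-8"),
}
```

Only the last entry is `True`; the first three correspond to the non-ASCII, invalid UTF-8, and corrupted surrogate pair cases in the error list.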
Errors
This filter can throw an exception to report an error. Read how to deal with errors in Error Handling.
List of possible exceptions:
Error type | Description |
---|---|
DomainError | ANSI and OEM encodings are not supported on Linux. |
DomainError | Not supported string binary format. |
DomainError | Not supported text encoding format. |
IoError | Empty byte buffer at input of ReadStringFromBuffer. |
IoError | Invalid format of string data. No null terminator within required range. |
IoError | Invalid format of string. Corrupted surrogate pair in UTF-16 string data. |
IoError | Invalid format of string. Invalid LEB128 encoded value. |
IoError | Invalid format of string. Invalid multi-byte character in string data. ANSI string encoding is selected and the read string contains one or more characters that are invalid in the encoding of the current system regional settings. |
IoError | Invalid format of string. Invalid UTF-8 sequence in string data. |
IoError | Invalid format of string. Non ASCII character in string data. |
IoError | Invalid format of string. Null character in middle of string data. |
IoError | Invalid format of string. Prefixed string length is too large. |
IoError | Invalid format of string. Reading beyond the end of the byte buffer. Dynamic length of the string stored at specified offset position spans beyond the end of the byte buffer. |
IoError | Invalid format of string. Resulting string length is too large. |
IoError | Reading beyond the end of the byte buffer. Source data range specified by the inOffset input and the size of string binary format spans beyond the end of the byte buffer. |
IoError | Source string length exceeds inMaxLength value. Dynamic length of the string stored at specified offset position is larger than the value on inMaxLength input. |
SystemError | Requested text encoding is not available in local system. |
Complexity Level
This filter is available on the Basic Complexity Level.
Filter Group
This filter is a member of the ReadFromBuffer filter group.
See Also
- WriteStringToBuffer – Converts string value into specified binary representation and writes it to a byte buffer.