DataLogger class for writing data to a Feather file using Apache Arrow.
#include <data_logger.h>
Public Member Functions
| | DataLogger ()=default | Default constructor. |
| | DataLogger (int64_t flush_row_count) | Constructor with explicit buffered batch size. |
| virtual | ~DataLogger () | Destructor. |
| bool | column_exists (const std::string &column_name) | Checks if the specified column exists. |
| bool | column_value_already_set (const std::string &column_name) | Checks if the specified column value has been set for the current row. |
| void | add_metadata (const std::string &key, const std::string &value) | Adds arbitrary string metadata to be embedded in the Feather file. |
| void | add_field (const std::string &name, std::shared_ptr< arrow::DataType > type, bool ignore_existing_name=false) | Adds a new field to the schema. |
| void | open_file (const std::string &filename) | Opens the output file for writing. |
| void | set_value (const std::string &column_name, int64_t value) | Sets the value for a specified column in the current row. |
| void | set_value (const std::string &column_name, int32_t value) | Sets the value for a specified column in the current row. |
| void | set_value (const std::string &column_name, int16_t value) | Sets the value for a specified column in the current row. |
| void | set_value (const std::string &column_name, int8_t value) | Sets the value for a specified column in the current row. |
| void | set_value (const std::string &column_name, double value) | Sets the value for a specified column in the current row. |
| void | set_value (const std::string &column_name, float value) | Sets the value for a specified column in the current row. |
| void | set_value (const std::string &column_name, const std::string &value) | Sets the value for a specified column in the current row. |
| void | set_value (const std::string &column_name, bool value) | Sets the value for a specified column in the current row. |
| void | set_value_float16 (const std::string &column_name, float value) | Sets the value for a specified column in the current row. |
| void | save_row () | Saves the current row to the Feather file. |
| void | flush () | Flushes all currently buffered rows to the Feather file. |
This class allows dynamic creation of a schema by adding fields prior to opening the file. Once the schema is defined, data rows can be built by setting individual column values and saved row-by-row into a Feather file using Zstd compression.
Internally, rows are buffered and written in larger record batches so that compression is applied on larger chunks of data, which significantly reduces file size and improves write performance compared with writing one record batch per row.
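The buffering described above can be pictured with a small standard-library stand-in. This is only a sketch of the batching idea, not the class's implementation: `RowBuffer`, `batch_sizes()` and `demo_batches()` are hypothetical names, and no Arrow or Zstd code is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in: buffers rows and emits one "batch" per
// flush_row_count rows. It models the batching only.
class RowBuffer {
public:
    explicit RowBuffer(std::int64_t flush_row_count)
        : flush_row_count_(flush_row_count) {
        if (flush_row_count_ == 0) {
            throw std::runtime_error("flush_row_count must not be zero");
        }
    }

    // Buffer one row; emit a batch once the buffer is full.
    void save_row(double value) {
        pending_.push_back(value);
        if (static_cast<std::int64_t>(pending_.size()) >= flush_row_count_) {
            flush();
        }
    }

    // Emit whatever is buffered as one batch; no-op when empty.
    void flush() {
        if (pending_.empty()) return;
        batch_sizes_.push_back(pending_.size());
        pending_.clear();
    }

    const std::vector<std::size_t>& batch_sizes() const { return batch_sizes_; }

private:
    std::int64_t flush_row_count_;
    std::vector<double> pending_;
    std::vector<std::size_t> batch_sizes_;
};

// Ten rows with a batch size of four give batches of 4, 4 and 2 rows.
inline std::vector<std::size_t> demo_batches() {
    RowBuffer buffer(4);
    for (int i = 0; i < 10; ++i) buffer.save_row(i * 0.5);
    buffer.flush();  // final partial batch, as a destructor-time flush would do
    assert(buffer.batch_sizes().size() == 3);
    return buffer.batch_sizes();
}
```

With one batch per row the same ten rows would produce ten tiny batches, which is the case the buffering is meant to avoid.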
| DataLogger::DataLogger | ( | ) | default |
Default constructor.
Uses a default buffered batch size.
| DataLogger::DataLogger | ( | int64_t | flush_row_count | ) | explicit |
Constructor with explicit buffered batch size.
| flush_row_count | Number of rows to buffer before writing a record batch. |
| std::runtime_error | if flush_row_count is zero. |
| DataLogger::~DataLogger | ( | ) | virtual |
Destructor.
Flushes any remaining buffered rows, then closes the Feather writer and the output file if they are open. Warnings are logged if flushing or closing fails.
| void DataLogger::add_field | ( | const std::string & | name, |
| std::shared_ptr< arrow::DataType > | type, | ||
| bool | ignore_existing_name = false ) |
Adds a new field to the schema.
Adds a field with the specified name and data type to the internal schema. This must be called before the file is opened.
| name | The name of the field. |
| type | The Arrow data type for the field. |
| ignore_existing_name | Whether to ignore an existing field with the same name. If false, std::runtime_error is thrown when the name already exists. |
| std::runtime_error | if called after the file has been opened. |
| std::runtime_error | if ignore_existing_name is false and a field with the same name already exists. |
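The rules above can be sketched with a standard-library stand-in. `SchemaDraft`, `mark_open()` and `demo_fields()` are hypothetical names, and the type is a plain string rather than `std::shared_ptr<arrow::DataType>`. The documentation does not say what happens to the new definition when `ignore_existing_name` is true, so the sketch assumes the existing field is kept and the call does nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the schema bookkeeping only.
class SchemaDraft {
public:
    void add_field(const std::string& name, const std::string& type,
                   bool ignore_existing_name = false) {
        if (file_open_) {
            throw std::runtime_error("add_field() called after the file was opened");
        }
        for (const auto& field : fields_) {
            if (field.first != name) continue;
            if (ignore_existing_name) return;  // ASSUMPTION: keep the first definition
            throw std::runtime_error("field already exists: " + name);
        }
        fields_.emplace_back(name, type);
    }

    void mark_open() { file_open_ = true; }  // stands in for open_file()
    std::size_t field_count() const { return fields_.size(); }

private:
    bool file_open_ = false;
    std::vector<std::pair<std::string, std::string>> fields_;
};

// Two distinct fields plus one tolerated duplicate leave two fields.
inline std::size_t demo_fields() {
    SchemaDraft schema;
    schema.add_field("step", "int64");
    schema.add_field("loss", "double");
    schema.add_field("loss", "double", true);  // duplicate tolerated, nothing added
    assert(schema.field_count() == 2);
    return schema.field_count();
}
```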
| void DataLogger::add_metadata | ( | const std::string & | key, |
| const std::string & | value ) |
Adds arbitrary string metadata to be embedded in the Feather file.
This can be called any time before open_file(). If the same key is supplied more than once the value is overwritten.
| key | The metadata key (UTF-8, non-empty). |
| value | The metadata value (UTF-8, may be empty). |
| std::runtime_error | if the file has already been opened. |
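The overwrite and ordering rules can be illustrated with a small stand-in built on `std::map`. `MetadataStore`, `mark_open()` and `demo_metadata()` are hypothetical names used for illustration; how the class actually stores metadata is not documented here.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the metadata bookkeeping only.
class MetadataStore {
public:
    void add_metadata(const std::string& key, const std::string& value) {
        if (file_open_) {
            throw std::runtime_error("add_metadata() called after the file was opened");
        }
        entries_[key] = value;  // a repeated key overwrites the earlier value
    }

    void mark_open() { file_open_ = true; }  // stands in for open_file()
    const std::map<std::string, std::string>& entries() const { return entries_; }

private:
    bool file_open_ = false;
    std::map<std::string, std::string> entries_;
};

// Supplying the same key twice keeps a single entry with the last value.
inline std::string demo_metadata() {
    MetadataStore store;
    store.add_metadata("run_id", "a");
    store.add_metadata("run_id", "b");
    assert(store.entries().size() == 1);
    return store.entries().at("run_id");
}
```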
| bool DataLogger::column_exists | ( | const std::string & | column_name | ) |
Checks if the specified column exists.
| column_name | The name of the column to check. |
| bool DataLogger::column_value_already_set | ( | const std::string & | column_name | ) |
Checks if the specified column value has been set for the current row.
| column_name | The name of the column to check. |
| std::runtime_error | if the file is not open or the column does not exist. |
| void DataLogger::flush | ( | ) |
Flushes all currently buffered rows to the Feather file.
If no rows are buffered, this is a no-op. This may be called manually, but is also called automatically by the destructor before closing the file.
| std::runtime_error | if finalizing arrays or writing the record batch fails. |
| void DataLogger::open_file | ( | const std::string & | filename | ) |
Opens the output file for writing.
Constructs the schema from the added fields, attaches custom metadata (such as the program version), ensures that the parent directory exists, and opens the file for writing. A Feather writer is then created with Zstd compression. Internal persistent builders are initialized, and the current row values are reset.
| filename | The path to the file to be written. |
| std::runtime_error | if no fields have been added or if file or writer initialization fails. |
| void DataLogger::save_row | ( | ) |
Saves the current row to the Feather file.
Appends the current row's values (or nulls if missing) to persistent Arrow array builders. Once enough rows have been accumulated, a record batch is finalized and written to the file. After appending, the row is reset to its default state.
| std::runtime_error | if any step in appending data or writing the record batch fails. |
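The null-filling and reset behaviour can be sketched without Arrow. `RowAssembler`, `Cell` and `demo_null_count()` are hypothetical names, every column is a `double` for brevity, and the per-column vectors stand in for the persistent Arrow builders.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One stored cell: either a value or a null.
struct Cell {
    bool valid;
    double value;
};

// Hypothetical stand-in for the row-to-column append step.
class RowAssembler {
public:
    explicit RowAssembler(std::vector<std::string> columns)
        : columns_(std::move(columns)), data_(columns_.size()) {}

    void set_value(const std::string& column, double value) {
        current_[column] = value;
    }

    // Append the current row to every column, using a null where the
    // column was not set, then reset the row.
    void save_row() {
        for (std::size_t i = 0; i < columns_.size(); ++i) {
            const auto it = current_.find(columns_[i]);
            if (it != current_.end()) {
                data_[i].push_back(Cell{true, it->second});
            } else {
                data_[i].push_back(Cell{false, 0.0});
            }
        }
        current_.clear();
    }

    const std::vector<Cell>& column(std::size_t i) const { return data_[i]; }

private:
    std::vector<std::string> columns_;
    std::vector<std::vector<Cell>> data_;
    std::map<std::string, double> current_;
};

// The second row sets only "x", so column "y" receives one null.
inline std::size_t demo_null_count() {
    RowAssembler rows({"x", "y"});
    rows.set_value("x", 1.0);
    rows.set_value("y", 2.0);
    rows.save_row();
    rows.set_value("x", 3.0);
    rows.save_row();

    std::size_t nulls = 0;
    for (const Cell& cell : rows.column(1)) {
        if (!cell.valid) ++nulls;
    }
    assert(rows.column(0).size() == 2);
    return nulls;
}
```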
| void DataLogger::set_value | ( | const std::string & | column_name, |
| bool | value ) |
Sets the value for a specified column in the current row.
Overloaded method for boolean values.
| column_name | The name of the column. |
| value | The boolean value to set. |
| std::runtime_error | if the file is not open or the column does not exist. |
| void DataLogger::set_value | ( | const std::string & | column_name, |
| const std::string & | value ) |
Sets the value for a specified column in the current row.
Overloaded method for string values.
| column_name | The name of the column. |
| value | The string value to set. |
| std::runtime_error | if the file is not open or the column does not exist. |
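One point worth checking when calling the string overload: if the class declares only the overloads listed on this page (no `const char *` overload is documented), standard C++ overload resolution sends a string literal to the `bool` overload, because the pointer-to-bool conversion ranks above the user-defined conversion to `std::string`. The free functions below are hypothetical stand-ins that mirror four of the documented signatures and report which overload was chosen.

```cpp
#include <cassert>
#include <string>

// Stand-ins mirroring four of the documented set_value overloads.
inline std::string chosen(const std::string&, bool) { return "bool"; }
inline std::string chosen(const std::string&, const std::string&) { return "string"; }
inline std::string chosen(const std::string&, double) { return "double"; }
inline std::string chosen(const std::string&, float) { return "float"; }

inline void check_overloads() {
    // A bare literal decays to const char*, which converts to bool.
    assert(chosen("label", "text") == "bool");
    // Wrapping the literal in std::string selects the string overload.
    assert(chosen("label", std::string("text")) == "string");
    // Floating-point literals match their exact type.
    assert(chosen("label", 1.5) == "double");
    assert(chosen("label", 1.5f) == "float");
}
```

Under that assumption, passing `std::string("text")` rather than a bare literal is the safer way to write string values.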
| void DataLogger::set_value | ( | const std::string & | column_name, |
| double | value ) |
Sets the value for a specified column in the current row.
Overloaded method for double values.
| column_name | The name of the column. |
| value | The double value to set. |
| std::runtime_error | if the file is not open or the column does not exist. |
| void DataLogger::set_value | ( | const std::string & | column_name, |
| float | value ) |
Sets the value for a specified column in the current row.
Overloaded method for float values.
| column_name | The name of the column. |
| value | The float value to set. |
| std::runtime_error | if the file is not open or the column does not exist. |
| void DataLogger::set_value | ( | const std::string & | column_name, |
| int16_t | value ) |
Sets the value for a specified column in the current row.
Overloaded method for int16_t values.
| column_name | The name of the column. |
| value | The int16_t value to set. |
| std::runtime_error | if the file is not open or the column does not exist. |
| void DataLogger::set_value | ( | const std::string & | column_name, |
| int32_t | value ) |
Sets the value for a specified column in the current row.
Overloaded method for int32_t values.
| column_name | The name of the column. |
| value | The int32_t value to set. |
| std::runtime_error | if the file is not open or the column does not exist. |
| void DataLogger::set_value | ( | const std::string & | column_name, |
| int64_t | value ) |
Sets the value for a specified column in the current row.
Overloaded method for int64_t values.
| column_name | The name of the column. |
| value | The int64_t value to set. |
| std::runtime_error | if the file is not open or the column does not exist. |
| void DataLogger::set_value | ( | const std::string & | column_name, |
| int8_t | value ) |
Sets the value for a specified column in the current row.
Overloaded method for int8_t values.
| column_name | The name of the column. |
| value | The int8_t value to set. |
| std::runtime_error | if the file is not open or the column does not exist. |
| void DataLogger::set_value_float16 | ( | const std::string & | column_name, |
| float | value ) |
Sets the value for a specified column in the current row.
Unlike the set_value methods, this is a separately named method rather than an overload, to avoid automatic promotion to double.
| column_name | The name of the column. |
| value | The float value to convert to Arrow float16 raw payload. |
| std::runtime_error | if the file is not open or the column does not exist. |
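For orientation, a float can be reduced to a 16-bit half-precision bit pattern as sketched below. This is a generic IEEE 754 binary32 to binary16 conversion with round-to-nearest-even. Whether set_value_float16 uses this exact rounding is not stated on this page, and `float_to_half_bits` and `check_half_bits` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Convert a float to the raw bits of an IEEE 754 half-precision value.
inline std::uint16_t float_to_half_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));

    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    std::uint32_t mantissa = bits & 0x7FFFFFu;

    if (exponent == 0xFFu) {  // infinity or NaN
        return static_cast<std::uint16_t>(sign | 0x7C00u | (mantissa ? 0x200u : 0u));
    }

    // Re-bias the exponent from 127 (float) to 15 (half).
    const std::int32_t e = static_cast<std::int32_t>(exponent) - 127 + 15;
    if (e >= 31) {  // too large for half: infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }

    if (e <= 0) {  // result is a half subnormal or zero
        if (e < -10) return static_cast<std::uint16_t>(sign);  // underflow to zero
        mantissa |= 0x800000u;  // restore the implicit leading 1
        const std::uint32_t shift = static_cast<std::uint32_t>(14 - e);
        std::uint32_t half_mantissa = mantissa >> shift;
        const std::uint32_t remainder = mantissa & ((1u << shift) - 1u);
        const std::uint32_t halfway = 1u << (shift - 1u);
        if (remainder > halfway || (remainder == halfway && (half_mantissa & 1u))) {
            ++half_mantissa;  // round to nearest, ties to even
        }
        return static_cast<std::uint16_t>(sign | half_mantissa);
    }

    // Normal number: keep the top 10 mantissa bits and round the rest.
    std::uint32_t half = (static_cast<std::uint32_t>(e) << 10) | (mantissa >> 13);
    const std::uint32_t remainder = mantissa & 0x1FFFu;
    if (remainder > 0x1000u || (remainder == 0x1000u && (half & 1u))) {
        ++half;  // a carry into the exponent is the correct rounding result
    }
    return static_cast<std::uint16_t>(sign | half);
}

inline void check_half_bits() {
    assert(float_to_half_bits(0.0f) == 0x0000);
    assert(float_to_half_bits(1.0f) == 0x3C00);
    assert(float_to_half_bits(-2.0f) == 0xC000);
    assert(float_to_half_bits(65504.0f) == 0x7BFF);  // largest finite half
    assert(float_to_half_bits(1.0e6f) == 0x7C00);    // overflows to +infinity
}
```

Values outside the half-precision range become infinity or zero in this sketch, so columns logged as float16 are best limited to data that tolerates roughly three significant decimal digits.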