Pogosim
DataLogger Class Reference

DataLogger class for writing data to a Feather file using Apache Arrow.

#include <data_logger.h>

Public Member Functions

 DataLogger ()=default
 Default constructor.
 DataLogger (int64_t flush_row_count)
 Constructor with explicit buffered batch size.
virtual ~DataLogger ()
 Destructor.
bool column_exists (const std::string &column_name)
 Checks if the specified column exists.
bool column_value_already_set (const std::string &column_name)
 Checks if the specified column value has been set for the current row.
void add_metadata (const std::string &key, const std::string &value)
 Adds arbitrary string metadata to be embedded in the Feather file.
void add_field (const std::string &name, std::shared_ptr< arrow::DataType > type, bool ignore_existing_name=false)
 Adds a new field to the schema.
void open_file (const std::string &filename)
 Opens the output file for writing.
void set_value (const std::string &column_name, int64_t value)
 Sets the value for a specified column in the current row.
void set_value (const std::string &column_name, int32_t value)
 Sets the value for a specified column in the current row.
void set_value (const std::string &column_name, int16_t value)
 Sets the value for a specified column in the current row.
void set_value (const std::string &column_name, int8_t value)
 Sets the value for a specified column in the current row.
void set_value (const std::string &column_name, double value)
 Sets the value for a specified column in the current row.
void set_value (const std::string &column_name, float value)
 Sets the value for a specified column in the current row.
void set_value (const std::string &column_name, const std::string &value)
 Sets the value for a specified column in the current row.
void set_value (const std::string &column_name, bool value)
 Sets the value for a specified column in the current row.
void set_value_float16 (const std::string &column_name, float value)
 Sets the value for a specified column in the current row.
void save_row ()
 Saves the current row to the Feather file.
void flush ()
 Flushes all currently buffered rows to the Feather file.

Detailed Description

DataLogger class for writing data to a Feather file using Apache Arrow.

This class allows dynamic creation of a schema by adding fields prior to opening the file. Once the schema is defined, data rows can be built by setting individual column values and saved row-by-row into a Feather file using Zstd compression.

Internally, rows are buffered and written in larger record batches so that compression is applied on larger chunks of data, which significantly reduces file size and improves write performance compared with writing one record batch per row.
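The buffering behaviour described above can be illustrated with a small sketch. BatchCounter is a hypothetical stand-in, not part of Pogosim: it stores no data and calls no Arrow API, and only counts rows to show when a record batch would be written for a given flush_row_count. Rejecting negative sizes is an assumption; this reference only documents the zero case.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in, NOT the real DataLogger: it only counts rows.
class BatchCounter {
public:
    explicit BatchCounter(std::int64_t flush_row_count)
        : flush_row_count_(flush_row_count) {
        // Zero is rejected as documented; rejecting negatives is an assumption.
        if (flush_row_count_ <= 0) {
            throw std::runtime_error("flush_row_count must be positive");
        }
    }

    // Mirrors save_row(): buffer one row, write a batch once the buffer is full.
    void save_row() {
        ++buffered_rows_;
        if (buffered_rows_ >= flush_row_count_) {
            flush();
        }
    }

    // Mirrors flush(): write whatever is buffered; a no-op on an empty buffer.
    void flush() {
        if (buffered_rows_ == 0) {
            return;
        }
        batch_sizes_.push_back(buffered_rows_);
        buffered_rows_ = 0;
    }

    std::int64_t buffered_rows() const { return buffered_rows_; }
    const std::vector<std::int64_t>& batch_sizes() const { return batch_sizes_; }

private:
    std::int64_t flush_row_count_;
    std::int64_t buffered_rows_ = 0;
    std::vector<std::int64_t> batch_sizes_;  // size of each batch "written" so far
};

// Usage: 7 rows with a batch size of 3 give two full batches and one pending row.
inline void batch_counter_demo() {
    BatchCounter counter(3);
    for (int i = 0; i < 7; ++i) {
        counter.save_row();
    }
    assert(counter.batch_sizes().size() == 2 && counter.buffered_rows() == 1);
    counter.flush();  // writes the final, smaller batch
    assert(counter.batch_sizes().back() == 1);
}
```

With the real class, the last partial batch is the one the destructor writes if flush() is not called explicitly.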

Constructor & Destructor Documentation

◆ DataLogger() [1/2]

DataLogger::DataLogger ( )
default

Default constructor.

Uses a default buffered batch size.

◆ DataLogger() [2/2]

DataLogger::DataLogger ( int64_t flush_row_count)
explicit

Constructor with explicit buffered batch size.

Parameters
flush_row_count  Number of rows to buffer before writing a record batch.
Exceptions
std::runtime_error  if flush_row_count is zero.

◆ ~DataLogger()

DataLogger::~DataLogger ( )
virtual

Destructor.

Flushes any remaining buffered rows, then closes the Feather writer and the output file if they are open. Warnings are logged if flushing or closing fails.

Member Function Documentation

◆ add_field()

void DataLogger::add_field ( const std::string & name,
std::shared_ptr< arrow::DataType > type,
bool ignore_existing_name = false )

Adds a new field to the schema.

Adds a field with the specified name and data type to the internal schema. This must be called before the file is opened.

Parameters
name                  The name of the field.
type                  The Arrow data type for the field.
ignore_existing_name  Whether to ignore an existing field with the same name. If false, std::runtime_error is thrown when the name already exists.
Exceptions
std::runtime_error  if called after the file has been opened.
std::runtime_error  if ignore_existing_name is false and the field name already exists.
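The two failure cases above can be sketched as follows. SchemaSketch is a hypothetical stand-in that keeps the type as a plain string instead of std::shared_ptr<arrow::DataType>, and mark_opened() is an invented placeholder for open_file(). What happens to a duplicate when ignore_existing_name is true is not spelled out in this reference; the sketch assumes the call is silently skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the schema-building rules only.
class SchemaSketch {
public:
    void add_field(const std::string& name, const std::string& type,
                   bool ignore_existing_name = false) {
        if (file_opened_) {
            throw std::runtime_error("add_field called after the file was opened");
        }
        for (const auto& field : fields_) {
            if (field.name == name) {
                if (ignore_existing_name) {
                    return;  // assumption: the duplicate request is silently skipped
                }
                throw std::runtime_error("field '" + name + "' already exists");
            }
        }
        fields_.push_back(Field{name, type});
    }

    void mark_opened() { file_opened_ = true; }  // placeholder for open_file()
    std::size_t field_count() const { return fields_.size(); }

private:
    struct Field {
        std::string name;
        std::string type;
    };
    std::vector<Field> fields_;
    bool file_opened_ = false;
};

// Usage: a tolerated duplicate leaves the schema unchanged.
inline void schema_sketch_demo() {
    SchemaSketch schema;
    schema.add_field("time", "float64");
    schema.add_field("time", "float64", /*ignore_existing_name=*/true);
    assert(schema.field_count() == 1);
}
```

Passing ignore_existing_name=true is convenient when several independent components each try to register the same column.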

◆ add_metadata()

void DataLogger::add_metadata ( const std::string & key,
const std::string & value )

Adds arbitrary string metadata to be embedded in the Feather file.

This can be called at any time before open_file(). If the same key is supplied more than once, the later value overwrites the earlier one.

Parameters
key    The metadata key (UTF-8, non-empty).
value  The metadata value (UTF-8, may be empty).
Exceptions
std::runtime_error  if the file has already been opened.

◆ column_exists()

bool DataLogger::column_exists ( const std::string & column_name)

Checks if the specified column exists.

Parameters
column_name  The name of the column to check.
Returns
true if the column exists, false otherwise.

◆ column_value_already_set()

bool DataLogger::column_value_already_set ( const std::string & column_name)

Checks if the specified column value has been set for the current row.

Parameters
column_name  The name of the column to check.
Returns
true if a value has been set for the column in the current row, false otherwise.
Exceptions
std::runtime_error  if the file is not open or the column does not exist.

◆ flush()

void DataLogger::flush ( )

Flushes all currently buffered rows to the Feather file.

If no rows are buffered, this is a no-op. This may be called manually, but is also called automatically by the destructor before closing the file.

Exceptions
std::runtime_error  if finalizing arrays or writing the record batch fails.

◆ open_file()

void DataLogger::open_file ( const std::string & filename)

Opens the output file for writing.

Constructs the schema from the added fields, attaches custom metadata (such as the program version), ensures that the parent directory exists, and opens the file for writing. A Feather writer is then created with Zstd compression. Internal persistent builders are initialized, and the current row values are reset.

Parameters
filename  The path to the file to be written.
Exceptions
std::runtime_error  if no fields have been added or if file or writer initialization fails.
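The "ensures that the parent directory exists" step can be sketched with the standard library. ensure_parent_directory is a hypothetical helper written for illustration; how DataLogger performs this step internally is not shown in this reference, and unlike open_file() the sketch lets std::filesystem::filesystem_error propagate instead of converting it to std::runtime_error.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper: create the parent directory of `filename` if needed.
// Throws std::filesystem::filesystem_error if a directory cannot be created.
inline void ensure_parent_directory(const std::string& filename) {
    const std::filesystem::path parent =
        std::filesystem::path(filename).parent_path();
    if (parent.empty()) {
        return;  // bare file name: the current directory is used as-is
    }
    // Creates all missing intermediate directories; an existing directory
    // is not an error.
    std::filesystem::create_directories(parent);
}

// Usage: nested output directories are created in one call.
inline void ensure_parent_directory_demo() {
    const std::filesystem::path file = std::filesystem::temp_directory_path() /
                                       "datalogger_sketch" / "run_0" / "data.feather";
    ensure_parent_directory(file.string());
    assert(std::filesystem::is_directory(file.parent_path()));
}
```

The empty-parent check matters because a bare file name such as "data.feather" has no parent path to create.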

◆ save_row()

void DataLogger::save_row ( )

Saves the current row to the Feather file.

Appends the current row's values (or nulls if missing) to persistent Arrow array builders. Once enough rows have been accumulated, a record batch is finalized and written to the file. After appending, the row is reset to its default state.

Exceptions
std::runtime_error  if any step in appending data or writing the record batch fails.
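The per-row life cycle (set some values, save, start again from an empty row) can be sketched as follows. RowSketch is a hypothetical stand-in limited to double columns: std::nullopt plays the role of an Arrow null, rows are kept in memory rather than appended to Arrow builders, and the "file is not open" check of the real methods is left out.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the current-row bookkeeping only.
class RowSketch {
public:
    using Row = std::vector<std::optional<double>>;

    explicit RowSketch(std::vector<std::string> columns)
        : columns_(std::move(columns)) {}

    bool column_exists(const std::string& name) const {
        for (const auto& column : columns_) {
            if (column == name) return true;
        }
        return false;
    }

    void set_value(const std::string& name, double value) {
        require_column(name);
        current_[name] = value;
    }

    bool column_value_already_set(const std::string& name) const {
        require_column(name);
        return current_.count(name) > 0;
    }

    // Appends one row in schema order, using null for unset columns,
    // then resets the current row.
    void save_row() {
        Row row;
        for (const auto& name : columns_) {
            const auto it = current_.find(name);
            if (it == current_.end()) {
                row.push_back(std::nullopt);
            } else {
                row.push_back(it->second);
            }
        }
        rows_.push_back(std::move(row));
        current_.clear();
    }

    const std::vector<Row>& rows() const { return rows_; }

private:
    void require_column(const std::string& name) const {
        if (!column_exists(name)) {
            throw std::runtime_error("unknown column: " + name);
        }
    }

    std::vector<std::string> columns_;
    std::map<std::string, double> current_;  // values set for the current row
    std::vector<Row> rows_;
};

// Usage: an unset column is stored as null and the row is reset after saving.
inline void row_sketch_demo() {
    const std::vector<std::string> columns = {"time", "x"};
    RowSketch log(columns);
    log.set_value("time", 0.5);
    assert(log.column_value_already_set("time"));
    log.save_row();
    assert(!log.rows()[0][1].has_value());          // "x" was never set
    assert(!log.column_value_already_set("time"));  // row was reset
}
```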

◆ set_value() [1/8]

void DataLogger::set_value ( const std::string & column_name,
bool value )

Sets the value for a specified column in the current row.

Overloaded method for boolean values.

Parameters
column_name  The name of the column.
value        The boolean value to set.
Exceptions
std::runtime_error  if the file is not open or the column does not exist.

◆ set_value() [2/8]

void DataLogger::set_value ( const std::string & column_name,
const std::string & value )

Sets the value for a specified column in the current row.

Overloaded method for string values.

Parameters
column_name  The name of the column.
value        The string value to set.
Exceptions
std::runtime_error  if the file is not open or the column does not exist.

◆ set_value() [3/8]

void DataLogger::set_value ( const std::string & column_name,
double value )

Sets the value for a specified column in the current row.

Overloaded method for double values.

Parameters
column_name  The name of the column.
value        The double value to set.
Exceptions
std::runtime_error  if the file is not open or the column does not exist.

◆ set_value() [4/8]

void DataLogger::set_value ( const std::string & column_name,
float value )

Sets the value for a specified column in the current row.

Overloaded method for float values.

Parameters
column_name  The name of the column.
value        The float value to set.
Exceptions
std::runtime_error  if the file is not open or the column does not exist.

◆ set_value() [5/8]

void DataLogger::set_value ( const std::string & column_name,
int16_t value )

Sets the value for a specified column in the current row.

Overloaded method for int16_t values.

Parameters
column_name  The name of the column.
value        The int16_t value to set.
Exceptions
std::runtime_error  if the file is not open or the column does not exist.

◆ set_value() [6/8]

void DataLogger::set_value ( const std::string & column_name,
int32_t value )

Sets the value for a specified column in the current row.

Overloaded method for int32_t values.

Parameters
column_name  The name of the column.
value        The int32_t value to set.
Exceptions
std::runtime_error  if the file is not open or the column does not exist.

◆ set_value() [7/8]

void DataLogger::set_value ( const std::string & column_name,
int64_t value )

Sets the value for a specified column in the current row.

Overloaded method for int64_t values.

Parameters
column_name  The name of the column.
value        The int64_t value to set.
Exceptions
std::runtime_error  if the file is not open or the column does not exist.

◆ set_value() [8/8]

void DataLogger::set_value ( const std::string & column_name,
int8_t value )

Sets the value for a specified column in the current row.

Overloaded method for int8_t values.

Parameters
column_name  The name of the column.
value        The int8_t value to set.
Exceptions
std::runtime_error  if the file is not open or the column does not exist.

◆ set_value_float16()

void DataLogger::set_value_float16 ( const std::string & column_name,
float value )

Sets the value for a specified column in the current row.

This is a separately named method rather than a set_value() overload, to avoid automatic promotion to double.

Parameters
column_name  The name of the column.
value        The float value to convert to Arrow float16 raw payload.
Exceptions
std::runtime_error  if the file is not open or the column does not exist.
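For readers unfamiliar with the float16 "raw payload", the sketch below shows one common way to turn a 32-bit float into the 16-bit IEEE 754 binary16 bit pattern, rounding to nearest with ties to even. float_to_half_bits is a hypothetical helper written for illustration; DataLogger may well rely on a different routine (for example a helper shipped with Arrow), so this is not a description of its implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: IEEE 754 binary32 -> binary16 bit pattern,
// round to nearest, ties to even.
inline std::uint16_t float_to_half_bits(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));  // well-defined way to read the bits

    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    std::uint32_t mantissa = bits & 0x7FFFFFu;

    if (exponent == 0xFFu) {  // infinity or NaN (NaN becomes a quiet NaN)
        return static_cast<std::uint16_t>(sign | 0x7C00u | (mantissa ? 0x0200u : 0u));
    }

    const int half_exp = static_cast<int>(exponent) - 127 + 15;  // re-bias

    if (half_exp >= 31) {  // too large for float16: signed infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }

    if (half_exp <= 0) {  // result is a float16 subnormal or zero
        if (half_exp < -10) {
            return static_cast<std::uint16_t>(sign);  // rounds to signed zero
        }
        mantissa |= 0x800000u;            // make the implicit leading 1 explicit
        const int shift = 14 - half_exp;  // between 14 and 24
        std::uint32_t half_mant = mantissa >> shift;
        const std::uint32_t remainder = mantissa & ((1u << shift) - 1u);
        const std::uint32_t halfway = 1u << (shift - 1);
        if (remainder > halfway || (remainder == halfway && (half_mant & 1u))) {
            ++half_mant;  // a carry here yields the smallest normal number
        }
        return static_cast<std::uint16_t>(sign | half_mant);
    }

    // Normal number: keep the top 10 mantissa bits and round on the 13 dropped.
    std::uint32_t half = (static_cast<std::uint32_t>(half_exp) << 10) | (mantissa >> 13);
    const std::uint32_t remainder = mantissa & 0x1FFFu;
    if (remainder > 0x1000u || (remainder == 0x1000u && (half & 1u))) {
        ++half;  // a carry may move into the exponent, up to infinity
    }
    return static_cast<std::uint16_t>(sign | half);
}

// Usage: a few well-known bit patterns.
inline void float_to_half_bits_demo() {
    assert(float_to_half_bits(1.0f) == 0x3C00);
    assert(float_to_half_bits(-2.0f) == 0xC000);
    assert(float_to_half_bits(65504.0f) == 0x7BFF);  // largest finite float16
    assert(float_to_half_bits(1.0e6f) == 0x7C00);    // overflows to +infinity
}
```

Since float16 keeps only 10 explicit mantissa bits and its largest finite value is 65504, columns written this way trade precision and range for file size.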

The documentation for this class was generated from the following files: