SHOGUN  6.0.0
CInputParser< T > Class Template Reference

Detailed Description

template<class T>
class shogun::CInputParser< T >

Class CInputParser is a templated class used to maintain the reading/parsing/providing of examples.

Parsing is done in a thread separate from the learner.

Note that parsing is not done directly by this class, but by the Streaming*File classes. This class only calls the required get_vector* functions of the StreamingFile object. Exactly which function is called is set through the set_read_vector* functions.

The template type should be the type of feature vector the parser should return. For example, CInputParser<float32_t> expects the get_vector function to return a float32_t* vector. The other values returned are the length of the feature vector and, if applicable, the label.

If the vectors cannot be represented directly as, say, float32_t*, one can instantiate e.g. CInputParser<VwExample>, which looks for a get_vector function that returns a VwExample. Such an object may contain any kind of data, including label, vector, weights, etc. It is then up to the external algorithm to handle such objects.

The parser should first be started with a call to the start_parser() function which starts a new thread for continuous parsing of examples.

Parsing is done through the CParseBuffer object, which in its current implementation is a ring of a specified number of examples. It is the task of the CInputParser object to ensure that this ring is being updated with new parsed examples.

CInputParser provides mainly the get_next_example function which returns the next example from the CParseBuffer object to the caller (usually a StreamingFeatures object). When one is done using the example, finalize_example() should be called, leaving the spot free for a new example to be loaded.
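The ring-and-finalize contract described above can be illustrated with a small model. This is a minimal single-threaded sketch, not Shogun's CParseBuffer: the names TinyRing, SlotState, put, get, finalize and ring_demo are hypothetical, and the examples are reduced to plain int values. It only shows why a slot cannot be overwritten until the consumer has finalized it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a ring of examples; not the Shogun CParseBuffer.
enum class SlotState { Empty, Ready, InUse };

class TinyRing {
public:
    explicit TinyRing(std::size_t size)
        : values_(size, 0), states_(size, SlotState::Empty) {
        assert(size > 0);
    }

    // Producer side: store a value if the write slot is free.
    bool put(int value) {
        if (states_[write_] != SlotState::Empty) return false;  // ring is full
        values_[write_] = value;
        states_[write_] = SlotState::Ready;
        write_ = (write_ + 1) % values_.size();
        return true;
    }

    // Consumer side: hand out the next ready value and mark its slot in use.
    bool get(int& out) {
        if (states_[read_] != SlotState::Ready) return false;  // nothing ready
        out = values_[read_];
        states_[read_] = SlotState::InUse;
        return true;
    }

    // Consumer is done with the current value: free the slot for reuse.
    void finalize() {
        assert(states_[read_] == SlotState::InUse);
        states_[read_] = SlotState::Empty;
        read_ = (read_ + 1) % values_.size();
    }

private:
    std::vector<int> values_;
    std::vector<SlotState> states_;
    std::size_t write_ = 0;
    std::size_t read_ = 0;
};

// Walk-through: a ring of two slots refuses a third value until the
// consumer finalizes the first one.
inline bool ring_demo() {
    TinyRing ring(2);
    bool ok = ring.put(10) && ring.put(20);
    ok = ok && !ring.put(30);           // full: both slots hold unread values
    int v = 0;
    ok = ok && ring.get(v) && v == 10;
    ok = ok && !ring.put(30);           // still full: slot 0 is in use
    ring.finalize();                    // frees slot 0
    ok = ok && ring.put(30);            // now there is room
    ok = ok && ring.get(v) && v == 20;
    return ok;
}
```

In this model, forgetting to call finalize() leaves the slot marked as in use, so the producer eventually stalls. That is the reason the description above asks callers to finalize each example once they are done with it.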

The parsing thread should be joined with a call to end_parser(). exit_parser() may be used to cancel the parse thread if needed.
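The difference between joining and cancelling the parse thread can be sketched with standard C++ threading. The following is a simplified model under stated assumptions, not Shogun code: Worker, start, wait, stop, run_to_completion and cancel_endless are hypothetical names, and cancellation is assumed to be cooperative through an atomic flag comparable to the keep_running member listed below.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical worker that mimics a parse thread; not Shogun code.
class Worker {
public:
    ~Worker() { stop(); }

    // Comparable to start_parser(): launch the loop in its own thread.
    void start(int limit) {
        assert(!thread_.joinable());
        keep_running_ = true;
        thread_ = std::thread([this, limit] { loop(limit); });
    }

    // Comparable to end_parser(): wait until the loop has used all input.
    void wait() {
        if (thread_.joinable()) thread_.join();
    }

    // Comparable to exit_parser(): ask the loop to stop early, then join.
    void stop() {
        keep_running_ = false;
        wait();
    }

    int processed() const { return processed_.load(); }

private:
    void loop(int limit) {
        // A negative limit stands for an endless input stream.
        while (keep_running_ && (limit < 0 || processed_ < limit)) {
            ++processed_;
            std::this_thread::sleep_for(std::chrono::microseconds(50));
        }
    }

    std::thread thread_;
    std::atomic<bool> keep_running_{false};
    std::atomic<int> processed_{0};
};

// Joining: returns once the worker has handled all n items.
inline int run_to_completion(int n) {
    Worker w;
    w.start(n);
    w.wait();
    return w.processed();
}

// Cancelling: the input never ends, so only stop() lets the thread finish.
inline bool cancel_endless() {
    Worker w;
    w.start(-1);
    w.stop();  // returns only because the loop observes the flag
    return true;
}
```

Joining alone would block forever on an endless stream, which is why a separate cancel path is useful. Whether the real exit_parser() works exactly this way is not stated here.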

Options are provided for automatic SG_FREEing of example objects after each finalize_example() and also on CInputParser destruction. They are set through the set_free_vector* functions. Do not free vectors on finalize_example() if you intend to reuse the same vector memory locations for different examples. Do not free vectors on destruction if you are bound to free them manually later.

Definition at line 85 of file InputParser.h.

Public Member Functions

 CInputParser ()
 
 ~CInputParser ()
 
void init (CStreamingFile *input_file, bool is_labelled=true, int32_t size=PARSER_DEFAULT_BUFFSIZE)
 
bool is_running ()
 
int32_t get_number_of_features ()
 
void set_read_vector (void(CStreamingFile::*func_ptr)(T *&vec, int32_t &len))
 
void set_read_vector_and_label (void(CStreamingFile::*func_ptr)(T *&vec, int32_t &len, float64_t &label))
 
int32_t get_vector_and_label (T *&feature_vector, int32_t &length, float64_t &label)
 
int32_t get_vector_only (T *&feature_vector, int32_t &length)
 
void set_free_vector_after_release (bool free_vec)
 
void set_free_vectors_on_destruct (bool destroy)
 
void start_parser ()
 
void * main_parse_loop (void *params)
 
void copy_example_into_buffer (Example< T > *ex)
 
Example< T > * retrieve_example ()
 
int32_t get_next_example (T *&feature_vector, int32_t &length, float64_t &label)
 
int32_t get_next_example (T *&feature_vector, int32_t &length)
 
void finalize_example ()
 
void end_parser ()
 
void exit_parser ()
 
int32_t get_ring_size ()
 

Public Attributes

bool parsing_done
 
bool reading_done
 
E_EXAMPLE_TYPE example_type
 

Protected Attributes

void(CStreamingFile::* read_vector )(T *&vec, int32_t &len)
 
void(CStreamingFile::* read_vector_and_label )(T *&vec, int32_t &len, float64_t &label)
 
CStreamingFile * input_source
 Input source, CStreamingFile object.

std::thread parse_thread
 Thread in which the parser runs.

CParseBuffer< T > * examples_ring
 The ring of examples, stored as they are parsed.

int32_t number_of_features
 Number of features in dataset (max of 'seen' features up to point of access)

int32_t number_of_vectors_parsed
 Number of vectors parsed.

int32_t number_of_vectors_read
 Number of vectors used by external algorithm.

Example< T > * current_example
 Example currently being used.

T * current_feature_vector
 Feature vector of current example.

float64_t current_label
 Label of current example.

int32_t current_len
 Number of features in current example.

bool free_after_release
 Whether to SG_FREE() vector after it is used.

int32_t ring_size
 Size of the ring of examples.

std::mutex examples_state_lock
 Mutex which is used when getting/setting state of examples (whether a new example is ready)

std::condition_variable examples_state_changed
 Condition variable to indicate change of state of examples.

std::atomic_bool keep_running
 Flag that indicates that the parsing thread should continue reading.
 

Constructor & Destructor Documentation

Constructor

Definition at line 379 of file InputParser.h.

Destructor

Definition at line 388 of file InputParser.h.

Member Function Documentation

void copy_example_into_buffer ( Example< T > *  ex)

Copy example into the buffer.

Parameters
ex  Example to be copied.

Definition at line 509 of file InputParser.h.

void end_parser ( )

End the parser, waiting for the parse thread to complete.

Definition at line 651 of file InputParser.h.

void exit_parser ( )

Terminates the parsing thread

Definition at line 660 of file InputParser.h.

void finalize_example ( )

Finalize the current example, indicating that the buffer position it occupies may be overwritten by the parser.

Should be called when the example has been processed by the external algorithm.

Definition at line 646 of file InputParser.h.

int32_t get_next_example ( T *&  feature_vector,
int32_t &  length,
float64_t &  label
)

Gets the next example, assuming it to be labelled.

Waits till retrieve_example returns a valid example, or returns if reading is done already.

Parameters
feature_vector  Feature vector pointer
length          Length of feature vector
label           Label of example
Returns
1 if an example could be fetched, 0 otherwise

Definition at line 591 of file InputParser.h.
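The "wait until an example is ready, or return if reading is done" behaviour, together with the 1/0 return convention, can be sketched with a mutex and a condition variable. This is a simplified model, not the Shogun implementation: Channel, push, close, next and sum_stream are hypothetical names, and examples are reduced to int values held in a std::deque instead of a ring.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical one-way channel; a simplified model, not Shogun code.
class Channel {
public:
    // Producer: publish one item and wake a waiting consumer.
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(item);
        }
        changed_.notify_one();
    }

    // Producer: signal that no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        changed_.notify_all();
    }

    // Consumer: block until an item is ready or the stream is finished.
    // Returns 1 and fills `out` if an item was fetched, 0 otherwise.
    int next(int& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return !items_.empty() || done_; });
        if (items_.empty()) return 0;  // finished and drained
        out = items_.front();
        items_.pop_front();
        return 1;
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    std::deque<int> items_;
    bool done_ = false;
};

// Sums everything a producer thread sends; the loop ends when next()
// returns 0.
inline int sum_stream(int count) {
    assert(count >= 0);
    Channel ch;
    std::thread producer([&ch, count] {
        for (int i = 1; i <= count; ++i) ch.push(i);
        ch.close();
    });
    int total = 0;
    int value = 0;
    while (ch.next(value)) total += value;
    producer.join();
    return total;
}
```

The predicate passed to wait() covers both wake-up reasons, so a consumer that arrives after the producer has finished returns 0 immediately instead of blocking.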

int32_t get_next_example ( T *&  feature_vector,
int32_t &  length 
)

Gets the next example, assuming it to be unlabelled.

Parameters
feature_vector  Feature vector pointer
length          Length of feature vector
Returns
1 if an example could be fetched, 0 otherwise

Definition at line 638 of file InputParser.h.

int32_t get_number_of_features ( )

Gets the number of features. Currently this is inferred by reading the first line of input.

Returns
Number of features

Definition at line 127 of file InputParser.h.

int32_t get_ring_size ( )

Returns the size of the examples ring

Returns
ring size in terms of number of examples

Definition at line 282 of file InputParser.h.

int32_t get_vector_and_label ( T *&  feature_vector,
int32_t &  length,
float64_t &  label
)

Gets feature vector, length and label. Sets their values by reference. Uses method for reading the vector defined in CStreamingFile.

Parameters
feature_vector  Pointer to feature vector
length          Features in vector
label           Label of example
Returns
1 on success, 0 on failure.

Definition at line 478 of file InputParser.h.

int32_t get_vector_only ( T *&  feature_vector,
int32_t &  length 
)

Gets feature vector and length by reference. Assumes examples are unlabelled. Uses method for reading the vector defined in CStreamingFile.

Parameters
feature_vector  Pointer to feature vector
length          Features in vector
Returns
1 on success, 0 on failure

Definition at line 494 of file InputParser.h.

void init ( CStreamingFile *  input_file,
bool  is_labelled = true,
int32_t  size = PARSER_DEFAULT_BUFFSIZE 
)

Initializer

Sets initial or default values for members. is_example_used is initialized to EMPTY. example_type is LABELLED by default.

Parameters
input_file   CStreamingFile object
is_labelled  Whether example is labelled or not (bool), optional
size         Size of the buffer in number of examples

Definition at line 394 of file InputParser.h.

bool is_running ( )

Test if parser is running.

Returns
true if running, false otherwise.

Definition at line 459 of file InputParser.h.

void * main_parse_loop ( void *  params)

Main parsing loop. Reads examples from source and stores them in the buffer.

Parameters
params  'this' object
Returns
NULL

Definition at line 514 of file InputParser.h.

Example< T > * retrieve_example ( )

Retrieves the next example from the buffer.

Returns
The example pointer.

Definition at line 561 of file InputParser.h.

void set_free_vector_after_release ( bool  free_vec)

Sets whether to SG_FREE() the vector explicitly after it has been used

Parameters
free_vec  whether to SG_FREE() or not, bool

Definition at line 421 of file InputParser.h.

void set_free_vectors_on_destruct ( bool  destroy)

Sets whether to free all vectors that were allocated in the ring upon destruction of the ring.

Parameters
destroy  free all vectors on destruction

Definition at line 427 of file InputParser.h.

void set_read_vector ( void(CStreamingFile::*)(T *&vec, int32_t &len)  func_ptr)

Sets the function used for reading a vector from the file.

The function must be a member of CStreamingFile that takes a T*& for the vector and an int32_t& for the length, and sets both by reference. The function returns void.

The argument is a function pointer to that function.

Definition at line 365 of file InputParser.h.

void set_read_vector_and_label ( void(CStreamingFile::*)(T *&vec, int32_t &len, float64_t &label)  func_ptr)

Sets the function used for reading a vector and label from the file.

The function must be a member of CStreamingFile that takes a T*& for the vector, an int32_t& for the length, and a float64_t& for the label, and sets all three by reference. The function returns void.

The argument is a function pointer to that function.

Definition at line 372 of file InputParser.h.
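Storing and invoking a reader through a pointer to member function, as the two setters above describe, uses the ->* operator. The following is a minimal sketch with stand-in types, not the Shogun classes: FakeFile, TinyParser, read and first_value_via_member_pointer are hypothetical, and float, std::int32_t and double are assumed here in place of float32_t, int32_t and float64_t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a streaming file class; not CStreamingFile.
class FakeFile {
public:
    // Matches the shape void (File::*)(T*& vec, int32_t& len).
    void get_vector(float*& vec, std::int32_t& len) {
        vec = data_;
        len = 3;
    }

    // Matches the labelled shape, with an extra label set by reference.
    void get_vector_and_label(float*& vec, std::int32_t& len, double& label) {
        get_vector(vec, len);
        label = 1.0;
    }

private:
    float data_[3] = {0.5f, 1.5f, 2.5f};
};

// Hypothetical parser that stores the readers as pointers to members.
template <class T>
class TinyParser {
public:
    using ReadFn = void (FakeFile::*)(T*& vec, std::int32_t& len);
    using ReadLabelledFn =
        void (FakeFile::*)(T*& vec, std::int32_t& len, double& label);

    explicit TinyParser(FakeFile* source) : source_(source) {
        assert(source != nullptr);
    }

    void set_read_vector(ReadFn fn) { read_vector_ = fn; }
    void set_read_vector_and_label(ReadLabelledFn fn) {
        read_vector_and_label_ = fn;
    }

    // Invoke the stored member function on the source object.
    int read(T*& vec, std::int32_t& len) {
        if (read_vector_ == nullptr) return 0;
        (source_->*read_vector_)(vec, len);
        return len > 0 ? 1 : 0;
    }

    int read(T*& vec, std::int32_t& len, double& label) {
        if (read_vector_and_label_ == nullptr) return 0;
        (source_->*read_vector_and_label_)(vec, len, label);
        return len > 0 ? 1 : 0;
    }

private:
    FakeFile* source_;
    ReadFn read_vector_ = nullptr;
    ReadLabelledFn read_vector_and_label_ = nullptr;
};

// Usage: register the unlabelled reader, then fetch one vector through it.
inline float first_value_via_member_pointer() {
    FakeFile file;
    TinyParser<float> parser(&file);
    parser.set_read_vector(&FakeFile::get_vector);
    float* vec = nullptr;
    std::int32_t len = 0;
    int ok = parser.read(vec, len);
    return (ok == 1 && len == 3) ? vec[0] : -1.0f;
}
```

Because the pointer names a member of the file class rather than of the parser, the parser can stay generic over T while the file class decides how each vector type is read.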

void start_parser ( )

Starts the parser, creating a new thread.

main_parse_loop is the parsing method.

Definition at line 433 of file InputParser.h.

Member Data Documentation

Example<T>* current_example
protected

Example currently being used.

Definition at line 336 of file InputParser.h.

T* current_feature_vector
protected

Feature vector of current example.

Definition at line 339 of file InputParser.h.

float64_t current_label
protected

Label of current example.

Definition at line 342 of file InputParser.h.

int32_t current_len
protected

Number of features in current example.

Definition at line 345 of file InputParser.h.

E_EXAMPLE_TYPE example_type

LABELLED or UNLABELLED

Definition at line 298 of file InputParser.h.

CParseBuffer<T>* examples_ring
protected

The ring of examples, stored as they are parsed.

Definition at line 324 of file InputParser.h.

std::condition_variable examples_state_changed
protected

Condition variable to indicate change of state of examples.

Definition at line 357 of file InputParser.h.

std::mutex examples_state_lock
protected

Mutex which is used when getting/setting state of examples (whether a new example is ready)

Definition at line 354 of file InputParser.h.

bool free_after_release
protected

Whether to SG_FREE() vector after it is used.

Definition at line 348 of file InputParser.h.

CStreamingFile* input_source
protected

Input source, CStreamingFile object.

Definition at line 318 of file InputParser.h.

std::atomic_bool keep_running
protected

Flag that indicates that the parsing thread should continue reading.

Definition at line 360 of file InputParser.h.

int32_t number_of_features
protected

Number of features in dataset (max of 'seen' features up to point of access)

Definition at line 327 of file InputParser.h.

int32_t number_of_vectors_parsed
protected

Number of vectors parsed.

Definition at line 330 of file InputParser.h.

int32_t number_of_vectors_read
protected

Number of vectors used by external algorithm.

Definition at line 333 of file InputParser.h.

std::thread parse_thread
protected

Thread in which the parser runs.

Definition at line 321 of file InputParser.h.

bool parsing_done

true if all input is parsed

Definition at line 295 of file InputParser.h.

void(CStreamingFile::* read_vector)(T *&vec, int32_t &len)
protected

This is the function pointer to the function to read a vector from the input.

It is called while reading a vector.

Definition at line 307 of file InputParser.h.

void(CStreamingFile::* read_vector_and_label)(T *&vec, int32_t &len, float64_t &label)
protected

This is the function pointer to the function to read a vector and label from the input.

It is called while reading a vector and a label.

Definition at line 315 of file InputParser.h.

bool reading_done

true if all examples are fetched

Definition at line 296 of file InputParser.h.

int32_t ring_size
protected

Size of the ring of examples.

Definition at line 351 of file InputParser.h.


The documentation for this class was generated from the following file:

InputParser.h

SHOGUN Machine Learning Toolbox - Documentation