URY playd
C++ minimalist audio player
Tokeniser Class Reference

A string tokeniser.

#include <tokeniser.hpp>

Public Member Functions

 Tokeniser ()
 Constructs a new Tokeniser.
 
std::vector< std::vector< std::string > > Feed (const std::string &raw)
 Feeds a string into a Tokeniser.
 

Private Types

enum  QuoteType : std::uint8_t { QuoteType::NONE, QuoteType::SINGLE, QuoteType::DOUBLE }
 Enumeration of quotation types.
 

Private Member Functions

void Emit ()
 Finishes the current word and sends the line to the CommandHandler.
 
void EndWord ()
 Finishes the current word, adding it to the tokenised line.
 
void Push (char c)
 Pushes a raw character onto the end of the current word.
 

Private Attributes

std::vector< std::vector< std::string > > ready_lines
 The current vector of completed, tokenised lines.
 
std::vector< std::string > words
 The current vector of completed, tokenised words.
 
std::string current_word
 The current, incomplete word to which new characters should be added.
 
bool escape_next
 Whether the next character is to be interpreted as an escape code.
 
bool in_word
 Whether the tokeniser is currently in a word.
 
QuoteType quote_type
 The type of quotation currently being used in this Tokeniser.
 

Detailed Description

A string tokeniser.

A Tokeniser is fed chunks of incoming data from the IO system, and emits any fully-formed command lines it encounters to the command handler.

See also
CommandHandler
IoCore

Definition at line 24 of file tokeniser.hpp.
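
The chunked-feeding behaviour described above can be sketched with a small stand-in class. This is not the playd Tokeniser itself: SketchTokeniser and DemoChunkedFeed are hypothetical names, the quote state is stored as a plain char rather than a QuoteType, and the bodies of EndWord and Emit are assumptions inferred from their one-line descriptions on this page.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for Tokeniser, for illustration only.
class SketchTokeniser {
public:
    using Line = std::vector<std::string>;

    // Consumes one chunk and returns the lines completed by it.
    std::vector<Line> Feed(const std::string &raw)
    {
        for (char c : raw) {
            if (escape_next) {
                Push(c);
            } else if (quote == '\'') {
                if (c == '\'') quote = '\0'; else Push(c);
            } else if (quote == '"') {
                if (c == '"') quote = '\0';
                else if (c == '\\') escape_next = true;
                else Push(c);
            } else if (c == '\n') {
                Emit();
            } else if (c == '\'' || c == '"') {
                in_word = true;  // an empty quoted string is still a word
                quote = c;
            } else if (c == '\\') {
                escape_next = true;
            } else if (std::isspace(static_cast<unsigned char>(c))) {
                EndWord();
            } else {
                Push(c);
            }
        }
        std::vector<Line> lines;
        lines.swap(ready_lines);  // hand over the lines, leave ours empty
        return lines;
    }

private:
    void Push(char c)
    {
        in_word = true;
        current_word.push_back(c);
        escape_next = false;
    }

    // Assumed: a word is only recorded if one was started.
    void EndWord()
    {
        if (!in_word) return;
        words.push_back(current_word);
        current_word.clear();
        in_word = false;
    }

    // Assumed: the newline ends the word and queues the finished line.
    void Emit()
    {
        EndWord();
        ready_lines.push_back(words);
        words.clear();
    }

    std::vector<Line> ready_lines;
    Line words;
    std::string current_word;
    bool escape_next = false;
    bool in_word = false;
    char quote = '\0';  // '\0' = not in quotes, else the opening quote char
};

// Feeds one command split across two chunks, as a socket read might.
void DemoChunkedFeed()
{
    SketchTokeniser t;
    assert(t.Feed("play 'my ").empty());  // no newline yet, nothing ready

    auto lines = t.Feed("file.mp3'\nstop\n");
    assert(lines.size() == 2);
    assert((lines[0] == SketchTokeniser::Line{"play", "my file.mp3"}));
    assert((lines[1] == SketchTokeniser::Line{"stop"}));
}
```

The first call returns nothing because the partial word and the open quote are kept as state inside the object; the second call completes that line and a further one.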

Member Enumeration Documentation

§ QuoteType

enum Tokeniser::QuoteType : std::uint8_t
strong private

Enumeration of quotation types.

Enumerator
NONE 

Not currently in a quote pair.

SINGLE 

In single quotes ('').

DOUBLE 

In double quotes ("").

Definition at line 42 of file tokeniser.hpp.

enum class QuoteType : std::uint8_t {
    NONE,
    SINGLE,
    DOUBLE
};
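
As a rough illustration of how the three enumerators relate, the quote handling in Feed() can be read as a small state machine. The helper NextQuoteType and the function DemoQuoteTransitions below are hypothetical and not part of playd; the sketch only covers unescaped characters and redeclares the enumeration so that it compiles on its own.

```cpp
#include <cassert>
#include <cstdint>

enum class QuoteType : std::uint8_t { NONE, SINGLE, DOUBLE };

// Hypothetical helper: the quote state after reading unescaped character c.
QuoteType NextQuoteType(QuoteType current, char c)
{
    switch (current) {
    case QuoteType::NONE:
        if (c == '\'') return QuoteType::SINGLE;
        if (c == '"') return QuoteType::DOUBLE;
        return QuoteType::NONE;
    case QuoteType::SINGLE:
        // A double quote inside single quotes is an ordinary character.
        return c == '\'' ? QuoteType::NONE : QuoteType::SINGLE;
    case QuoteType::DOUBLE:
        // A single quote inside double quotes is an ordinary character.
        return c == '"' ? QuoteType::NONE : QuoteType::DOUBLE;
    }
    return current;  // not reached for valid enumerators
}

void DemoQuoteTransitions()
{
    assert(NextQuoteType(QuoteType::NONE, '\'') == QuoteType::SINGLE);
    assert(NextQuoteType(QuoteType::SINGLE, '"') == QuoteType::SINGLE);
    assert(NextQuoteType(QuoteType::SINGLE, '\'') == QuoteType::NONE);
    assert(NextQuoteType(QuoteType::DOUBLE, '"') == QuoteType::NONE);
}
```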

Member Function Documentation

§ Feed()

std::vector< std::vector< std::string > > Tokeniser::Feed ( const std::string &  raw)

Feeds a string into a Tokeniser.

Parameters
raw    Const reference to the raw string to feed. The string need not contain complete lines.
Returns
The vector of lines that have been successfully tokenised in this tokenising pass. This vector may be empty.
Note
Escaping a multi-byte UTF-8 character is undefined behaviour.
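
One plausible reading of this note is that Feed() walks the input one char at a time, so a backslash can only mark the single byte that follows it. The sketch below shows that effect for a two-byte character. EscapedBytes and DemoEscapedUtf8 are hypothetical names, and the helper deliberately ignores quote state (inside single quotes a backslash is not an escape at all).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: for each byte of raw, records whether a one-shot
// escape flag (set by an unescaped backslash) would apply to that byte.
std::vector<bool> EscapedBytes(const std::string &raw)
{
    std::vector<bool> escaped;
    bool escape_next = false;
    for (char c : raw) {
        escaped.push_back(escape_next);
        escape_next = !escape_next && c == '\\';
    }
    return escaped;
}

void DemoEscapedUtf8()
{
    // Backslash followed by U+00E9, which UTF-8 encodes as 0xC3 0xA9.
    const std::string raw = "\\\xC3\xA9";
    assert(raw.size() == 3);
    // Only the lead byte 0xC3 is covered; 0xA9 is handled as a normal byte.
    assert((EscapedBytes(raw) == std::vector<bool>{false, true, false}));
}
```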

Definition at line 24 of file tokeniser.cpp.

References DOUBLE, Emit(), EndWord(), escape_next, in_word, NONE, Push(), quote_type, ready_lines, and SINGLE.

Referenced by Connection::Read().

std::vector<std::vector<std::string>> Tokeniser::Feed(const std::string &raw)
{
    // The list of ready lines should be cleared by any previous Feed.
    assert(this->ready_lines.empty());

    for (char c : raw) {
        // An escaped character is pushed verbatim, whatever the quote state.
        if (this->escape_next) {
            this->Push(c);
            continue;
        }

        switch (this->quote_type) {
        case QuoteType::SINGLE:
            // Single quotes: only the closing quote is special.
            if (c == '\'') {
                this->quote_type = QuoteType::NONE;
            } else {
                this->Push(c);
            }
            break;

        case QuoteType::DOUBLE:
            // Double quotes: a backslash escapes the next character.
            switch (c) {
            case '\"':
                this->quote_type = QuoteType::NONE;
                break;

            case '\\':
                this->escape_next = true;
                break;

            default:
                this->Push(c);
                break;
            }
            break;

        case QuoteType::NONE:
            switch (c) {
            case '\n':
                // An unquoted newline completes the line.
                this->Emit();
                break;

            case '\'':
                this->in_word = true;
                this->quote_type = QuoteType::SINGLE;
                break;

            case '\"':
                this->in_word = true;
                this->quote_type = QuoteType::DOUBLE;
                break;

            case '\\':
                this->escape_next = true;
                break;

            default:
                // Unquoted whitespace ends the word; anything else joins it.
                isspace(c) ? this->EndWord()
                           : this->Push(c);
                break;
            }
            break;
        }
    }

    // Hand the completed lines to the caller and reset for the next Feed.
    auto lines = this->ready_lines;
    this->ready_lines.clear();

    return lines;
}

§ Push()

void Tokeniser::Push ( char  c)
private

Pushes a raw character onto the end of the current word.

This also clears the escape_next flag.

Parameters
c    The character to push onto the current word.

Definition at line 98 of file tokeniser.cpp.

References current_word, escape_next, in_word, NONE, and quote_type.

Referenced by Feed().

void Tokeniser::Push(char c)
{
    // Unquoted whitespace may only be pushed if it was escaped.
    assert(this->escape_next ||
           !(this->quote_type == QuoteType::NONE && isspace(c)));
    this->in_word = true;
    this->current_word.push_back(c);
    this->escape_next = false;
    assert(!this->current_word.empty());
}

Member Data Documentation

§ current_word

std::string Tokeniser::current_word
private

The current, incomplete word to which new characters should be added.

Definition at line 57 of file tokeniser.hpp.

Referenced by Emit(), EndWord(), and Push().

§ escape_next

bool Tokeniser::escape_next
private

Whether the next character is to be interpreted as an escape code.

This usually gets set to true when a backslash is detected.

Definition at line 61 of file tokeniser.hpp.

Referenced by Emit(), Feed(), and Push().

§ ready_lines

std::vector<std::vector<std::string> > Tokeniser::ready_lines
private

The current vector of completed, tokenised lines.

This is cleared at the end of every Tokeniser::Feed.

Definition at line 50 of file tokeniser.hpp.

Referenced by Emit(), and Feed().
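
The listing for Feed() copies ready_lines and then clears it. For comparison only, the same "return the contents and leave the member empty" step could be written with std::exchange, which moves the lines out instead of copying them. This is a sketch of an alternative idiom, not what playd does; it assumes C++14 or later, and Lines, TakeLines and DemoTakeLines are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Lines = std::vector<std::vector<std::string>>;

// Hypothetical helper: returns the queued lines and leaves the queue empty.
Lines TakeLines(Lines &ready_lines)
{
    // std::exchange stores the new (empty) value and returns the old one.
    return std::exchange(ready_lines, Lines{});
}

void DemoTakeLines()
{
    Lines ready;
    ready.push_back(std::vector<std::string>{"play"});
    ready.push_back(std::vector<std::string>{"stop"});

    Lines taken = TakeLines(ready);
    assert(taken.size() == 2);
    assert(ready.empty());  // satisfies the precondition asserted by Feed()
}
```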


The documentation for this class was generated from the following files: