The BAPS3 internal API sits atop a text-based protocol based upon POSIX shell conventions.
We use a shell-style protocol because:
It is lightweight and relatively easy to parse;
It is a good match for the command-and-arguments style of the BAPS3 API;
The shell style of escaping is convenient for escaping paths
(especially on Windows, where the single-quote syntax can avoid
needing to escape every backslash in a Windows path), which are
common arguments to BAPS3 commands (load, enqueue, and so on).
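The point about Windows paths can be illustrated with a short sketch. It uses Python's standard shlex module purely as a stand-in for POSIX-style quoting; shlex is not a BAPS3 tokeniser, and the path shown is a made-up example.

```python
import shlex

# Hypothetical load command carrying a Windows path (illustrative only).
# In single quotes, every backslash is taken literally.
quoted = r"load 'C:\Music\song.mp3'"

# Without quotes, each backslash must itself be backslash-escaped.
escaped = r"load C:\\Music\\song.mp3"

words_quoted = shlex.split(quoted)
words_escaped = shlex.split(escaped)

# Both spellings produce the same command word and the same argument.
assert words_quoted == words_escaped
```

Both forms yield the words load and C:\Music\song.mp3; the single-quoted form simply needs no extra backslashes.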
A disadvantage is that it is impossible to perform a context-free splitting of input into lines, because a line-feed character may appear inside quotes or after a backslash. However, since the usual transport used by the BAPS3 protocol is TCP, and most TCP libraries provide unbuffered input in chunks which may be smaller or larger than one line, we rarely need this luxury.
The internal API is designed to expect requests and responses in the UTF-8 encoding. However, to allow simple implementations that operate on a per-byte level, certain uses of UTF-8 beyond the single-byte (ASCII) range are discouraged.
API users must support the sending and receiving of all single-byte UTF-8 character codes (that is, ASCII), and should support the sending and receiving of the full UTF-8 range subject to the limitations below.
It is undefined behaviour to:
Use a whitespace character not in the single-byte subset of UTF-8 to separate words. (See below for recommendations on word-separating whitespace.) This limitation allows implementations to make use of single-byte whitespace checks for word separation;
Backslash-escape a character not in single-byte UTF-8. (This should not be necessary, as single- and double-quote modes will allow these characters to be used without escape.) This limitation allows backslash escaping to operate on a single-byte level;
Backslash-escape the line-feed character. This limitation is due to the special status of line-feed escaping as line continuation in shell tokenisation, and may change in future when a decision is made as to the necessity of keeping this status in the API.
API users may make decisions as to how to interpret said undefined
behaviour, but should not expect certain behaviour from other
users without consulting the FEATURES flags and OHAI response
for confirmation of said behaviour.
Most characters may be transmitted verbatim via the protocol. However, in order to allow separation of protocol communications into the words and commands defined below, BAPS3 gives special meaning to certain characters. This is done according to four quote modes, specified below.
Note: In case of ambiguity, refer to the POSIX shell quoting standards: we follow that style of quoting with the exception of disallowing variable and command interpolation (thus, backtick and dollar are not considered special).
A protocol tokeniser should start in unquoted mode. In this mode:
Any run of whitespace characters (see below) separates one word from the next;
A line-feed character (0x0A) separates one command from the
next, and ends any word preceding;
A single-quote character (') begins single-quoted mode;
A double-quote character (") begins double-quoted mode;
A backslash character (\) begins escaped mode;
Any other character is echoed verbatim.
The whitespace characters permitted for word separation must not
contain line-feed (0x0A), and must contain space (0x20) and
horizontal tab (0x09). Consequently, API users should use the
latter two characters for word separation. API users should not
use multi-byte whitespace characters, for the reasons provided in
Encoding.
For implementations in C or C++, the C function isspace() may be used to identify word-separating whitespace, so long as line-feed is not interpreted as such.
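For comparison, a similar check can be sketched in Python at the byte level. The function name is_word_separator is hypothetical. The sketch relies on bytes.isspace(), which recognises only ASCII whitespace, and excludes line-feed explicitly.

```python
LINE_FEED = 0x0A


def is_word_separator(byte: int) -> bool:
    """Return True for single-byte whitespace other than line-feed.

    bytes.isspace() covers ASCII whitespace only (space, tab, line-feed,
    carriage return, vertical tab, form feed), so multi-byte UTF-8
    whitespace is never treated as a separator here.
    """
    return byte != LINE_FEED and bytes([byte]).isspace()
```

Space (0x20) and horizontal tab (0x09) are accepted, as the rules above require, while line-feed (0x0A) is rejected so that it can act as the command separator.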
In single-quoted mode, only the single-quote character is treated specially, and has the effect of returning to unquoted mode.
In double-quoted mode:
A backslash character begins escaped mode;
A double-quote character returns to unquoted mode;
Any other character is echoed verbatim.
In escaped mode, any single character is treated verbatim; the mode then reverts to that in which the backslash beginning escaped mode was read.
To allow implementations which escape single bytes instead of characters, clients should not expect servers to backslash-escape multiple-byte characters properly, and should instead use single-quoted or double-quoted mode to escape these characters.
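The four quote modes can be sketched as a small byte-level state machine. This is a minimal illustration and not a reference implementation: the class name Tokeniser and its feed method are hypothetical, only space and tab are treated as word separators, blank lines are dropped, and a backslash-escaped line-feed (undefined behaviour above) is simply taken verbatim. Working on bytes means a chunk boundary may fall anywhere, even inside a multi-byte character.

```python
class Tokeniser:
    """Incremental tokeniser sketch; feed() accepts chunks of any size."""

    def __init__(self):
        self.mode = "unquoted"    # "unquoted", "single" or "double"
        self.escaped = False      # True immediately after a backslash
        self.in_word = False      # True once the current word has begun
        self.word = bytearray()   # bytes of the current word
        self.words = []           # finished words of the current command

    def _end_word(self):
        # A word may be empty if it was begun by quotes, as in ''.
        if self.in_word:
            self.words.append(self.word.decode("utf-8"))
        self.word, self.in_word = bytearray(), False

    def feed(self, chunk: bytes):
        """Consume a chunk and return the commands it completed."""
        commands = []
        for byte in chunk:
            if self.escaped:
                # Escaped mode: one byte verbatim, then revert.
                self.word.append(byte)
                self.escaped = False
            elif self.mode == "single":
                if byte == 0x27:              # ' ends single-quoted mode
                    self.mode = "unquoted"
                else:
                    self.word.append(byte)
            elif self.mode == "double":
                if byte == 0x22:              # " ends double-quoted mode
                    self.mode = "unquoted"
                elif byte == 0x5C:            # backslash
                    self.escaped = True
                else:
                    self.word.append(byte)
            elif byte == 0x0A:                # line-feed ends the command
                self._end_word()
                if self.words:                # assumption: skip blank lines
                    commands.append(self.words)
                self.words = []
            elif byte in b" \t":              # word-separating whitespace
                self._end_word()
            else:
                self.in_word = True
                if byte == 0x27:
                    self.mode = "single"
                elif byte == 0x22:
                    self.mode = "double"
                elif byte == 0x5C:
                    self.escaped = True
                else:
                    self.word.append(byte)
        return commands
```

Each call to feed returns a list of completed commands, each a list of words. A command split across two chunks is returned only once its terminating line-feed arrives, which suits transports that deliver input in arbitrary chunks.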
The main protocol element is the word, which is a well-formed sequence of one or more raw characters delimited by any non-zero run of unquoted whitespace. A well-formed word is one that ends in unquoted mode (quotes must be matched), and does not end with a backslash.
By raw characters, we mean that the criteria for a valid word hold
before tokenisation, and specifically that the special characters
count towards the one-or-more limit. This means that, while the empty
string is not a valid word, the quoted 'empty' string '' is.
Implicitly, the unquoted whitespace characters are not considered to
count towards any words.
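The well-formedness rules can be sketched as a check over the raw, pre-tokenisation text of a single word. The function name is_well_formed_word is hypothetical, and the sketch reads "does not end with a backslash" as "does not end with a dangling escape"; this reading is an assumption, under which a word ending in an escaped backslash is accepted.

```python
def is_well_formed_word(raw: str) -> bool:
    """Check one raw word: non-empty, quotes matched, no dangling escape."""
    if not raw:
        return False                  # the empty string is not a word
    mode = "unquoted"
    escaped = False
    for ch in raw:
        if escaped:
            escaped = False           # escaped character taken verbatim
        elif mode == "single":
            if ch == "'":
                mode = "unquoted"     # only ' is special in single quotes
        elif ch == "\\":
            escaped = True            # valid in unquoted and double modes
        elif ch == '"':
            mode = "unquoted" if mode == "double" else "double"
        elif ch == "'" and mode == "unquoted":
            mode = "single"
        elif mode == "unquoted" and ch in " \t\n":
            return False              # unquoted whitespace would split it
    return mode == "unquoted" and not escaped
```

Under these assumptions the raw text '' is accepted (it tokenises to the empty string), while the empty string, an unmatched quote, and a trailing lone backslash are all rejected.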
Note: There is no limit on the number of quote mode transitions
inside a single word. For example, unquoted"double"unquoted'single'\ unquoted
should be considered one word (albeit a pathological one).
Each communication over the Internal API consists of a command of one or more words. The first, mandatory, word is the command word and unambiguously identifies the command; any further words are arguments to that command. In this respect, the BAPS3 protocol perfectly mirrors shell usage.
Commands are divided into requests and responses.
See the Internal API compliance tests.