Building a type-safe Parser Combinator Library in Modern C++
What are Parser Combinators??
Instead of building one large monolithic parser like a recursive descent parser, parser combinators let you construct complex parsers by composing small, reusable functions.
Each parser:
consumes input
either succeeds with a value + remaining input
or fails with an error
The Core Idea
We write tiny parsers:
Parser A → parses 'a'
Parser B → parses 'b'
At this stage, nothing is composed yet.
Then we combine them:
Parser A Parser B
(parse 'a') (parse 'b')
│ │
└───────────┬────────────┘
↓
Sequence Combinator(>>)
↓
Combined Parser (P)
Execution Model
Let’s run the composed parser:
Input:
"ab"
Flow:
Combined Parser P
↓
Parser A → consumes 'a'
↓
Parser B → consumes 'b'
Output:
['a', 'b']
Composing Parsers with Combinators
Now that we understand how parsers are built and executed, the next question is:
How do we actually combine small parsers to build complex ones?
This is where parser combinators come in
Types of Combinators
Sequence Combinator (>>):
It runs two parsers one after another.
First parser consumes part of the input
Second parser continues from where the first stopped
Final result combines both outputs
A >> B
Choice Combinator (|):
Choice combinator tries parsers one by one until one succeeds.
Order matters
First successful parser wins
Input: "b"
A = parse 'a' → fails
B = parse 'b' → succeeds
Optional Combinator:
Represents something that may or may not exist.
success → returns value
failure → empty result
Input: "abc"
Optional(parse 'z') → None
Repetition Combinator:
Matches a parser repeatedly until it stops making progress.
Stops when:
parser fails
or no progress is made
Input: "bbba"
↓
Parser 'b' runs repeatedly
↓
Result: ['b', 'b', 'b']
Discard/Skip combinator:
Consumes input but ignores output.
Used for:
spaces
separators
punctuation
The core abstraction: Parser
A Parser is simply a wrapper around a function that:
takes a sequence of input tokens
processes some portion of that input
returns a Result, which contains either:
a parsed value along with how much input was consumed, or
an error indicating why parsing failed
template <class T, class Tok, class Fn>
class Parser {
public:
using result_type = T;
using token_type = Tok;
constexpr Parser(Fn&& fn): _apply(std::forward<Fn>(fn)) {}
constexpr Result<T> parse(std::span<Tok const> tokens) const {
return _apply(tokens);
}
private:
Fn _apply;
};
Why do we need a Result type??
Parsing always has two possible outcomes:
success
failure
Why not just return
T?
If we only return a value:
char parse(...);
There is no way to represent failure.
Why not
std::optional<T>?
This allows failure, but still misses two critical things:
No error information
No input consumption tracking
Result Type
We use tagged Result type:
Result<T> =
Success(T value, index)
OR
Failure(ParseError error, index)
class ParseError {
public:
constexpr ParseError(const char* error): _msg(error) {}
constexpr const char* error() { return _msg; }
private:
const char* _msg;
};
template <class T>
class Result {
public:
using value_type = T;
constexpr Result(T v, uint32_t idx)
: _ok(true), _value(v), _idx(idx) {}
constexpr Result(ParseError e, uint32_t idx)
: _ok(false), _error(e), _idx(idx) {}
constexpr bool is_ok() const { return _ok; }
constexpr bool is_error() const { return !_ok; }
constexpr T const& ok() const { return _value; }
constexpr ParseError const& error() const { return _error; }
constexpr uint32_t index() const { return _idx; }
private:
bool _ok;
T _value{};
ParseError _error{""};
uint32_t _idx;
};
Implementing Combinators
Now that we have:
a Parser abstraction
a Result type
we can start building combinators, functions that take parsers and return new parsers.
Sequence Combinator (>>):
template <class A, class B> constexpr auto Sequence(A&& a, B&& b) { using T1 = ValueOf<A>; using T2 = ValueOf<B>; using Tok = TokenOf<A>; using T = flatten_tuple_t<std::tuple<T1, T2>>; return parser<T, Tok>([=](std::span<Tok const> tokens) constexpr -> Result <T> { auto res1 = a.parse(tokens); if (res1.is_error()) { return {res1.error(), res1.index()}; } auto res2 = b.parse(tokens.subspan(res1.index())); if (res2.is_error()) { return {res2.error(), res1.index() + res2.index()}; } return Result<T>( std::tuple_cat( as_tuple(res1.ok()), as_tuple(res2.ok()) ), res1.index() + res2.index() ); }); } template <class A, class B> constexpr auto operator>>(A&& a, B&& b) { return Sequence(std::forward<A>(a), std::forward<B>(b)); };💡A sequence combinator must return a result whose type is known at compile time. Using something likestd::vectorloses type information and isn’t fully compile-time safe. Instead, we usestd::tupleto preserve exact types.Why do we flatten tuples?? (type-level)
The sequence combinator can be applied multiple times:
p1>>p2>>p3>>p4p1 >> p2producesstd::tuple<T1, T2>Then we apply p3:
(p1>>p2)>>p3. If we combine the resultant types, the type becomes:std::tuple<std::tuple<T1, T2>, T3>. But this is not we want. We want flat structure like thisstd::tuple<T1, T2, T3>
Why
std::tuple_catandas_tuple? (value-level)Problem:
res1.ok() // could be T or std::tuple<...> res2.ok() // could be T or std::tuple<...>Solution:
Normalise everything into tuples. as_tuple does exact same thing. If it's
Twrap it with tuplestd::tuple<T>. If it'sstd::tuplekeep it as is. Finally concatenate every tuple using std::tuple_catChoice Combinator:
template <class A, class B> constexpr auto Choice(A&& a, B&& b) { using T = flatten_variant_t<ValueOf<A>, ValueOf<B>>; using Tok = TokenOf<A>; return parser<T, Tok>([=](auto tokens) constexpr -> Result<T>{ auto res1 = a.parse(tokens); if (res1.is_ok()) { return Result<T>(T{res1.ok()}, res1.index()); } auto res2 = b.parse(tokens); if (res2.is_ok()) { return Result<T>(T{res2.ok()}, res2.index()); } return Result<T>("Choice parser cannot parse single parser", 0); }); } template <class A, class B> constexpr auto operator|(A&& a, B&& b) { return Choice(std::forward<A>(a), std::forward<B>(b)); };💡Which result type should we use for a Choice combinator? Since it produces exactly one result out of multiple possible types, astd::tupledoesn’t fit because it represents multiple values simultaneously rather than a single selected one. What we really need is a type that can hold one of several alternatives—like a union. However, a raw union isn’t usable as a function return type in a safe or meaningful way because it doesn’t track which member is active. That leavesstd::variant, which provides the same “one-of-many” behaviour as a union but in a fully type-safe and well-defined way, making it the perfect choice here.The same reasoning applies when flattening a
std::variant: we avoid nested variants because they create unnecessary complexity.Problem ⚠️ :
Even after flattening the
std::variant, there is still an issue. What should the resulting type be for something like:CharacterParser('a') | CharacterParser('b')This would produce
std::variant<char, char>, but that’s redundant and effectively ambiguous. To avoid this, we eliminate duplicate types during construction, so the result becomes a singlestd::variant<char>instead. This deduplication step is already handled as part of the flattening process.
Conclusion
Parser combinators turn parsing into:
composition of types + functions instead of manual control flow.
We built:
type-safe sequencing
branching via variants
optional and repetition logic
full compile-time structure preservation
Full implementation
Godbolt: https://godbolt.org/z/8cEbKe9a7
Github: https://github.com/ashwith2427/Parsinator/
