Building a type-safe Parser Combinator Library in Modern C++

What are Parser Combinators??

Instead of building one large monolithic parser like a recursive descent parser, parser combinators let you construct complex parsers by composing small, reusable functions.

Each parser:

consumes input
either succeeds with a value + remaining input
or fails with an error

The Core Idea

We write tiny parsers:

Parser A → parses 'a'
Parser B → parses 'b'

At this stage, nothing is composed yet.
Then we combine them:

           Parser A                              Parser B
          (parse 'a')                           (parse 'b')

             │                                      │

             └───────────┬────────────┘
                                ↓

                      Sequence Combinator(>>)
                                ↓

                        Combined Parser (P)

Execution Model

Let’s run the composed parser:

Input:

"ab"

Flow:

Combined Parser P
        ↓
   Parser A → consumes 'a'
        ↓
   Parser B → consumes 'b'

Output:

['a', 'b']

Composing Parsers with Combinators

Now that we understand how parsers are built and executed, the next question is:
How do we actually combine small parsers to build complex ones?

This is where parser combinators come in

Types of Combinators

Sequence Combinator (>>):

It runs two parsers one after another.

First parser consumes part of the input
Second parser continues from where the first stopped
Final result combines both outputs

A >> B

Choice Combinator (|):

Choice combinator tries parsers one by one until one succeeds.

Order matters
First successful parser wins

Input: "b"

A = parse 'a' → fails  

B = parse 'b' → succeeds

Optional Combinator:

Represents something that may or may not exist.

success → returns value
failure → empty result

Input: "abc"

Optional(parse 'z') → None

Repetition Combinator:

Matches a parser repeatedly until it stops making progress.

Stops when:

parser fails
or no progress is made

Input: "bbba"

     ↓

Parser 'b' runs repeatedly

     ↓

Result: ['b', 'b', 'b']

Discard/Skip combinator:

Consumes input but ignores output.

Used for:

spaces
separators
punctuation

The core abstraction: Parser

A Parser is simply a wrapper around a function that:

takes a sequence of input tokens
processes some portion of that input
returns a Result, which contains either:
- a parsed value along with how much input was consumed, or
- an error indicating why parsing failed

template <class T, class Tok, class Fn>
class Parser {
public:
    using result_type = T;
    using token_type = Tok;
    constexpr Parser(Fn&& fn): _apply(std::forward<Fn>(fn)) {}
    constexpr Result<T> parse(std::span<Tok const> tokens) const {
        return _apply(tokens);
    }
private:
    Fn _apply;
};

Why do we need a Result type??

Parsing always has two possible outcomes:

success
failure

Why not just return T?

If we only return a value:

char parse(...);

There is no way to represent failure.

Why not std::optional<T> ?

This allows failure, but still misses two critical things:

No error information
No input consumption tracking

Result Type

We use tagged Result type:

Result<T> =
    Success(T value, index)
    OR
    Failure(ParseError error, index)

class ParseError {
public:
    constexpr ParseError(const char* error): _msg(error) {}
    constexpr const char* error() { return _msg; }
private:
    const char* _msg;
};

template <class T>
class Result {
public:
    using value_type = T;

    constexpr Result(T v, uint32_t idx)
        : _ok(true), _value(v), _idx(idx) {}

    constexpr Result(ParseError e, uint32_t idx)
        : _ok(false), _error(e), _idx(idx) {}

    constexpr bool is_ok() const { return _ok; }
    constexpr bool is_error() const { return !_ok; }

    constexpr T const& ok() const { return _value; }
    constexpr ParseError const& error() const { return _error; }

    constexpr uint32_t index() const { return _idx; }

private:
    bool _ok;
    T _value{};
    ParseError _error{""};
    uint32_t _idx;
};

Implementing Combinators

Now that we have:

a Parser abstraction
a Result type

we can start building combinators, functions that take parsers and return new parsers.

Sequence Combinator (>>):

template <class A, class B>
constexpr auto Sequence(A&& a, B&& b) {
    using T1 = ValueOf<A>;
    using T2 = ValueOf<B>;
    using Tok = TokenOf<A>;
    using T = flatten_tuple_t<std::tuple<T1, T2>>;
    return parser<T, Tok>([=](std::span<Tok const> tokens) constexpr -> Result <T> {
        auto res1 = a.parse(tokens);
        if (res1.is_error()) {
            return {res1.error(), res1.index()};
        }
        auto res2 = b.parse(tokens.subspan(res1.index()));
        if (res2.is_error()) {
            return {res2.error(), res1.index() + res2.index()};
        }
        return Result<T>(
            std::tuple_cat(
                as_tuple(res1.ok()),
                as_tuple(res2.ok())
            ),
            res1.index() + res2.index()
        );
    });
}

template <class A, class B>
constexpr auto operator>>(A&& a, B&& b) {
    return Sequence(std::forward<A>(a), std::forward<B>(b));
};

💡

A sequence combinator must return a result whose type is known at compile time. Using something like std::vector loses type information and isn’t fully compile-time safe. Instead, we use std::tuple to preserve exact types.

Why do we flatten tuples?? (type-level)

The sequence combinator can be applied multiple times:

p1>>p2>>p3>>p4

p1 >> p2 produces std::tuple<T1, T2>

Then we apply p3:
(p1>>p2)>>p3. If we combine the resultant types, the type becomes:
std::tuple<std::tuple<T1, T2>, T3>. But this is not we want. We want flat structure like this
std::tuple<T1, T2, T3>

Why std::tuple_cat and as_tuple? (value-level)

Problem:
res1.ok()  // could be T or std::tuple<...>
res2.ok()  // could be T or std::tuple<...>
Solution:

Normalise everything into tuples. as_tuple does exact same thing. If it's T wrap it with tuple std::tuple<T>. If it's std::tuple keep it as is. Finally concatenate every tuple using std::tuple_cat

Choice Combinator:
```
template <class A, class B>
constexpr auto Choice(A&& a, B&& b) {
    using T = flatten_variant_t<ValueOf<A>, ValueOf<B>>;
    using Tok = TokenOf<A>;
    return parser<T, Tok>([=](auto tokens) constexpr -> Result<T>{
        auto res1 = a.parse(tokens);
        if (res1.is_ok()) {
            return Result<T>(T{res1.ok()}, res1.index());
        }
        auto res2 = b.parse(tokens);
        if (res2.is_ok()) {
            return Result<T>(T{res2.ok()}, res2.index());
        }
        return Result<T>("Choice parser cannot parse single parser", 0);
    });
}

template <class A, class B>
constexpr auto operator|(A&& a, B&& b) {
    return Choice(std::forward<A>(a), std::forward<B>(b));
};
```
💡

Which result type should we use for a Choice combinator? Since it produces exactly one result out of multiple possible types, a std::tuple doesn’t fit because it represents multiple values simultaneously rather than a single selected one. What we really need is a type that can hold one of several alternatives—like a union. However, a raw union isn’t usable as a function return type in a safe or meaningful way because it doesn’t track which member is active. That leaves std::variant, which provides the same “one-of-many” behaviour as a union but in a fully type-safe and well-defined way, making it the perfect choice here.

The same reasoning applies when flattening a std::variant: we avoid nested variants because they create unnecessary complexity.

Problem ⚠️ :

Even after flattening the std::variant, there is still an issue. What should the resulting type be for something like:

CharacterParser('a') | CharacterParser('b')

This would produce std::variant<char, char>, but that’s redundant and effectively ambiguous. To avoid this, we eliminate duplicate types during construction, so the result becomes a single std::variant<char> instead. This deduplication step is already handled as part of the flattening process.

Conclusion

Parser combinators turn parsing into:

composition of types + functions instead of manual control flow.

We built:

type-safe sequencing
branching via variants
optional and repetition logic
full compile-time structure preservation

Full implementation

Godbolt: https://godbolt.org/z/8cEbKe9a7
Github: https://github.com/ashwith2427/Parsinator/

Building a type-safe Parser Combinator Library in Modern C++

The Core Idea

Execution Model

Composing Parsers with Combinators

Types of Combinators

The core abstraction: Parser

Result Type

Problem:

Solution:

Conclusion

Full implementation

Comments

More from this blog

The hidden detail behind std::addressof in C++

Command Palette

The Core Idea

Execution Model

Composing Parsers with Combinators

Types of Combinators

The core abstraction: Parser

Result Type

Problem:

Solution:

Conclusion

Full implementation

Comments

More from this blog