Skip to main content

Command Palette

Search for a command to run...

Building a type-safe Parser Combinator Library in Modern C++

Updated
7 min read
A
I write about C++, performance, and building things from first principles. Currently exploring data structures, allocators, and how things work under the hood.

What are Parser Combinators??

Instead of building one large monolithic parser like a recursive descent parser, parser combinators let you construct complex parsers by composing small, reusable functions.

Each parser:

  • consumes input

  • either succeeds with a value + remaining input

  • or fails with an error

The Core Idea


We write tiny parsers:

Parser A → parses 'a'
Parser B → parses 'b'

At this stage, nothing is composed yet.
Then we combine them:

           Parser A                              Parser B
          (parse 'a')                           (parse 'b')

             │                                      │

             └───────────┬────────────┘
                                ↓

                      Sequence Combinator(>>)
                                ↓

                        Combined Parser (P)

Execution Model


Let’s run the composed parser:

Input:

"ab"

Flow:

Combined Parser P
        ↓
   Parser A → consumes 'a'
        ↓
   Parser B → consumes 'b'

Output:

['a', 'b']

Composing Parsers with Combinators


Now that we understand how parsers are built and executed, the next question is:
How do we actually combine small parsers to build complex ones?

This is where parser combinators come in

Types of Combinators


Sequence Combinator (>>):

It runs two parsers one after another.

  • First parser consumes part of the input

  • Second parser continues from where the first stopped

  • Final result combines both outputs

A >> B

Choice Combinator (|):

Choice combinator tries parsers one by one until one succeeds.

  • Order matters

  • First successful parser wins

Input: "b"

A = parse 'a' → fails  

B = parse 'b' → succeeds

Optional Combinator:

Represents something that may or may not exist.

  • success → returns value

  • failure → empty result

Input: "abc"

Optional(parse 'z') → None

Repetition Combinator:

Matches a parser repeatedly until it stops making progress.

Stops when:

  • parser fails

  • or no progress is made

Input: "bbba"

     ↓

Parser 'b' runs repeatedly

     ↓

Result: ['b', 'b', 'b']

Discard/Skip combinator:

Consumes input but ignores output.

Used for:

  • spaces

  • separators

  • punctuation

The core abstraction: Parser


A Parser is simply a wrapper around a function that:

  • takes a sequence of input tokens

  • processes some portion of that input

  • returns a Result, which contains either:

    • a parsed value along with how much input was consumed, or

    • an error indicating why parsing failed

template <class T, class Tok, class Fn>
class Parser {
public:
    using result_type = T;
    using token_type = Tok;
    constexpr Parser(Fn&& fn): _apply(std::forward<Fn>(fn)) {}
    constexpr Result<T> parse(std::span<Tok const> tokens) const {
        return _apply(tokens);
    }
private:
    Fn _apply;
};

Why do we need a Result type??

Parsing always has two possible outcomes:

  • success

  • failure

Why not just return T?

If we only return a value:

char parse(...);

There is no way to represent failure.

Why not std::optional<T> ?

This allows failure, but still misses two critical things:

  • No error information

  • No input consumption tracking

Result Type


We use tagged Result type:

Result<T> =
    Success(T value, index)
    OR
    Failure(ParseError error, index)
class ParseError {
public:
    constexpr ParseError(const char* error): _msg(error) {}
    constexpr const char* error() { return _msg; }
private:
    const char* _msg;
};

template <class T>
class Result {
public:
    using value_type = T;

    constexpr Result(T v, uint32_t idx)
        : _ok(true), _value(v), _idx(idx) {}

    constexpr Result(ParseError e, uint32_t idx)
        : _ok(false), _error(e), _idx(idx) {}

    constexpr bool is_ok() const { return _ok; }
    constexpr bool is_error() const { return !_ok; }

    constexpr T const& ok() const { return _value; }
    constexpr ParseError const& error() const { return _error; }

    constexpr uint32_t index() const { return _idx; }

private:
    bool _ok;
    T _value{};
    ParseError _error{""};
    uint32_t _idx;
};

Implementing Combinators


Now that we have:

  • a Parser abstraction

  • a Result type

we can start building combinators, functions that take parsers and return new parsers.

  1. Sequence Combinator (>>):

    template <class A, class B>
    constexpr auto Sequence(A&& a, B&& b) {
        using T1 = ValueOf<A>;
        using T2 = ValueOf<B>;
        using Tok = TokenOf<A>;
        using T = flatten_tuple_t<std::tuple<T1, T2>>;
        return parser<T, Tok>([=](std::span<Tok const> tokens) constexpr -> Result <T> {
            auto res1 = a.parse(tokens);
            if (res1.is_error()) {
                return {res1.error(), res1.index()};
            }
            auto res2 = b.parse(tokens.subspan(res1.index()));
            if (res2.is_error()) {
                return {res2.error(), res1.index() + res2.index()};
            }
            return Result<T>(
                std::tuple_cat(
                    as_tuple(res1.ok()),
                    as_tuple(res2.ok())
                ),
                res1.index() + res2.index()
            );
        });
    }
    
    template <class A, class B>
    constexpr auto operator>>(A&& a, B&& b) {
        return Sequence(std::forward<A>(a), std::forward<B>(b));
    };
    
    💡
    A sequence combinator must return a result whose type is known at compile time. Using something like std::vector loses type information and isn’t fully compile-time safe. Instead, we use std::tuple to preserve exact types.

    Why do we flatten tuples?? (type-level)

    The sequence combinator can be applied multiple times:

    p1>>p2>>p3>>p4

    1. p1 >> p2 produces std::tuple<T1, T2>

    2. Then we apply p3:
      (p1>>p2)>>p3. If we combine the resultant types, the type becomes:
      std::tuple<std::tuple<T1, T2>, T3>. But this is not we want. We want flat structure like this
      std::tuple<T1, T2, T3>

    Why std::tuple_cat and as_tuple? (value-level)

    Problem:

    res1.ok()  // could be T or std::tuple<...>
    res2.ok()  // could be T or std::tuple<...>
    

    Solution:

    Normalise everything into tuples. as_tuple does exact same thing. If it's T wrap it with tuple std::tuple<T>. If it's std::tuple keep it as is. Finally concatenate every tuple using std::tuple_cat

  2. Choice Combinator:

    template <class A, class B>
    constexpr auto Choice(A&& a, B&& b) {
        using T = flatten_variant_t<ValueOf<A>, ValueOf<B>>;
        using Tok = TokenOf<A>;
        return parser<T, Tok>([=](auto tokens) constexpr -> Result<T>{
            auto res1 = a.parse(tokens);
            if (res1.is_ok()) {
                return Result<T>(T{res1.ok()}, res1.index());
            }
            auto res2 = b.parse(tokens);
            if (res2.is_ok()) {
                return Result<T>(T{res2.ok()}, res2.index());
            }
            return Result<T>("Choice parser cannot parse single parser", 0);
        });
    }
    
    template <class A, class B>
    constexpr auto operator|(A&& a, B&& b) {
        return Choice(std::forward<A>(a), std::forward<B>(b));
    };
    
    💡
    Which result type should we use for a Choice combinator? Since it produces exactly one result out of multiple possible types, a std::tuple doesn’t fit because it represents multiple values simultaneously rather than a single selected one. What we really need is a type that can hold one of several alternatives—like a union. However, a raw union isn’t usable as a function return type in a safe or meaningful way because it doesn’t track which member is active. That leaves std::variant, which provides the same “one-of-many” behaviour as a union but in a fully type-safe and well-defined way, making it the perfect choice here.

    The same reasoning applies when flattening a std::variant: we avoid nested variants because they create unnecessary complexity.

    Problem ⚠️ :

    Even after flattening the std::variant, there is still an issue. What should the resulting type be for something like:

    CharacterParser('a') | CharacterParser('b')

    This would produce std::variant<char, char>, but that’s redundant and effectively ambiguous. To avoid this, we eliminate duplicate types during construction, so the result becomes a single std::variant<char> instead. This deduplication step is already handled as part of the flattening process.

Conclusion


Parser combinators turn parsing into:

composition of types + functions instead of manual control flow.

We built:

  • type-safe sequencing

  • branching via variants

  • optional and repetition logic

  • full compile-time structure preservation

Full implementation


Godbolt: https://godbolt.org/z/8cEbKe9a7
Github: https://github.com/ashwith2427/Parsinator/

97 views