Show HN: Matcheroni, a tiny C++20 header library for building lexers/parsers (github.com/aappleby)
96 points by aappleby on July 7, 2023 | 12 comments
Howdy HN, as part of my ongoing programming language experiments I've ended up creating my own C++20 lexing and parsing library of sorts.

Matcheroni is an alternative to parser generators and regular expressions. It uses trees of C++ templates to implement highly customizable lexers and parsers that have minimal impact on build times and binary sizes, while remaining comparable in performance to Boost regular expressions.
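
To give a rough idea of what "trees of C++ templates" can look like, here is a minimal sketch of the general technique. Every name in it (`Lit`, `Range`, `Seq`, `Oneof`, `Any`, `Some`, `match_length`) is hypothetical and chosen for illustration; it is not Matcheroni's actual API, so see the repo for the real thing. Each matcher is a type with a static `match` function, and bigger matchers are built by nesting types.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical names throughout; this is NOT Matcheroni's actual API.
// Each matcher exposes a static match(begin, end) that returns the position
// just past the matched text, or nullptr if nothing matched.

// Matches one specific character.
template <char C>
struct Lit {
  static const char* match(const char* a, const char* b) {
    return (a < b && *a == C) ? a + 1 : nullptr;
  }
};

// Matches one character in the inclusive range [Lo, Hi].
template <char Lo, char Hi>
struct Range {
  static const char* match(const char* a, const char* b) {
    return (a < b && *a >= Lo && *a <= Hi) ? a + 1 : nullptr;
  }
};

// Matches every sub-matcher in order; fails if any of them fails.
template <typename... Ms>
struct Seq;

template <typename M, typename... Rest>
struct Seq<M, Rest...> {
  static const char* match(const char* a, const char* b) {
    const char* c = M::match(a, b);
    return c ? Seq<Rest...>::match(c, b) : nullptr;
  }
};

template <>
struct Seq<> {
  static const char* match(const char* a, const char*) { return a; }
};

// Tries each alternative at the same position; the first success wins.
template <typename... Ms>
struct Oneof {
  static const char* match(const char* a, const char* b) {
    const char* r = nullptr;
    bool found = (((r = Ms::match(a, b)) != nullptr) || ...);
    return found ? r : nullptr;
  }
};

// Matches M zero or more times, so it never fails.
template <typename M>
struct Any {
  static const char* match(const char* a, const char* b) {
    while (const char* c = M::match(a, b)) {
      if (c == a) break;  // stop if M matched without consuming input
      a = c;
    }
    return a;
  }
};

// Matches M one or more times.
template <typename M>
using Some = Seq<M, Any<M>>;

// A C-style identifier expressed as a tree of matcher types.
using Alpha = Oneof<Range<'a', 'z'>, Range<'A', 'Z'>, Lit<'_'>>;
using Digit = Range<'0', '9'>;
using Identifier = Seq<Alpha, Any<Oneof<Alpha, Digit>>>;

// Length of the prefix of `text` matched by M, or -1 if there is no match.
template <typename M>
std::ptrdiff_t match_length(std::string_view text) {
  const char* begin = text.data();
  const char* end = M::match(begin, begin + text.size());
  return end ? end - begin : -1;
}

// Example use of the sketch above.
inline void identifier_examples() {
  assert(match_length<Identifier>("foo_1 = 2") == 5);  // matches "foo_1"
  assert(match_length<Identifier>("9lives") == -1);    // cannot start with a digit
}
```

Because the whole pattern is a type, there is no pattern string to parse at run time and no runtime table to build, which is presumably part of why this style can stay small and fast.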

The repo includes two example projects - a simple regex parser in ~300 heavily documented lines of code, and a much larger but not quite finished C99 lexer and parser.

All feedback appreciated!



Very cool, and I like the name!

I'd be interested in reading about how Matcheroni compares with PEGTL and Lexy.

https://github.com/taocpp/PEGTL https://lexy.foonathan.net/


You may want to compare this against boost/spirit. I use Spirit extensively for my own Domain Specific Language parsing requirements.


I love the Spirit API, but Boost is huge, so in the end I just hand-coded a parser myself. A C++20 header-only library is a great thing to have.


Perhaps Boost.Metaparse[0] would be of interest:

  The library is similar to Boost.Spirit, however while
  parsers built with Spirit parse at run-time, parsers
  built with Metaparse parse at compile-time.
0 - https://www.boost.org/doc/libs/1_82_0/doc/html/metaparse.htm...
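
For anyone unfamiliar with the run-time versus compile-time distinction, here is a rough sketch using plain `constexpr` rather than Metaparse's template metaprogramming. `parse_uint` and `parse_runtime` are hypothetical helpers written for this illustration; they are not part of Boost.Metaparse or Boost.Spirit.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper, not part of Boost.Metaparse or Boost.Spirit.
// Parses the leading run of decimal digits; returns -1 if there is none.
// No overflow handling: this is only meant for short inputs.
constexpr int parse_uint(std::string_view s) {
  if (s.empty() || s[0] < '0' || s[0] > '9') return -1;
  int value = 0;
  for (char c : s) {
    if (c < '0' || c > '9') break;
    value = value * 10 + (c - '0');
  }
  return value;
}

// Evaluated by the compiler: a wrong result here is a build error.
static_assert(parse_uint("1234") == 1234);
static_assert(parse_uint("42abc") == 42);
static_assert(parse_uint("abc") == -1);

// The same function also works on input that is only known at run time.
inline int parse_runtime(std::string_view s) {
  int value = parse_uint(s);
  assert(value >= -1);
  return value;
}
```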


This is exactly what I’ve been wishing for for ages!

The main downside is requiring C++20, since that makes it a bit less portable to compilers which haven’t caught up to the standard. Each new standard revision adds a great deal of complexity which even the Big Three compilers tend to take years to finish implementing, so the smaller fish really do get left out of the picture. As I understand it, those are typically semi-compliant GCC ports for embedded devices or historical hardware, where a library with zero stdlib dependencies would be super useful but can’t really be used because of the modern language standard.

So, necessity demands you rewrite this as a C89 macro library.


No need to go back to the stone age; C++14 and C++17 are supported well enough.


Wow this is interesting.

Thank you for this.

I started writing a C lexer, and soon after a parser, in Python to make diagrams of the Postgres source code. I'm not using Pyparsing; I'm doing it by hand.

I only really understand recursive descent parsing!


I appreciate having a self-contained, header-only library for this type of thing. It seems like a better fit for my needs than heavier alternatives like Boost Spirit.


Boost is like 95% header-only anyway, and Spirit X3 does more stuff at compile time than Spirit v2.


The main downside is the size of Boost and the tight connection between its libraries. It bloats code size and slows down compilation significantly.

Even using Boost BCP to follow dependencies, it still ends up including half the library.


An existing project called "matcharoni" (spelling difference noted):

"A pattern matching heavy language designed with Advent of Code. langjam0002 runner-up"

https://github.com/sumeet/matcharoni


I have been using lexy to parse a Pythonic language I made. It was hard, but I managed to do it (although I handle indentation myself).

Matcheroni seems to lack a proper tutorial; lexy's tutorial is pretty good.



