Scaling AsyncAPI parsers

Jonas Lagoni

·6 min read

Imagine this:

You're a dedicated Python developer, entrenched in your favorite language's ecosystem, when you stumble upon AsyncAPI — a promising breakthrough for enhancing your development workflow. Eager to incorporate it into your projects, you envision building a bridge between AsyncAPI and your Python framework.

This endeavor holds the promise of unlocking new possibilities for your applications. Yet, as you delve deeper into the intricacies of AsyncAPI, you encounter an unexpected barrier: the absence of readily available parsers in your preferred language. Intrigued by the potential but hindered by a critical limitation, you are left with a hard choice: continue, stop, or look elsewhere? - ChatGPT with Jonas flavour

This post is the pitch I suggested for our team at Postman to work on for AsyncAPI. It has NOT been selected for development yet; it is only a suggestion (pitch) from my side. You can find the most up-to-date version here (feel free to leave a comment!): Discussion, Pitch: Scaling AsyncAPI parsers.

Problem

These are the problems this pitch will solve if it is selected.

TL;DR:

  • Problem 1: Inadequate parser availability restricts developer participation and tool development.
  • Problem 2: Manual maintenance processes are unsustainable as parsers are scaled.

Problem 1: Limited Language Support

If you want to create tooling that works with AsyncAPI documents, you need something that helps you interact with the document. Today this is only possible in TypeScript/JavaScript, because that is the only language with an up-to-date parser. As a result, we stop whole communities of programmers from building tools around AsyncAPI.

Imagine a scenario where a Python developer, who has no prior experience with JavaScript, discovers AsyncAPI and aims to integrate it into their Python-based development workflow. Their Python framework, a key component in their application development process, currently lacks support for AsyncAPI, so they decide to build a plugin for the framework that adds this support. Here they hit the first wall: the absence of Python-compatible parsers. This leaves the developer with limited options: either develop their own parser from scratch in addition to the plugin, or explore alternative, better-supported standards. This is just one of many similar stories I want us to solve. The goal is simply to lower the barrier to entry so that other languages can contribute to the AsyncAPI tooling ecosystem.

Problem 2: Maintenance Hurdles

The second problem is that we can hardly maintain the one parser we have, so maintaining two with our current approach simply won't be possible, especially with an ever-changing specification.

Solution

So instead of hand-crafting complex parsers, I suggest we create simple, minimalistic parsers that are 90% (don't take the numbers too literally) auto-generated.

90% of the code will be auto-generated 1-to-1 structures for each version of AsyncAPI, and the last 10% will be support functions that handle things like applying traits, working with references, loading from files, etc.
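To make that split a bit more concrete, here is a minimal Python sketch of what it could look like. Everything in it is an assumption made for illustration: the class names, the fields, and the `document_from_dict` helper are hypothetical, written by hand rather than produced by any generator, and they cover only a tiny slice of the document structure.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# --- "Generated" part: plain 1-to-1 structures (hypothetical names) ---
@dataclass
class Info:
    title: str
    version: str
    description: Optional[str] = None


@dataclass
class AsyncAPIDocument:
    asyncapi: str
    info: Info
    # Channels stay as raw dictionaries to keep the sketch short.
    channels: Dict[str, Any] = field(default_factory=dict)


# --- Hand-written part: one small support function (hypothetical) ---
def document_from_dict(raw: Dict[str, Any]) -> AsyncAPIDocument:
    """Map an already-loaded document (a dict) onto the models."""
    info = raw["info"]
    return AsyncAPIDocument(
        asyncapi=raw["asyncapi"],
        info=Info(
            title=info["title"],
            version=info["version"],
            description=info.get("description"),
        ),
        channels=raw.get("channels", {}),
    )


doc = document_from_dict(
    {"asyncapi": "3.0.0", "info": {"title": "Account Service", "version": "1.0.0"}}
)
print(doc.info.title)
```

The point of the sketch is the ratio: the data classes are the kind of code a generator can own, while the mapping function is the small hand-maintained layer on top.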

We can achieve this by using the always up-to-date JSON Schema files (which are part of the specification release flow) to auto-generate the underlying AsyncAPI models that represent the structure of an AsyncAPI document. This gives developers a way to interact with the document. On top of that, the models will always match the latest structures, because the idea is that the generation script runs whenever the JSON Schema files change.
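As a rough illustration of the idea, the following Python sketch turns a small JSON Schema object definition into data class source code. It is a toy generator written for this post, not Modelina and not its API; the `generate_dataclass` function and the simplified `INFO_SCHEMA` are assumptions, and a real schema needs far more handling (references, unions, nested objects, naming).

```python
from typing import Any, Dict

# Maps JSON Schema primitive types to Python type hints.
PRIMITIVES = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}

# A heavily simplified, hypothetical stand-in for a schema definition.
INFO_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "required": ["title", "version"],
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
        "version": {"type": "string"},
    },
}


def generate_dataclass(name: str, schema: Dict[str, Any]) -> str:
    """Render Python source for one data class from an object schema."""
    required = set(schema.get("required", []))
    properties = schema.get("properties", {})
    # Required fields go first: data class fields without defaults
    # cannot follow fields that have defaults.
    ordered = sorted(properties, key=lambda prop: prop not in required)
    lines = ["@dataclass", f"class {name}:"]
    for prop in ordered:
        hint = PRIMITIVES.get(properties[prop].get("type"), "Any")
        if prop in required:
            lines.append(f"    {prop}: {hint}")
        else:
            lines.append(f"    {prop}: Optional[{hint}] = None")
    if not ordered:
        lines.append("    pass")
    return "\n".join(lines)


source = generate_dataclass("Info", INFO_SCHEMA)
print(source)
```

Rerunning a script like this whenever the schema files change is what keeps the models in sync without hand edits.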

Solution Guidelines

What we want to focus on in this first iteration is the 90% of the parser that is generated. The goal of this bet is to reach the following state:

  • Should be possible to interact with the entire structure of every AsyncAPI version.
  • Should be implemented across 2 or 3 languages. I recommend choosing between Python, Java, C# or TypeScript/JavaScript, based on surveys from GitHub and StackOverflow.
  • Should be integrated into a GitHub Actions workflow so the process runs with as little manual intervention as possible (i.e. when the spec-json-schema is updated, so are the models). Some of the support functions will of course need manual changes to handle new versions. This is the backbone of the automation for all parsers.
  • Should expose support functions for loading AsyncAPI documents from a file.
  • Should expose support functions for loading YAML-based AsyncAPI documents.
  • Should expose support functions for loading JSON-based AsyncAPI documents.
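The three loading-related points above could be covered by a very small hand-written layer. The Python sketch below is one possible shape, under stated assumptions: the function names are hypothetical, and YAML parsing relies on the third-party PyYAML package (`yaml.safe_load`), which is imported lazily so JSON-only users do not need it installed.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_json_file(path: str) -> Dict[str, Any]:
    """Load a JSON-based AsyncAPI document from a file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def load_yaml_file(path: str) -> Dict[str, Any]:
    """Load a YAML-based AsyncAPI document from a file."""
    # Assumption: PyYAML is installed; imported here so it stays optional.
    import yaml

    return yaml.safe_load(Path(path).read_text(encoding="utf-8"))


def load_file(path: str) -> Dict[str, Any]:
    """Pick a loader based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        return load_json_file(path)
    if suffix in (".yaml", ".yml"):
        return load_yaml_file(path)
    raise ValueError(f"Unsupported file extension: {suffix!r}")
```

The returned dictionary would then be mapped onto the generated models; that mapping is left out here to keep the sketch focused on loading.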

I suggest we use Modelina to generate the models that make up that 90%, for two reasons:

  1. Positive feedback loops, as any bugs we find and fix in Modelina to create the right models will benefit the entire AsyncAPI community that uses the same tool to generate payload models for their AsyncAPI document.
  2. Any feedback in the generated parsers we get about bugs with the structure, etc, directly benefits the above feedback loop as well!

Boundaries

I would completely ignore the nice-to-have features such as:

  • Do not care about parser API (there are ways we can automate this down the road)
  • Do not care about applying traits (improvement that can easily be contributed down the road)
  • Do not care about handling references (this is part of another pitch)
  • Do not care about loading from URL (improvement that can easily be contributed down the road)
  • Do not care about loading from a string (an improvement that can easily be contributed down the road)

Risks

1. Modelina Contribution Time Sink

This solution WILL make us contribute to Modelina, and Modelina will NOT directly let us generate the models from the official JSON Schema files in a suitable way. Off the top of my head, I know we will probably have to implement union support and better serialization support, and maybe even enable better naming strategies and more customization. It of course depends on which languages are selected. On the other hand, it is also a benefit, as mentioned in the solution guidelines. But it can be a time sink, so be careful!

2. Harder To Contribute

The problem with code generation in general is that, when you encounter a bug in a generated parser, it is very hard to figure out where that bug is and how to fix it. So with this solution, we do increase the difficulty for contributors. To mitigate this, active maintainers of both the code generation tool and the parser can help figure out what is wrong and point contributors to where they should look.

Long-term

The long-term goal is that we can scale this solution across 10+ languages with full support functions handling all the major operations that you would do with the parser.

With this solution, we give the ParserAPI some more time to stabilize and gain popularity (or not), before we can start thinking about how we can incorporate that into this.

Lastly, I think this will lay the foundation for how other standards, similar to AsyncAPI, can create and provide parsers.

Photo by Lenny Kuhne on Unsplash