Gijs Verheijke
Notes from refactoring my first vibecoded project in 2026

30 June 2023

In January this year I really found my way back to the bleeding edge of agentic engineering. My vehicle for that was building my first macOS app: markjason, a new editor just for Markdown and JSON files.

The experience was amazing. I had tons of fun and created a new product that I use almost daily, and that has over 50 downloads in two weeks. In fact, I'm writing this in markjason. And secretly, one of the reasons I'm writing this here instead of on Substack is so I have an excuse to write in markjason. It's just so nice.

One thing that became clear while building the app is that native apps are hard to test. After every code change, the app has to be built, then run, and it isn't easy for AI to test the UI of a native app. Not nearly as easy as it now is with web apps. So I came out of that project eager to try my newly sharpened skills on some web development.

Luckily, I have a sizable web project in production: Magicdoor.ai. I already knew Opus 4.5 was good at web dev, because when it first came out it crushed two refactoring tasks that every model before it had failed at. Over the past weekend, I unleashed my new setup with Opus 4.6 and Codex 5.3 on this product, and the results actually blow my mind.

React state management appears solved

In building Magicdoor.ai there have been a couple of state-management sticking points that cost me days upon days of prompting, often ending with me finally learning how the code works, figuring out the solution myself, and then hand-holding AI through the implementation.

- New item button: Who knew this would be a difficult thing to do? A button to start a new chat. It turned out that reliably clearing state under all conditions was something AI just really struggled to figure out. I just refactored this button, and now it works better than ever. Faster, too.
- Lazy-loading chat history: Endless prompts have been spent on this. The screen would jerk up and down as the system bounced off the scroll boundary and spastically loaded more items. Or it would not work at all... This time, with Codex, there was one initial failure, after which it suggested the familiar shortcut of an explicit "load more" button. But after my blunt reply "why has literally every other chat interface figured out how to do this perfectly except us?" it went to work and nailed it. It now works just as well as it does in ChatGPT.
- Loading states: In the past, when working on tool calls or on chat generally, AI would often somehow break loading states. They were a bit janky and unreliable to begin with. In the end I spent a few hours designing a better architecture to track chat state, and it's been fine since, but I was afraid to touch it. Now, Codex cranked out a complete refactor of the animations, and an extraction of the chat state machine, in 15 minutes, and it worked perfectly in one shot.
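To make that last point concrete, the kind of chat state machine that gets extracted can be sketched as a pure reducer. This is a minimal illustration, not Magicdoor.ai's actual code; the state and event names are hypothetical:

```typescript
// Hypothetical chat state machine as a pure reducer. Because every
// transition goes through one function, a "new chat" button is a single
// NEW_CHAT event and can't leave stale loading state behind.
type ChatState = "idle" | "submitting" | "streaming" | "error";

type ChatEvent =
  | { type: "SUBMIT" }      // user sends a message
  | { type: "FIRST_TOKEN" } // first streamed token arrives
  | { type: "DONE" }        // stream finished
  | { type: "FAIL" }        // request or stream errored
  | { type: "NEW_CHAT" };   // user starts a fresh chat

function chatReducer(state: ChatState, event: ChatEvent): ChatState {
  switch (event.type) {
    case "NEW_CHAT":
      return "idle"; // reset unconditionally, from any state
    case "SUBMIT":
      return state === "idle" || state === "error" ? "submitting" : state;
    case "FIRST_TOKEN":
      return state === "submitting" ? "streaming" : state;
    case "DONE":
      return state === "streaming" ? "idle" : state;
    case "FAIL":
      return "error";
    default:
      return state;
  }
}
```

The payoff of this shape is that loading indicators become a pure function of `state`, instead of a handful of booleans that drift out of sync.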

These observations are not coming from an engineer who finally 'gets' agentic development, nor from a complete beginner who is amazed 'you can just build things' now. No, these are literally things that AI failed at over, and over, and over. And now, it's fixed.

Streaming Tool Calls

In a chat interface, AI responses MUST stream, meaning the answer is rendered incrementally as it is generated. This is non-negotiable. AI struggled with this. Much of the time, certain JSON markers, which indicate for example that "what's coming next is a tool call result", need to be handled in real time. After failing to do this in a loop, AI agents up to September 2025 would invariably just break streaming. "I've changed the system to buffer the response until we have fully formed JSON", they would say (functionally, this just awaits the entire response before rendering it, thus breaking streaming).
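The trick is that only a *possible* marker needs holding back; everything else can be emitted the moment it arrives. Here is a minimal sketch of that idea. The wire format (`@@tool:` lines interleaved with plain text) is invented for illustration and is not what any particular API actually emits:

```typescript
// Hypothetical wire format: plain-text deltas, interleaved with lines
// like `@@tool:{"name":"search"}` that carry a tool call result.
// Text is yielded as soon as it provably isn't a marker, so streaming
// is preserved instead of buffering the whole response.
type StreamPart =
  | { kind: "text"; value: string }
  | { kind: "tool"; value: unknown };

const MARKER = "@@tool:";

function* parseChunks(chunks: Iterable<string>): Generator<StreamPart> {
  let pending = "";       // text held back only while it might be a marker
  let atLineStart = true; // markers can only begin at the start of a line
  for (const chunk of chunks) {
    pending += chunk;
    while (pending) {
      const nl = pending.indexOf("\n");
      const line = nl === -1 ? pending : pending.slice(0, nl);
      if (atLineStart && line.startsWith(MARKER)) {
        if (nl === -1) break; // marker line incomplete: wait for more data
        yield { kind: "tool", value: JSON.parse(line.slice(MARKER.length)) };
        pending = pending.slice(nl + 1);
      } else if (atLineStart && nl === -1 && MARKER.startsWith(line)) {
        break; // could still grow into a marker: hold it back
      } else if (nl === -1) {
        yield { kind: "text", value: pending }; // definitely text: emit now
        pending = "";
        atLineStart = false;
      } else {
        yield { kind: "text", value: pending.slice(0, nl + 1) };
        pending = pending.slice(nl + 1);
        atLineStart = true;
      }
    }
  }
  if (pending) yield { kind: "text", value: pending }; // trailing text
}
```

Note that the marker line is the one thing that *is* buffered until its closing newline, because you can't `JSON.parse` half an object; the surrounding text keeps flowing regardless.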

But in the past few days I've let Codex and Opus 4.6 refactor pretty much the entire core data flow of the AI calls. The API routes, the hooks, the messages.tsx component, the chatwindow.tsx component. It never ONCE broke streaming, or file uploads, or image rendering. It all just kept working.

What happened?

I think there are three elements here:

  1. When I started on Magicdoor.ai there weren't that many chat interfaces yet. Training data was missing for how to solve common problems in AI Chat interfaces. Now there's plenty of it.
  2. It could be that refactoring without changing functionality is fine, and that it will still break when adding truly new functionality. I will soon find out.
  3. The models got way better. Everyone is right. There are good reasons for the short-cycle hype to be peaking in Feb 2026. It is GOOOOOD.

I keep texting my vibecoder friends: "This is what vibecoding is meant to be like. It actually works!"