Using Codex with Hardware In The Loop for Microcontrollers

24 Mar 2026 - tsp
Last update 24 Mar 2026

When people talk about “AI pair programmers” they usually picture yet another autocomplete window. For this blog article I treated Codex as an embedded teammate with full access to a ModBus target bench: a single ATmega2560 fitted with a MAX485 transceiver, flashable via its main serial port and the Arduino bootloader, and accessed on its secondary UART through a USB serial-to-RS485 adapter. Because Codex can compile, flash, and interrogate that modest setup as part of its inner loop, we ended up with a development cadence that felt like test-driven hardware bring-up instead of the usual edit-build-burn cycle.

This article briefly describes a workflow that was really used this way and that yielded a workable library; it is neither sugar-coated nor exaggerated. Keep in mind that this was a small-scale project, so the task was an easy one for the agent. The conversations with the agent are trimmed down; the snippets shown are only meant to give a rough idea.

Bootstrapping the collaboration and designing the architecture

We started by writing AGENTS.md as an executable contract. It spells out that Codex must keep Timer0 free unless an optional sysclock is enabled, ship UART ISRs with ring buffers, and update every artifact (AGENTS, DESIGN_DOCUMENT, TODO, user docs) whenever reality changes. That file also pins the toolchain (avr-gcc/avr-libc/binutils/avrdude/GNU make via gmake), the target MCUs (ATmega328P utilizing UART0 and ATmega2560 utilizing UART0 and UART1 even though only the 2560 rig was in the loop for now), and communication habits (line-referenced file mentions, short status updates, immediate blocker escalation). Having that behavior encoded up front was the equivalent of onboarding a senior engineer in writing: every later decision referenced back to it, and Codex kept it fresh whenever constraints shifted.

“You are a design architect and software developer. We are going to implement a ModBus Slave on an ATMega microcontroller. First we are creating a design document till all open questions are resolved and we have specified all technical details. We do this in a back and forth conversation, you do not take decisions. Present open questions and provide suggestions. The user decides which decisions to take. You do not decide yourself. Follow the design rules from DESIGNRULES.md when writing the architecture document. In the first stage we are writing docs/DESIGN_DOCUMENT.md as a detailed technical design document and docs/TODO.md as an [ ] open, [x] done, [-] rejected ToDo list that you keep up to date all the time. After we finished designing you are going to implement the project according to the ToDo list using avr-gcc, avr-libc, binutils and avrdude. You build using gmake. Use no other tools. You can flash the program to the microcontroller using gmake flash and access the serial port via /dev/ttyU0 as well as the RS485 bus on /dev/ttyU1. First draft your AGENTS.md that explains your role. Only ever edit ~/projectdirectory.”

With the agent contract in place we drafted docs/DESIGN_DOCUMENT.md. Codex drove that conversation like an architecture review. It enumerated which registers must be memory-backed, how MAX485 control pins are abstracted, what the ISR boundary looks like, and even future knobs (Timer0 vs Timer1 tick sources). Whenever ambiguity popped up - How should holding register 255 trigger EEPROM commits? Should UUIDs live in callbacks or static blocks? - Codex paused, listed options, and asked for confirmation before touching code. That high-velocity Q&A mirrored how a human architect would unblock a team, just without the context loss that happens when humans juggle too many requirements.

“Start writing the ModBus RTU slave architecture for ATmega2560 + MAX485 attached to UART1. Document UART buffers, gap timing and the reset behavior before you write firmware. Also honor proper timeout processing. We support input registers, coil outputs, holding registers and output registers. We need write single, write multiple, read single and read multiple commands for those registers. We implement a set of fixed registers for the device address (…), baud rate (…) as well as a UUID based identity register to identify the device and the firmware. Later we are also going to implement an RS485 capable bootloader so we are able to flash the device via the RS485 bus (this will be an independent project)”

To keep implementation straight we used docs/TODO.md as both a kanban lane and a verification ledger. Items ranged from API scaffolding to tests with the single board RS485 link. The checklist style (explicit [ ] vs [x]) made it trivial to see which capabilities still needed either implementation or bench validation. Parallel to that we maintained KB/index.md, a placeholder knowledge base meant for any external ModBus or AVR timing references Codex might have had to fetch. Even when the KB stayed empty, the scaffolding reminded us that Codex could, on demand, go out to public documentation, store PDFs or markdown summaries under KB/, and cite them later (this was left out in the presented instructions above).
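As an illustration of how such a checklist can be read mechanically, here is a minimal sketch that tallies the three item states. It assumes the entries in docs/TODO.md are Markdown list items of the form `- [x] text`; the article only names the three markers, so the exact line format, the function name `tally` and the sample entries are made up for this example.

```python
import re
from collections import Counter

# Assumed line format: a Markdown list item such as "- [x] Implement CRC16".
# The three markers come from the agent prompt: [ ] open, [x] done, [-] rejected.
ITEM = re.compile(r"^\s*[-*]\s+\[([ xX-])\]\s+(.*)$")
STATES = {" ": "open", "x": "done", "-": "rejected"}


def tally(todo_text):
    """Count TODO items by state and collect the titles of the open ones."""
    counts = Counter()
    open_items = []
    for line in todo_text.splitlines():
        match = ITEM.match(line)
        if not match:
            continue  # headings, prose and blank lines are ignored
        state = STATES[match.group(1).lower()]
        counts[state] += 1
        if state == "open":
            open_items.append(match.group(2).strip())
    return counts, open_items


if __name__ == "__main__":
    # Hypothetical entries, not taken from the real project.
    sample = (
        "# TODO\n"
        "- [x] UART1 ISR with ring buffers\n"
        "- [ ] Bench test: write multiple holding registers\n"
        "- [-] Use Timer0 for the gap timer\n"
    )
    counts, open_items = tally(sample)
    print(dict(counts))
    print(open_items)
```

A helper like this is optional. The point is only that the explicit markers make the remaining work countable by a script as well as by a human.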

This is a pretty standard way to use a coding agent. One usually spends between an hour and half a day writing a design document this way, depending on the project scope, going back and forth with the agent to resolve open questions, discuss feasibility as well as pros and cons, and take decisions. This phase feels like the meetings with human engineers during the design and architecture phase, though it is far more productive and comes with less friction and social stress. And in contrast to the human world dumb ideas always get harsh feedback.

Iterating with real silicon in the loop

Once the architecture felt solid Codex shifted into coding mode. It produced the UART/RS-485 hardware layer, the ModBus core parser, and the register handlers in digestible steps, running unit tests as it went. It always followed the same pattern: update the TODO, write code, run gmake, run unit tests, flash the ATmega2560 and immediately exercise register reads and writes over the physical bus via pyserial scripts written on the fly. Because the MAX485 driver enable lines and UART ISRs were part of the same repo, Codex could inject temporary instrumentation (extra GPIO toggles, debug prints gated behind #ifdef MODBUS_DEBUG, CRC probes, etc.) without breaking the contract, test the hypothesis on hardware, and then strip the probes again - all inside a single loop. This matches the printf-style debugging that junior engineers often use.
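To give an idea of what the core of such a throwaway script looks like, here is a minimal sketch of how a ModBus RTU request is assembled. It is not one of the scripts Codex wrote. The function names are chosen for this example, while the CRC parameters (initial value 0xFFFF, reflected polynomial 0xA001, low byte sent first) and function code 0x03 follow the ModBus serial line specification.

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """CRC-16 as used by ModBus RTU (init 0xFFFF, reflected polynomial 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_holding_registers(unit: int, address: int, count: int) -> bytes:
    """Build a 'read holding registers' (0x03) request including the CRC."""
    body = struct.pack(">BBHH", unit, 0x03, address, count)
    # The CRC is appended low byte first, unlike the big-endian data fields.
    return body + struct.pack("<H", crc16_modbus(body))


def corrupt(frame: bytes) -> bytes:
    """Flip one bit in the last CRC byte to get a deliberately invalid request."""
    return frame[:-1] + bytes([frame[-1] ^ 0x01])


if __name__ == "__main__":
    print(read_holding_registers(1, 0, 1).hex(" "))
```

On the bench the resulting bytes would be written to the RS485 adapter (here /dev/ttyU1) with pyserial. Sending the output of `corrupt()` as well covers the "invalid request" half of the test, where the expected reaction from the slave is silence.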

“Perform a sequence of reads and writes into and from the registers and dump debug messages on the AVRs serial port. Interact with the device via the RS485 bus and inspect validity of the reaction on the serial port. Create valid and invalid requests.”

A concrete example came up when we noticed that writing a new device ID took effect immediately, which violates ModBus expectations. Codex traced the bug by flashing diagnostic builds that logged both the pending and active IDs after every frame. It then restructured modbus_core.c to separate PendingConfig from ActiveDeviceId, staged new values in RAM only, and confirmed on the rig that the slave still answered on the old ID until the reset magic 0xAA55 forced a reboot. That entire investigation - code change, compilation, flashing, scripted ModBus transactions, and regression verification - ran autonomously while we observed the terminal output.
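The resulting behavior can be sketched as a small model. This is a Python illustration of the logic only, not the C code in modbus_core.c. The register addresses and the class and method names are hypothetical, only the reset magic 0xAA55 is taken from the project, and the model glosses over how the pending value survives the reset on the real device.

```python
RESET_MAGIC = 0xAA55   # reset magic named in the article
REG_DEVICE_ID = 0      # hypothetical register address for the device ID
REG_RESET = 1          # hypothetical register address for the reset trigger


class SlaveModel:
    """Toy model of a slave that stages a new device ID until reboot."""

    def __init__(self, device_id):
        self.active_id = device_id    # ID the slave currently answers on
        self.pending_id = device_id   # ID that becomes active after a reboot

    def write_register(self, address, value):
        if address == REG_DEVICE_ID:
            # Valid ModBus slave addresses are 1..247.
            if not 1 <= value <= 247:
                raise ValueError("device ID out of range")
            self.pending_id = value   # staged only, not applied
        elif address == REG_RESET and value == RESET_MAGIC:
            self.reboot()
        # Any other write is ignored by this model.

    def reboot(self):
        # Only here does the staged value take effect.
        self.active_id = self.pending_id

    def accepts(self, unit):
        """Return True if a request addressed to 'unit' would be answered."""
        return unit == self.active_id


if __name__ == "__main__":
    slave = SlaveModel(1)
    slave.write_register(REG_DEVICE_ID, 5)
    print(slave.accepts(1), slave.accepts(5))
    slave.write_register(REG_RESET, RESET_MAGIC)
    print(slave.accepts(1), slave.accepts(5))
```

The regression check on the rig follows the same sequence: write the new ID, confirm the old ID still answers, send the reset magic, then confirm that only the new ID answers.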

“I see a device ID bug. The device seems to automatically apply directly after writing into the respective register. Reproduce the bug, capture pending vs active IDs over UART, and refactor to fix so we only apply the change after rebooting the device via reboot magic.”

Because hardware was always in the loop Codex could also stress scenarios that usually wait for the lab: half-duplex turnaround timing, bursts, deliberate line silence to test the 1.5/3.5 character gap watchdog, and watchdog-induced resets. Whenever a test exposed a weakness, Codex modified the source (sometimes inserting extra assertions or statistics counters), rebuilt, and reran the scenario minutes later. There was no “hand code to a human to flash” delay, so iteration speed approached software-only TDD despite touching real silicon.
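For reference, the 1.5 and 3.5 character gaps can be computed as follows. The sketch assumes the 11 bits per character and the fixed 750 µs / 1750 µs values above 19200 baud that the ModBus serial line specification recommends. The 16 MHz clock and the prescaler of 64 in `to_ticks` are assumptions for a typical Arduino-style ATmega2560 board, not values taken from the project.

```python
import math


def rtu_gaps(baud, bits_per_char=11):
    """Return (t1.5, t3.5) in seconds for a given baud rate."""
    if baud > 19200:
        # Above 19200 baud the specification recommends fixed values.
        return 750e-6, 1750e-6
    char_time = bits_per_char / baud
    return 1.5 * char_time, 3.5 * char_time


def to_ticks(seconds, f_cpu=16_000_000, prescaler=64):
    """Convert a gap to timer ticks, rounded up so the gap is never too short.

    f_cpu and prescaler are assumed values for illustration.
    """
    return math.ceil(seconds * f_cpu / prescaler)


if __name__ == "__main__":
    for baud in (9600, 19200, 115200):
        t15, t35 = rtu_gaps(baud)
        print(baud, round(t15 * 1e6), round(t35 * 1e6), to_ticks(t35))
```

At 9600 baud one character takes about 1.15 ms, so the end-of-frame silence of 3.5 characters is roughly 4 ms. That is long enough for a host-side script to provoke reliably by simply pausing between writes.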

Debug automation without babysitting

The workflow never depended on someone manually driving a serial console. Instead we kept lightweight Python and shell utilities around to spray ModBus frames, capture responses, and reset boards via the watchdog harness. Codex could call those scripts, parse their output, and decide on the next change without waiting for human prompts. That made higher-level experiments feasible: for example, sweeping coil write bursts across dozens of registers while monitoring current draw, or verifying that ring buffer overruns stay cleared even when the main loop intentionally starves ModBusService() for a few milliseconds.
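What "parse their output and decide" means in practice can be sketched with a small response classifier. The function names and result strings are made up for this example and are not the utilities used in the project. The frame layout (unit address, function code, data, CRC with low byte first) and the exception convention (function code with bit 7 set, followed by an exception code) follow the ModBus specification.

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16 as used by ModBus RTU (init 0xFFFF, reflected polynomial 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def with_crc(body: bytes) -> bytes:
    """Append the CRC, low byte first, to a frame body."""
    return body + crc16_modbus(body).to_bytes(2, "little")


def check_response(request: bytes, response: bytes) -> str:
    """Classify a reply as 'ok', 'exception:<code>' or an error reason."""
    if len(response) < 5:
        return "short"            # the shortest valid RTU reply is 5 bytes
    if crc16_modbus(response) != 0:
        return "bad-crc"          # CRC over a frame including its CRC is 0
    if response[0] != request[0]:
        return "wrong-unit"
    if response[1] == request[1] | 0x80:
        return f"exception:{response[2]}"
    if response[1] != request[1]:
        return "wrong-function"
    return "ok"


if __name__ == "__main__":
    request = with_crc(bytes([1, 0x03, 0, 0, 0, 1]))
    reply = with_crc(bytes([1, 0x03, 2, 0x00, 0x2A]))
    print(check_response(request, reply))
```

A single-word verdict like this is easy for an agent to act on: "bad-crc" points at the transmit path or the bus, while an exception code points at the register handlers.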

This autonomy extended to documentation and guardrails. Any time the behavior changed, Codex updated AGENTS, the design doc, TODO, and the example app notes. It also would have been trivial to pull protocol specs from the public internet, normalize them into KB/*.md, and cite them inline - handy when juggling RTU timing or EEPROM endurance data. The same mechanism can ingest errata sheets, ModBus application notes or even oscilloscope captures dumped via the pylabdevs devices, giving future sessions instant context.

Future-proofing with formal methods

One of the underrated perks of this setup is how easily it can grow into formal assurance. The same Codex agent that compiles and flashes code can also emit ACSL annotations for critical routines. Feed those annotations plus the source into Frama-C, and you gain static guarantees (no runtime errors, preserved invariants) before the bits ever hit flash. Coupling Frama-C proofs with hardware-in-the-loop regression runs lets you blend mathematical confidence with empirical validation, a combination that is usually out of reach for small embedded teams.

What are the benefits of hardware in the loop

Putting Codex inside the hardware loop changed the economics of firmware work. Instead of queuing questions for a future lab slot, we answered them immediately with the actual boards. Instead of hoping a human remembered every constraint, we encoded expectations in AGENTS and the design doc so the assistant could enforce them relentlessly. Instead of deferring documentation, we kept the narrative up to date as part of every change. Most importantly, the assistant never tired: it could keep iterating - tweaking ISR latency, adjusting ModBus timing, mutating register maps, or running soak tests - long after a human would have walked away.

If you are building microcontroller firmware with tight loops, shared peripherals or embedded networks, wiring Codex into your hardware bench gives you the confidence of continuous validation with the speed of scripted development. Whether you need a minimally guided debugging partner or a fully autonomous regression runner, the same ingredients apply: define the agent contract, capture the architecture, keep TODO and KB artifacts honest, and hand the assistant access to your toolchain plus your boards. The result is a development flow that feels both methodical and fast - exactly what embedded projects need.

Dipl.-Ing. Thomas Spielauer, Wien (webcomplainsQu98equt9ewh@tspi.at)

This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/
