Diving back into ML
After talking to a good friend about the company he just started in the AI/ML space, I decided to put some time into the project below. It has been a fun way to tie together things I learned (and documented in this blog) over the last few years with a bunch of the new ML tools I have been hearing about or playing with informally over the last 6 months.
Background
First, a bit of context. I have been teaching myself ML for a couple of years now. In early 2023 I bought a gaming PC and a beefier GPU to set up a workbench. Right after, I had image generation running locally via InvokeAI - a huge evolution from my first forays into the field back in early 2022. In mid-2023 the Llama models came out, and I learned about post-training supervised fine-tuning via Vicuna. I set up a browser-based app and a CLI against my local vicuna-7b using FastChat. I also set up multi-modal (text + vision) chat via MiniGPT4, so I was able to ask questions about pictures and get reasonable responses, with everything running locally. More challenging: I spent a lot of time with Andrej Karpathy's 'zero to hero' content and demo (miniGPT), trained my own model from scratch on the openweb dataset, and fine-tuned it on Phish/Grateful Dead song lyrics. My goal was to create a model that could spit out reasonable poems/lyrics in that style, but I have to admit I never quite got there.
The project
After chatting with my friend, I gave myself the goal of setting up an ‘oracle’ that would use an LLM to respond to user requests. The functionality isn’t the main point here. What I really wanted was to test ML-driven development and apply my experience with AWS and GCP. Here are links for most of the tech I used:
- cursor - an ML-powered editor that is supposedly all the rage, but which I had not really used before
- vscode - after running out of free tokens with Cursor, I moved on to vscode and Claude
- Claude - the best code generation LLM out there, from what I hear
- ollama - a tool to run and interact with LLMs locally
- gpt-oss - OpenAI’s open model released last week, which I wanted to use
- SmolLM - a smaller model, which is what I actually ended up using
- kubernetes - used as the underlying infrastructure, running both in GCP and AWS
You can try out the oracle via the GCP and AWS instances. Code for the infrastructure is at https://github.com/thiagorobert/ollama-wrap-infra and for the application at https://github.com/thiagorobert/ollama-wrap.
Musings
This was an interesting experience. First, I used Cursor to write the Terraform configs for the multi-cloud Kubernetes setup, using a simple 'hello world' nginx server. That went incredibly smoothly.
I fired up a new Cursor environment to write the Go code. I had originally intended to run ollama in the background and just proxy requests from the webserver to it, using its output for the responses. Cursor couldn't quite get this right, so I had it write a unit test and iterate until the test passed. That failed too: it got stuck trying to make things work for a while and I had to interrupt it.
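For reference, here is a minimal sketch of the proxy approach I had in mind: a web handler that forwards the user's prompt to ollama's HTTP API and returns the model's reply. It assumes ollama is listening on its default port (11434); the model name and the `/ask` endpoint are illustrative, and this is not the code that ended up in the repo.

```go
// Minimal sketch: proxy web requests to a locally running ollama instance.
// Assumes ollama's HTTP API on its default port (11434).
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"net/http"
)

type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

type generateResponse struct {
	Response string `json:"response"`
}

func handleAsk(w http.ResponseWriter, r *http.Request) {
	prompt := r.URL.Query().Get("q")

	body, _ := json.Marshal(generateRequest{
		Model:  "smollm", // illustrative model name
		Prompt: prompt,
		Stream: false,
	})

	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	raw, _ := io.ReadAll(resp.Body)
	var out generateResponse
	if err := json.Unmarshal(raw, &out); err != nil {
		http.Error(w, "bad response from ollama", http.StatusInternalServerError)
		return
	}
	w.Write([]byte(out.Response))
}

func main() {
	http.HandleFunc("/ask", handleAsk)
	http.ListenAndServe(":8080", nil)
}
```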
I figured out a non-interactive interface for ollama and had Cursor use that. At that point I had to deal with ANSI escape codes in the ollama output, and I ran out of Cursor tokens, so I solved that problem by hand. I also wrote a Dockerfile by hand and set up a public ECR repository with the image.
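The non-interactive path looks roughly like the sketch below: shell out to `ollama run <model> <prompt>`, capture the output, and strip the ANSI escape sequences that ollama's terminal spinner leaves behind. The model name and helper function are illustrative, not the exact code from the repo.

```go
// Sketch: call `ollama run` non-interactively and strip ANSI escape codes.
package main

import (
	"fmt"
	"log"
	"os/exec"
	"regexp"
)

// ansiEscape matches CSI sequences like "\x1b[2K" and "\x1b[?25l" that
// ollama's terminal spinner leaves in captured output.
var ansiEscape = regexp.MustCompile(`\x1b\[[0-9;?]*[ -/]*[@-~]`)

func askOllama(model, prompt string) (string, error) {
	out, err := exec.Command("ollama", "run", model, prompt).CombinedOutput()
	if err != nil {
		return "", fmt.Errorf("ollama run failed: %w (output: %s)", err, out)
	}
	return ansiEscape.ReplaceAllString(string(out), ""), nil
}

func main() {
	// Model name is illustrative.
	answer, err := askOllama("smollm", "Why is the sky blue?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(answer)
}
```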
I was considering upgrading to a paid Cursor subscription, but I talked to my friend again and he mentioned he uses vscode and the Claude CLI directly. So I set that up, bought a Claude subscription, and got started on the next phase. I'm really impressed with that setup: it's pretty intuitive, and Claude is a lot more powerful, thorough, and informative than whatever Cursor was using behind the scenes.
I had Claude help me reduce the image size by using a slimmer base image, and swapped out gpt-oss for SmolLM (which is much smaller). I also had Claude update the Kubernetes infrastructure to run my image instead of nginx.
It’s working now! I’ll leave it up for a while to celebrate, but will tear it down soon to avoid paying for this infra.
Overall, I'm really impressed with the ML-driven development flow - especially the vscode/Claude combo. I use ML at work all the time, but we still don't have it integrated this smoothly; it's a lot of copy-pasting from web interfaces. I have tried the internal version of Gemini CLI (hooked up to our production systems via MCP), but honestly it's all pretty janky. It's also funny to realize how much the Gemini CLI copied from the Claude CLI, down to the little jokes it spits out while the model is 'thinking'.
I'll leave it at that for now, but this has been a great experience. I haven't been this excited about software since mid-2023, when I was knee-deep in ML! Next up, let's look into MCP and OAuth.