The pipeline could research, plan, write, review, and generate images. The output was sitting in a folder called 20260408_105007_sometimes_we_try_to_explore_external_pla/. This is Part 4 — where I built the supply chain that turns a finished job into a deployable article, and learned that the "boring" transformation step is where all the pipeline's assumptions collide.


A Folder Is Not an Article

Cartoon of a robot proudly presenting a chaotic pile of files and folders to a bewildered web editor

Here's what three sessions of sophisticated multi-agent work actually produced: a project folder.

Inside it: a good article, SEO metadata, a set of generated images, a plan file. All sitting in projects/20260408_105007_sometimes_we_try_to_explore_external_pla/. That's not publishable. That's a pile of artifacts.

To put it on a website, someone had to manually read the SEO report, write frontmatter, figure out the slug, rewrite image paths, copy images to the right directory, check heading levels, and hope they didn't mistype a YAML field. Every single time.

The pipeline had automated the hard creative work and left the tedious mechanical work to humans. That's the wrong way around.


Write the Rules Before the Code

Before writing a single function, I wrote a rulebook: docs/supply-chain-rules.md. Every field mapping, every fallback, every validation check — documented as rules before any code existed.

This was the right call. Not because it felt disciplined, but because the act of writing the rules surfaced three bugs before a line of code was written:

  • The frontmatter schema had subtitle listed twice — once as required, once as optional. Which was it?
  • The article source section said "always use article_with_images.md" — but what if no images were requested? That file wouldn't exist.
  • The SEO fallback said "generate basic frontmatter from chosen_plan instead" without defining what "basic" meant for each field.

Three ambiguities that would have become bugs at runtime, caught as contradictions in a document. The spec paid for itself before the implementation started.

This is the same pattern that appeared in Part 1 — brainstorm before code — applied one level down. The 2,000-line architecture plan caught structural problems before the pipeline existed. The rules doc caught integration problems before the publisher existed. Writing things down forces precision that thinking in your head never does.


The Publisher Is Not an Agent

Every stage of the pipeline up to this point had been an agent: a system prompt, a base class, a temperature setting, an output schema. The publisher is none of those things.

SupplyChainPublisher is a plain Python class. No inheritance, no LLM calls in the main path, no probabilistic outputs. Everything it does is deterministic: read JSON, assemble YAML, rewrite paths, copy files.

class SupplyChainPublisher:
    async def publish(self, project_path, output_dir=None, provider=None):
        root = Path(project_path)  # pathlib.Path to the project folder
        plan = self._load_json(root / "plan" / "chosen_plan.json")
        seo = self._normalize_seo(self._load_json(root / "seo" / "seo_report.json"))
        article_text = self._load_article(root)
        # ... assemble, validate, write

The architectural choice was deliberate. By the time the publisher runs, the creative work is done. What's left is mechanical transformation — and for mechanical transformation, determinism is more valuable than intelligence. An agent that occasionally produces a slightly different slug is a bug. A function that always produces the same slug from the same input is correct.
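To make that concrete, here is a minimal sketch of what a deterministic slug function can look like. The `slugify` helper and its rules (drop apostrophes, hyphenate everything else, trim at a word boundary) are my illustration, not the publisher's actual implementation:

```python
import re

def slugify(title: str, max_len: int = 60) -> str:
    """Derive a URL slug as a pure function of the title: same input, same output."""
    slug = re.sub(r"['\u2019]", "", title.lower())      # drop apostrophes: "can't" -> "cant"
    slug = re.sub(r"[^a-z0-9]+", "-", slug).strip("-")  # everything else becomes a hyphen
    if len(slug) > max_len:
        slug = slug[:max_len].rsplit("-", 1)[0]         # trim at a word boundary, not mid-word
    return slug

print(slugify("Earth's Unsolved Mysteries We Still Can't Explain"))
# -> earths-unsolved-mysteries-we-still-cant-explain
```

No temperature, no retries, no surprises: the function is trivially testable and its output never drifts between runs.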

There is one exception. One task in the supply chain genuinely needs semantic reasoning: tag derivation.


A Fixed Vocabulary, Not Free-Form Tags

Cartoon of a robot at a counter studying a fixed menu board of 30 short words, circling just three, with a bin of rejected long keywords beside it

The SEO report gives you keywords like "Eye of the Sahara Richat Structure" and "unexplored ocean floor Earth." These are excellent for search engines. They are terrible as tags.

My first implementation just slugified the keywords: eye-of-the-sahara-richat-structure. I ran it, looked at the output, and immediately knew this was wrong. Tags should be short, categorical, reusable across articles — not keyword-length phrases that will never appear on a second post.

The rules doc said to "extract the core topic." Turning "pyramid construction mystery" into history is a semantic task. You can't regex your way from "Eye of the Sahara Richat Structure" to science.

So I called an LLM — but with one constraint that makes the difference between useful and chaotic:

TAG_VOCABULARY = [
    "science", "technology", "history", "culture", "nature",
    "psychology", "health", "engineering", "philosophy", "economics",
    "politics", "space", "ai", "environment", "creativity",
    # ... 30 total
]

The LLM can only pick from this list. The prompt is simple: here are the article's keywords and title, pick 2–5 tags from this vocabulary only.

The result: "Eye of the Sahara Richat Structure" → science, nature. "Earth vs space exploration" → space. Exactly the semantic compression a language model is good at — and because the output space is fixed, article three can't invent natural-sciences when articles one and two already used science.

The entire call is ~200 tokens in, ~20 tokens out. Fractions of a cent. And if the LLM is unavailable, the publisher falls back to ["general"] instead of crashing. The one LLM call in the supply chain has a deterministic fallback — which is the only reason it's allowed to be there.
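Put together, the whole step fits in one small function. This is a sketch of the approach described above; `derive_tags` and the injected `llm_call` callable are my assumptions for illustration, not the pipeline's actual API:

```python
# Assumed shape of the constrained tag-derivation step. The LLM call is injected
# so the deterministic fallback path is testable without a live model.
TAG_VOCABULARY = [
    "science", "technology", "history", "culture", "nature",
    "psychology", "health", "engineering", "philosophy", "economics",
    "politics", "space", "ai", "environment", "creativity",
    # ... 30 in the full list
]

def derive_tags(keywords, title, llm_call):
    """Pick 2-5 tags from the fixed vocabulary; fall back to ["general"] on any failure."""
    prompt = (
        f"Title: {title}\nKeywords: {', '.join(keywords)}\n"
        f"Pick 2-5 tags, comma-separated, from this list only: {', '.join(TAG_VOCABULARY)}"
    )
    try:
        raw = llm_call(prompt)  # the one LLM call in the entire supply chain
        picked = [t.strip().lower() for t in raw.split(",")]
        picked = [t for t in picked if t in TAG_VOCABULARY]  # enforce the fixed vocabulary
        return picked[:5] or ["general"]
    except Exception:
        return ["general"]  # deterministic fallback: the publisher never crashes on tags
```

Anything the model says that isn't in the vocabulary is silently dropped, so even a misbehaving model can't pollute the tag space.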


When the Pipeline Disagrees with Itself

Cartoon of two robots at a table each holding a different version of the same JSON document — one flat, one nested — looking equally confused

The first real test run failed with: "tags count is 1, keywords count is 1." The publisher was reading an empty SEO report.

The bug wasn't missing data. It was a different JSON shape. The rules doc assumed a flat structure:

{
  "primary_keyword": {"keyword": "unsolved mysteries on Earth"},
  "secondary_keywords": [{"keyword": "Eye of the Sahara"}]
}

What the SEO agent actually produced:

{
  "seo_analysis": {
    "keywords": {
      "primary_keyword": "unsolved mysteries on Earth",
      "secondary_keywords": ["Eye of the Sahara", "Lake Hillier"]
    }
  }
}

Two differences: everything nested under seo_analysis.keywords, and keywords as plain strings rather than objects. I wrote the rules doc from the spec. The LLM wrote the SEO report from its own interpretation of the prompt. Neither was wrong — they just weren't the same.

This is the version of pipeline debt that shows up at the last mile. Parts 2 and 3 had bugs where one stage assumed something about how it would be called. Part 4 has bugs where two stages assumed different things about the same data. The fix was a normalizer that handles both shapes — unwraps the nesting, converts strings to objects — because the right answer to "every LLM in your pipeline has a slightly different idea of what 'return JSON' means" is a boundary layer that speaks both dialects.
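A normalizer like that can be sketched in a few lines. This is my reconstruction from the two shapes shown above, with the flat object form as the canonical output; the actual `_normalize_seo` may differ:

```python
def normalize_seo(report: dict) -> dict:
    """Accept both SEO report dialects and return the flat, object-valued shape.

    Dialect 1: flat, with {"keyword": ...} objects.
    Dialect 2: plain strings nested under seo_analysis.keywords.
    """
    # Unwrap the nesting if it's there; otherwise the report is already flat.
    data = report.get("seo_analysis", {}).get("keywords", report)

    def as_obj(kw):
        # Promote bare strings to the object form the rest of the publisher expects.
        return kw if isinstance(kw, dict) else {"keyword": kw}

    return {
        "primary_keyword": as_obj(data.get("primary_keyword", "")),
        "secondary_keywords": [as_obj(k) for k in data.get("secondary_keywords", [])],
    }
```

Both dialects normalize to the same structure, so everything downstream of the boundary layer only ever sees one shape.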


The Unglamorous Last Mile

Behind the single publish command, the publisher reads the plan, normalizes the SEO report, picks the article source (article_with_images.md if images were generated, article.md if not), generates a slug, derives tags, assembles frontmatter, rewrites image paths, copies images, validates field lengths and heading levels, and warns about anything unusual — new tags, unusual article length, missing SEO data.
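The validation pass at the end of that list can be sketched as a function that collects warnings rather than raising. The specific thresholds and field names here are assumptions for illustration, not the publisher's actual rules:

```python
def validate_publish(frontmatter: dict, article: str, known_tags: set) -> list:
    """Return human-readable warnings so publishing still completes on soft failures."""
    warnings = []
    title = frontmatter.get("title", "")
    if not title:
        warnings.append("missing title")
    elif len(title) > 70:
        warnings.append(f"title is {len(title)} chars (search snippets truncate around 70)")
    for tag in frontmatter.get("tags", []):
        if tag not in known_tags:
            warnings.append(f"new tag: {tag}")  # new vocabulary entries are worth a look
    words = len(article.split())
    if not 500 <= words <= 5000:
        warnings.append(f"unusual article length: {words} words")
    if any(line.startswith("# ") for line in article.splitlines()):
        warnings.append("article contains an H1; the frontmatter title already renders one")
    return warnings
```

Warnings instead of exceptions is a deliberate trade: a slightly odd article still ships, but nothing odd ships silently.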

agentic-writer publish abc123 --output-dir /path/to/blog-site
Published abc123
Slug: earths-unsolved-mysteries-we-still-cant-explain
MDX:  content/articles/earths-unsolved-mysteries-we-still-cant-explain.mdx
Images: public/images/articles/earths-unsolved-mysteries-we-still-cant-explain/

All validations passed.

A completed job becomes a deployable article in under five seconds.

None of this is impressive. There's no multi-model competition, no supervisor panel, no image generation. Just reading files, assembling YAML, and copying PNGs. It's the least glamorous stage in the whole pipeline.

It's also the stage that makes everything else useful. Without it, every article requires ten minutes of manual frontmatter assembly and path rewriting. The pipeline was impressive after Part 2. It became a product when it could finish the job on its own.

The detective from Sherlock Holmes — where this whole thing started, on a shelf at my parents' home in Sri Lanka — would have recognised the method: observe first, deduce second, act third, and don't overlook the mundane details. The last mile is always mundane. It's never optional.


The articles this pipeline produces are published at blueandyeliwrite.com.