AI won't replace engineers, but it will replace how engineers are trained


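Lately I keep seeing companies do the same thing: cut junior openings.


The math actually checks out. One senior paired with AI does the work of two or three juniors from before: more output, steadier quality, and no time spent mentoring. I get how the calculation works. I just think very few people have really thought about what happens a few years down the road.


Junior engineers used to learn as a side effect. The code you wrote wasn't perfect, but the company could still use it, so it was willing to pay you a salary to learn on the job. Learning was a byproduct of the work, and nobody had to pay for it separately.


That is exactly what AI changes. Once AI is faster and more reliable than a junior, the learning that used to ride along with the work has no commercial value. Learning goes from byproduct to pure cost, and no company wants to pay to cultivate a pure cost.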


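So the current situation is a bit odd: the seniors still holding the line all got their judgment from a decade or more of hands-on work. They know how to watch the AI, how to catch it when it screws up, and how to step in themselves when it gets stuck. The problem is that this group has a shelf life. They retire, change careers, burn out.


Where will the next generation of seniors come from? I don't see many people seriously answering that question.


My own view is this: if you believe AI will still need human supervision ten or twenty years from now, then the supervisor's ability can't fall too far behind the AI's. You have to really understand the work to judge whether what the AI did is right or wrong. Someone who has never hand-written state management, never been driven to despair by a race condition at 2 a.m., can be handed a perfect piece of code by AI and read it just fine, but when the system actually breaks, they won't know where to start.


That kind of intuition only grows from breaking things with your own hands. I haven't seen an exception yet.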


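So my guess, and it may be a bit of a gamble, is that software teams will slowly evolve a kind of mentorship system. Not a return to the old model of hiring juniors in bulk for grunt work, but a small, hand-picked, deliberately designed apprenticeship. New hires will go through a period where they are intentionally restricted to "no AI allowed": they have to write it themselves, debug it themselves, and fall into the pits themselves, and only once the mental model is built do they formally join the team.


It feels a bit like medical school. A CT scan is far clearer than palpation and auscultation, yet residents still have to learn palpation and auscultation first. That seemingly "inefficient" training isn't wasted time; it builds clinical intuition. Without that foundation, you can't even read a CT report accurately.


You might say: the software industry has no institutional gate like a medical license, so who is going to force companies to do this? Company A spends three years training people properly, Company B has new hires chasing output with AI from day one, and in the short term the market will surely reward B.


I was stuck on this question for a while myself, and then for some reason I thought of plumbers and electricians.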


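The plumbing and electrical trade has little in the way of national certification, yet its apprenticeship system has continued for centuries without a break. The reason isn't complicated: the consequences can't be hidden. A badly joined pipe leaks; badly run wiring burns. You don't go through the tradesman's certificates. Whether your home has been flooded or caught fire is the most direct filter. The market learned to tell good from bad not because of any institution, but because it got hurt.


Pain used to arrive slowly in software: a bad architecture could limp along for two or three years before it collapsed completely. But AI will probably compress that timeline a lot. When everyone is piling things up quickly with AI, system complexity will balloon much faster than before, and the collapse will come earlier and harder. At that point, "does this person actually understand it" can't be hidden.


So maybe no external institution is needed at all. AI itself will produce enough disasters to force the market to gradually learn the difference between "people who can use the tool" and "people who know the craft." No law says you must hire a master tradesman for your plumbing and wiring, but after your home has flooded once, you'll naturally ask more questions next time.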


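I think we are standing right at the "before the flood" stage. Everyone is excited about AI's productivity and cutting headcount decisively. Only when the pain of the talent gap really surfaces, maybe in five years, maybe longer, will the way we train people be taken out and thought about seriously again. And that new way will probably look a lot like mentorship: not learning that happens naturally, but a deliberately designed training environment.


If so, a software engineer's career curve will start to resemble a traditional craft: low pay in the first few years, intensive learning, and a high washout rate. But once you finish your apprenticeship, you'll be one of the few who can really harness AI.


That points in exactly the opposite direction from the old "three months of bootcamp and you can start working" world.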


"Software Fundamentals Matter More Than Ever" — Matt Pocock

At AI Engineer Europe, Matt Pocock — founder of Total TypeScript and the educator behind the AI Hero platform — gave a short talk titled "It Ain't Broke: Why Software Fundamentals Matter More Than Ever." For engineers worried that their skill set has been quietly rendered obsolete by AI, his message is unexpectedly comforting: software fundamentals are not in retreat. They matter now more than they ever have.

Matt is not a new face in the developer world. He was previously a Developer Advocate at Vercel, a core team member on XState, and the creator of Total TypeScript — a series widely treated as the industry standard for TypeScript education. Over the past year he has shifted his center of gravity from TypeScript to AI engineering, building out his own platform AI Hero, whose community now counts more than 54,000 developers. His YouTube channel (@mattpocockuk, 175K subscribers) carries a motto that reads almost as a manifesto: "We don't do vibe coding — this is a channel for real engineers." That trajectory — from TypeScript educator to AI engineering educator — places him at the unusual intersection of old-school software craftsmanship and bleeding-edge AI tooling, and it's a large part of why this talk lands with the weight it does.

His thesis is direct: good codebases are more valuable than ever, because AI performs remarkably well in them and remarkably badly in bad ones.

Why Specs-to-Code Doesn't Hold Up

Matt opens by addressing the recently popular "specs-to-code" movement — the idea that you write a specification, let AI compile it into code, and when something breaks you go back and edit the spec rather than the code itself. He admits he tried it. The first run produced mediocre code. The second produced worse code. The third was worse still. Eventually he was left with garbage.

He found a name for this phenomenon in The Pragmatic Programmer: software entropy. Every change made without thinking about the design of the whole system pushes the codebase further toward collapse. The premise behind specs-to-code is that "code is cheap," but Matt pushes back hard on this. Bad code, he argues, has never been more expensive — because a codebase that's hard to change locks you out of every benefit AI could offer.

Failure Mode 1: The AI Built the Wrong Thing → Use "Grill Me" to Build Shared Understanding

Matt draws on Frederick P. Brooks' The Design of Design and its concept of the design concept — when two parties design something together, the thing being designed floats invisibly between them. It's a theory, not an asset; nothing you can put in a markdown file. The core problem between you and the AI, he argues, is that you don't yet share one.

To fix this he wrote a skill called "grill me." Its content is dead simple: interview me relentlessly about every aspect of this plan until we reach a shared understanding; walk down each branch of the design tree, resolving dependencies between decisions one by one. The repository hosting this skill has accumulated tens of thousands of GitHub stars. The few short lines turn the AI into an adversary that fires forty, sixty, sometimes a hundred questions at you before it's satisfied. Matt prefers this flow over the default plan mode in Claude Code, which he finds too eager to generate an asset rather than first reach a shared design concept.

Failure Mode 2: The AI Is Too Verbose → Build a Ubiquitous Language

The second failure mode is the AI talking past you, using too many words to say what it's doing. Matt compares this to the language gap between veteran developers and domain experts — if a domain expert wants you to build something around microchips and you have no shared vocabulary, the translation into code breaks down at every step.

He turns to Domain-Driven Design for the answer: the concept of a ubiquitous language. Conversations among developers, expressions in the code, and conversations with domain experts (or, in this case, with AI) should all derive from the same domain model. He built a skill that scans a codebase, extracts terminology, and emits a markdown file of tables. He keeps that file open during planning sessions. Reading the AI's thinking traces, he noticed that a shared vocabulary not only sharpened planning but also let the AI think more concisely — and produced implementations that hewed closer to what was actually planned.
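
As a rough illustration of what a glossary-extraction step like this could look like, here is a minimal Python sketch. It is not Matt's actual skill, whose contents this summary doesn't reproduce. It assumes a Python codebase and treats class names plus the first line of their docstrings as a first approximation of domain terms; the function name `extract_glossary` is hypothetical.

```python
import ast
from pathlib import Path


def extract_glossary(root: str) -> str:
    """Collect class names and the first line of their docstrings from
    every .py file under root, and return them as a Markdown table.

    Assumption: classes approximate domain terms. A real glossary would
    need human review of what counts as a term.
    """
    rows = []
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef):
                doc = ast.get_docstring(node) or ""
                summary = doc.splitlines()[0] if doc else "(no description)"
                rows.append((node.name, summary, path.name))
    lines = ["| Term | Meaning | Defined in |", "| --- | --- | --- |"]
    lines += [f"| {term} | {meaning} | {source} |" for term, meaning, source in sorted(rows)]
    return "\n".join(lines)
```

The output is a plain Markdown table, so it can sit open next to a planning session in the way the talk describes.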

Failure Mode 3: The AI's Code Just Doesn't Work → TDD Forces Small Steps

The third failure mode is straightforward: the AI built the right thing but it doesn't work. Matt insists on every feedback loop you can get — TypeScript, browser access for front-end work, automated tests. But even with these loops in place, LLMs don't use them well by default. They tend to write huge swaths of code before remembering to type-check or run a test.

The Pragmatic Programmer calls this outrunning your headlights — driving faster than you can see. The book's principle is that the rate of feedback is your speed limit. To force the LLM to slow down and take small deliberate steps, Matt argues for TDD: write the test first, make it pass, refactor with the design in mind, then move on.
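
To make the red-green-refactor rhythm concrete, here is a minimal Python sketch. The `slugify` function and its expected behavior are hypothetical, chosen only to show the order of work: tests first, then just enough code to pass them.

```python
import re


# Step 1 (red): the tests are written first, before any implementation exists.
def test_lowercases_and_joins_words_with_hyphens():
    assert slugify("Hello World") == "hello-world"


def test_drops_punctuation():
    assert slugify("It Ain't Broke!") == "it-aint-broke"


# Step 2 (green): write just enough code to make those tests pass.
def slugify(title: str) -> str:
    cleaned = re.sub(r"[^a-z0-9\s-]", "", title.lower())
    return re.sub(r"[\s-]+", "-", cleaned).strip("-")


# Step 3 (refactor): with the tests green, reshape the code and rerun them
# before taking the next small step.
test_lowercases_and_joins_words_with_hyphens()
test_drops_punctuation()
```

Each cycle is small enough that a failing test points at the last few lines written, which is the "rate of feedback" the talk refers to.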

Failure Mode 4: The Codebase Isn't Testable → Deep Modules Let AI Navigate

But TDD only works if the codebase is testable in the first place, and good codebases are testable codebases. Here Matt turns to John Ousterhout's A Philosophy of Software Design and its concept of deep modules: a healthy codebase should consist of a relatively small number of large, deep modules with simple interfaces, not a sprawl of shallow modules each exposing complex interfaces.

A codebase of shallow modules is genuinely hard for AI to explore. The AI tries to walk the code, but the structure is too fragmented — it can't find the right module in time, can't trace the dependencies, ends up misunderstanding what's there. A codebase of deep modules gives the AI clear boundaries to work against. You test at the interface, you let the implementation live behind it. Matt has a skill for this too — "improve codebase architecture" — that walks the codebase and wraps related code into deep modules.
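
A small Python sketch of the idea follows. `TextIndex` is a hypothetical example, not something from the talk: it exposes two public methods and keeps tokenizing, normalizing, and the inverted index behind them, so callers and tests only touch the interface.

```python
import re
from collections import defaultdict


class TextIndex:
    """A deep module: two public methods, with tokenizing, normalizing,
    and the inverted index all hidden behind them."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)  # token -> ids of documents containing it

    def add(self, doc_id: str, text: str) -> None:
        for token in self._tokens(text):
            self._postings[token].add(doc_id)

    def search(self, query: str) -> list[str]:
        """Return ids of documents that contain every word in the query."""
        tokens = self._tokens(query)
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(hits)

    @staticmethod
    def _tokens(text: str) -> list[str]:
        # Implementation detail: lowercase alphanumeric runs only.
        return re.findall(r"[a-z0-9]+", text.lower())
```

Because tests only call `add` and `search`, the internals can be rewritten, by a person or by an AI, without touching the tests.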

Failure Mode 5: Your Brain Can't Keep Up → Design the Interface, Delegate the Implementation

The fifth failure mode is that the feedback loops are working, you're shipping more code than ever, but your brain is fried. Matt asks the room how many people feel more tired now than at any earlier point in their development career, and counts himself among them.

The deep-module structure turns out to be the cure here as well. You can treat those deep modules as gray boxes: design the interface carefully, but don't review the implementation too closely (with obvious exceptions for high-stakes modules like finance). Hand the implementation to the AI, test from outside, verify against the interface. Matt invokes Kent Beck: invest in the design of the system every day. Specs-to-code, he notes, points the other direction — it's divestment from the design of the system.

The Core Analogy

The most vivid framing of the talk arrives at the end: think of AI as a tactical programmer on the ground — a sergeant making the code changes, hands dirty, fast. What that sergeant needs is someone above, working at the strategic level. That someone is you. And operating at that level demands the same software fundamentals engineers have been using for the past twenty years.

Where This Talk Came From

It's worth noting that the five failure modes aren't a piece of academic taxonomy — they were distilled from a cohort-based course Matt is currently running on AI Hero called "Claude Code for Real Engineers." It's a two-week online course priced at $795 USD, packaged with eight workshop modules, live office hours that span time zones, a Discord community, lifetime access, and a completion certificate. The intended audience is experienced developers who want to integrate Claude Code into a production environment, and the curriculum spans LLM context windows, the Plan/Execute/Clear workflow, AGENTS.md, custom skills, PRD authoring, multi-phase planning, and autonomous agent loops.

Matt's external pitch for the course is faintly provocative: "twenty years of software experience is not for boomers" — meaning these fundamentals are not legacy baggage, they are the precondition for AI engineering actually delivering. The lessons he picked up while building this course are the source material for the talk's five failure modes. In other words, this isn't a retrospective. It's the compressed version of a methodology being actively taught.

Conclusion

The takeaway is unusually clean for a tech talk: code is not cheap, code is important. AI is not a tool for escaping software fundamentals — it is an amplifier that makes those fundamentals more decisive than ever. In a good codebase, AI unlocks remarkable value. In a bad codebase, it accelerates the entropy. For engineers worried that their craft is losing its currency in the AI age, the talk offers a clear inversion: the fundamentals have never been worth more than they are now.

Matt's own YouTube tagline — "We don't do vibe coding, this is a channel for real engineers" — reads, after sitting with this talk, as basically its sequel in slogan form.

Video link:

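"Software Fundamentals Matter More Than Ever" - Matt Pocock

At the AI Engineer Europe conference, Matt Pocock, founder of Total TypeScript and now head of the AI Hero education platform, gave a short talk titled "It Ain't Broke: Why Software Fundamentals Matter More Than Ever." For the many engineers worried that their skill set is losing value in the AI era, his message is reassuring: software engineering fundamentals are not obsolete, and in fact matter more than at any time before.

Matt is a familiar face in developer circles. He was previously a Developer Advocate at Vercel and a member of the XState core team, and his Total TypeScript series is regarded as something close to the industry standard in TypeScript education. Over the past year he has moved from TypeScript education to AI engineering education, shifting his focus to AI Hero, the platform he founded, whose community now counts more than 54,000 developers. His YouTube channel (@mattpocockuk, 175K subscribers) has a motto blunt enough to be a slogan: "We don't do vibe coding, this is a channel for real engineers." The move from TypeScript to AI engineering puts his perspective at the meeting point of old-school software engineering aesthetics and hands-on work with the latest AI tools, which is why this talk carries weight.

The thesis of the talk is stated up front: a good codebase is worth more than ever, because AI performs exceptionally well in a good codebase and makes things progressively worse in a bad one.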

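Why Specs-to-Code Doesn't Produce Good Results

Matt first responds to the recently popular "specs-to-code" movement: write a specification document, have AI compile it into code, and when something goes wrong, change the spec and compile again without ever looking at the code itself. He admits he tried it, with this result: the first run produced code that wasn't great, the second was worse, the third worse again, and eventually all that was left was a pile of garbage.

He found the name for this phenomenon in The Pragmatic Programmer: software entropy. Every time you focus only on the change at hand and ignore the design of the whole system, the codebase gets a little worse. The core assumption of specs-to-code is that "code is cheap," but Matt argues that assumption is wrong: bad code has never been as expensive as it is now, because a codebase that is hard to change keeps you from collecting any of AI's dividends.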

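Failure Mode 1: The AI Didn't Build What I Wanted → Use "Grill Me" to Build Shared Understanding

Matt cites the "design concept" idea from Frederick P. Brooks' The Design of Design: when two people design something together, the thing actually being designed floats between them as an intangible theory, not an asset that can be written into markdown. The biggest problem between you and the AI is that the two of you do not share the same design concept.

So he wrote a skill called "grill me." Its content is extremely simple: interview me relentlessly about every aspect of this plan until we reach a shared understanding; walk down every branch of the design tree, resolving the dependencies between decisions one by one. This few-line skill has collected more than ten thousand stars on GitHub. It makes the AI fire 40, 60, or even more than a hundred questions at you in one go, turning itself into an adversary until the two of you are truly aligned. Matt personally prefers this flow to Claude Code's default plan mode: the latter is too eager to produce an asset, whereas grill me requires you to align on the design concept first and only then start building.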

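Failure Mode 2: The AI Is Too Verbose → Use a Ubiquitous Language to Align Terminology

The second failure mode is that the AI talks too much and seems to be talking past you. Matt likens this to the language gap between senior developers and domain experts: when the other side is communicating about a specialist field such as microchips and there is no shared vocabulary, the translation into code breaks down at every turn.

He borrows the concept of a "ubiquitous language" from Domain-Driven Design: conversations among developers, the code itself, and conversations with domain experts (in this case, the AI) all start from the same domain model. He wrote a skill that automatically scans the codebase, extracts terminology, and produces a markdown file of tables, and he keeps that file open beside him whenever he plans anything with the AI. From the AI's thinking traces he observed that this shared vocabulary not only improved planning quality but also made the AI's thinking more concise and its implementation closer to the original plan.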

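Failure Mode 3: What the AI Wrote Just Doesn't Work → TDD Forces Small Steps

The third failure mode is that what the AI wrote simply doesn't run. Matt says bluntly that you should use every feedback loop available (TypeScript, browser access, automated tests, none of them optional), but even with these loops in place, LLMs still don't use them well, often writing a large pile of code before thinking "oh, I should type check" or "oh, I should write a test."

The Pragmatic Programmer calls this behavior "outrunning your headlights": driving too fast, beyond the range your headlights illuminate. The book's core principle is "the rate of feedback is your speed limit." To force the LLM to slow down and take small steps, Matt advocates TDD: write the test first, make it pass, refactor, then start the next round.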

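Failure Mode 4: The Codebase Isn't Testable → Only Deep Modules Let AI Explore

But TDD presupposes that the codebase itself is testable, and a good codebase is an easily tested one. Here Matt turns to John Ousterhout's A Philosophy of Software Design and its concept of "deep modules": an ideal codebase should consist of a small number of deep modules with simple interfaces and rich internal functionality, rather than a pile of shallow modules that do little yet have complex interfaces.

A codebase of shallow modules is especially hard for AI to explore: the AI tries to understand your code, but because the structure is so scattered it can't find the right module or follow the dependencies, and ends up misunderstanding the code. A codebase of deep modules gives the AI clear boundaries to follow; you test at the interface and leave the internal implementation to the AI. Matt also wrote a skill called "improve codebase architecture" that refactors a shallow-module codebase into a deep-module structure.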

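Failure Mode 5: Your Brain Can't Keep Up with the AI → Design the Interface, Delegate the Implementation

The fifth failure mode is that the feedback loops are all in place and you can ship more code than ever, but the human brain can't keep up. Matt asks the audience whether anyone feels they have never been this tired in their development career; he has felt it too.

But the deep-module structure is also the remedy for the brain: you can treat these deep modules as "gray boxes," taking responsibility only for designing the interface and not reviewing implementation details in depth (obviously excepting high-criticality modules such as finance). Hand the implementation to the AI and do your own testing and verification only at the interface layer. Here he cites Kent Beck: "Invest in the design of the system every day." Specs-to-code goes in exactly the opposite direction: it divests from the design of the system.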

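The Speaker's Core Analogy

The most vivid analogy of the talk comes at the end: think of AI as "a tactical programmer on the ground, a sergeant," constantly pushing through each line of code changes, while you need to sit above that sergeant in the strategic layer. What that strategic layer requires is precisely the fundamentals software engineers have been using for the past 20 years or more.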

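Additional Context: Where This Talk Came From

It's worth mentioning that the talk's five failure modes were not compiled from academic observation but come from a cohort course Matt was developing at the same time: "Claude Code for Real Engineers." It is a two-week online course offered on the AI Hero platform, priced at $795 USD, that includes 8 workshop modules, live office hours across time zones, a Discord community, lifetime access, and a completion certificate. Its audience is experienced developers who want to integrate Claude Code into a production environment, and its topics run from LLM context windows, the Plan/Execute/Clear workflow, AGENTS.md, custom skills, PRDs, and multi-phase planning through to autonomous agent loops.

Matt's public pitch for the course is direct to the point of provocation: the past 20 years of software experience is "not for boomers," meaning these fundamentals are not an old-school relic but the precondition for AI engineering to really take off. The hands-on observations he accumulated while designing the course are the source of the talk's five failure modes. In other words, this is not a retrospective talk but a condensed version of a methodology that is being taught right now.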

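Conclusion

The message of the whole talk is very simple: code is not cheap, code is important. AI is not a tool for escaping software fundamentals but an amplifier that makes those fundamentals more critical. In a good codebase, AI can unlock remarkable value; in a bad codebase, AI only pushes entropy faster. For engineers worried that their skill set is losing value in the AI era, this video offers a clear opposite conclusion: the fundamentals have never been worth as much as they are now.

The motto on Matt's own YouTube channel, "We don't do vibe coding, this is a channel for real engineers," reads, when you look back at it after this talk, basically as an extended manifesto for it.

Video link: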

Plain Text is All You Need: When Plain Text Meets LLM

TL;DR

The Bash toolchain has been around for decades. grep, find, directory structures—all battle-tested. When LLMs gained the ability to operate these tools, plain text + directories suddenly went from “the most primitive option” to “the most powerful knowledge base format.” No fancy apps, no risk of service shutdown. Your data stays yours.


If you’ve worked in software, you know this: Unix/Linux command line tools are the bedrock of the entire operating system.

grep searches text. find locates files. mkdir creates directories. mv moves things around. These commands have been around since the 1970s. Fifty years later, millions of servers are still running them every single day. Every line of code, every config file, every system log—all plain text.

Ever wondered why?

It’s not because nobody tried to replace it. XML had its shot. Binary formats had their shot. Proprietary formats of every flavor had their shot. And yet, everyone keeps coming back to plain text. Because it nails the properties that matter most in engineering: it’s durable—a .txt from fifty years ago still opens today. It’s searchable—one grep command can find what you need across tens of thousands of files. It’s programmable—any language can read and write it. It works with git for version control, so every change is tracked. And when you want to migrate? Just copy the folder.

These properties have been validated for half a century. Never overturned. The programming languages you use might change every few years, frameworks might rotate every two, but plain text underneath has never changed.

Here's the thing, though. These powerful tools had one fatal limitation: they were never easy to learn, so only engineers used them, and only for code.

You’re not going to use grep to manage your reading notes. You’re not going to write a regex with find to search for “that paper about attention mechanisms I read last week.” You’re not going to write a shell script to auto-categorize your personal notes. Not because it can’t be done—it absolutely can—but because it’s too much hassle. The learning curve is steep, normal people won’t bother, and even engineers don’t want to stare at a terminal to manage notes after work.

So this powerful toolchain just sat there quietly, doing its thing with code and servers, completely disconnected from personal knowledge management.

Until LLMs showed up.


Here’s the thing I think most people are missing—when Claude, GPT, and other language models gained the ability to execute Bash, they didn’t just learn a new trick. They plugged into an entire mature toolchain that’s been battle-tested for decades.

This distinction matters.

If LLMs were building file operation capabilities from scratch, we’d have plenty to worry about. Is the search reliable? Will it mess up file operations? Is the format handling mature enough? But none of these are real concerns, because under the hood, it’s all the same old tools that have been running for decades. grep doesn’t mis-search. mkdir doesn’t create the wrong directory. git doesn’t lose your version history. These things have been validated by billions of uses already.

More importantly, LLMs were trained on data that includes how these tools are used. What LLMs do is add a natural language interface on top of these tools.

You tell it “find that note I wrote about transformer attention mechanisms,” and it translates that into a grep command. You say “save this article to the AI research folder,” and it translates that into an mkdir to confirm the directory exists, then a write to put the file in. You say “summarize last month’s meeting notes,” and it uses find to locate the files, read to load them, its semantic understanding to extract key points, and write to save the summary.
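
To give a feel for how little is going on under the hood, here is a minimal Python sketch of the search step. It does roughly what a case-insensitive recursive `grep -ril` would report, restricted to Markdown files. The function name `find_notes` and the folder layout are my own assumptions for illustration, not a fixed part of any tool.

```python
from pathlib import Path


def find_notes(root: str, phrase: str) -> list[Path]:
    """Return Markdown files under root whose text contains phrase,
    ignoring case (roughly what `grep -ril` would list)."""
    needle = phrase.lower()
    matches = []
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            matches.append(path)
    return matches
```

The LLM's contribution is choosing the phrase and the folder from what you said; the search itself is this boring.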

Every step is a mature operation. The LLM just adds the layer of “understanding what you mean.”

Think about it from another angle: to use these powerful Bash tools before, you had to learn the command line, memorize flags and parameters, know how to pipe different commands together. That learning curve locked out 99% of people. Now LLMs have flattened that barrier—just speak in plain language, and it operates the tools for you.

This got me seriously thinking: if LLMs are inherently great at understanding and processing text, and they can now operate the filesystem—should we fundamentally rethink how we manage personal knowledge?


Before going further, let me share something I came across on HN a while back.

There’s a professor at Brown University named Jeff Huang who did something pretty interesting: he managed his productivity using a single .txt file for over 14 years. Every to-do, every meeting note, every random thought—all dumped into one plain text file, separated by dates. That’s it.

14 years. One file.

He’s not some tech bro showing off minimalism. Jeff Huang is a computer science professor—he knows better than most of us what tools are out there. He stuck with .txt because he’s watched too many things come and go.

There’s a line in his post that really resonated with me:

“I’ve been doing this for more than 14 years now. Let’s see your productivity app survive that long.”

Think about it. Evernote was all the rage 14 years ago—how many people around you still use it? Google Keep launched and nobody seemed to care. Bear, Notion, Obsidian, Roam Research—every few years there’s a new “note-taking revolution,” each one exciting, each one claiming to be the last note app you’ll ever need. And then what? Some are still around, some have fizzled out, some you’re still paying monthly for but haven’t opened in six months.

Meanwhile, the .txt file never let Jeff Huang down. Not once in 14 years. Because plain text doesn’t depend on any company, any platform, any software. It’s just a file sitting on your hard drive that any text editor can open.

This made me rethink something: maybe the problem isn’t that we’re not trying hard enough to learn new tools. Maybe we’ve been looking in the wrong direction the whole time. We keep searching for “better software,” but maybe what we actually need isn’t better software—it’s a better way to use the most basic format.


But Jeff Huang’s approach has an obvious limitation: his use case is a single chronological productivity log. One person, one timeline, one file.

If we need to handle the diverse kinds of knowledge that real life throws at us, that’s clearly not enough.

Your brain is juggling wildly different things at the same time. In the morning you might be reading a paper on LLM architecture, at lunch you’re in a project meeting jotting down decisions, in the afternoon you reply to some important emails and think some of it’s worth saving, and at night you suddenly want to track your spending because it feels out of control this month. These things have nothing in common, but they’re all your knowledge, your records.

Cram them all into one file, and three months later you’ll never find anything again.

What about organizing them into folders? You create a bunch of directories, but every time you save a note, you hesitate—“does this go under Work or Research?”—and by the time you’re done hesitating, you don’t feel like saving it anymore. Or you do save it, but with a messy filename, and three months later it’s as good as gone.

That’s why tools like Notion and Obsidian felt like saviors when they appeared. They offer tags, categories, search, database views, bidirectional links—they handle the “finding things” and “organizing things” problems for you. Just toss stuff in, and the software sorts it out.

Sounds perfect.

But what’s the cost?

Your data becomes proprietary. Notion stores everything on their servers in their block structure. Obsidian is better—Markdown at the core—but once you start using plugins, embedded queries, canvas, those features don’t travel outside Obsidian. Evernote? Don’t even get me started—the exported .enex format has basically zero native support anywhere else. And more importantly, organizing and categorizing all those notes still eats up a significant amount of your energy.

You spend three, five years building up a knowledge base, locked inside a commercial company’s product. One day they jack up the price, or push a redesign you can’t stand, or straight up shut down—and there you are, staring at a pile of half-broken exported files, contemplating your life choices.

Long-time Evernote users probably know this feeling all too well. The app once called “your second brain”—look at what it’s become.

This has always been a dilemma: if you want simplicity and freedom, you give up structure and intelligence. If you want structure and intelligence, you hand your data over to someone else. In the past, you could only pick one.


Not anymore.

Once LLMs could operate the filesystem, the bottleneck of plain text was broken. Not by more complex software—by an AI assistant that “understands plain language and knows how to operate Bash.”

You used to have too many notes to find anything, because grep was too hard for normal people. Now you don’t need to know grep. Just say “find that thing I wrote about context windows,” and the LLM translates it to grep for you.

You used to not know where to put a new note, hesitating over categories until you gave up and didn’t save it at all. Now you can write down your categorization rules, and the LLM reads them before every save, making the judgment itself. You say “save this,” it reads the content, decides it’s an AI research article, and puts it in the right folder. Doesn’t need to ask you.

You used to struggle with maintaining an index—you’d create a table of contents, but forget to update it every time you added or removed something, and three months later it was useless. Now the LLM updates the index automatically every time a file changes. You don’t have to think about it.

You used to end up with notes in all sorts of inconsistent formats—some have dates, some don’t, some have tags, some don’t—and by the time you want to standardize, it’s too late. Now the LLM reads your format spec before creating each file, and follows the rules.

And through all of this, your data is still .md files. Markdown. Plain text. You can open them in VS Code, in Notepad, or cat them in a terminal. Back them up with git push, migrate by copying the folder. You don’t depend on any company, any subscription, any service.

You get the freedom of plain text and the convenience of a smart note-taking app. At the same time.


I eventually built an actual system based on this idea and have been running it for a while. Going into every detail here would be too much, so let me just share the core design—because it’s genuinely simple. Simple enough that after reading this, you could tell Claude to set it up following this article’s design, and it would work.

Just three things.

First: directory structure is your knowledge taxonomy. No database, no tagging system, just folders. Research/AI/ for AI research notes, Work/ for work files, Personal/Finance/ for personal finances. Open your file manager and you instantly see what’s where. No need to learn any system’s UI logic.

You might think—folders? Isn’t that the most primitive way to organize things? Yes, exactly. But the point isn’t the folders themselves. It’s that when you use folders to categorize knowledge, and there’s an LLM that understands your categorization logic, this “most primitive method” becomes the most efficient one. The LLM doesn’t need to learn some API, doesn’t need to adapt to some block structure—it just needs to know what the directory is called and what goes in it, and it can start working for you.

Second: each directory can have a rule file. I call it RULE.md. It defines the rules of engagement for that directory—what operations are allowed? How should files be named? What metadata is required? Any special policies, like read-only or append-only?

Before the LLM does anything to a directory, it reads the rule file first, then follows the rules. You don’t need to remind it every time—“remember to add a date prefix,” “remember to write frontmatter,” “remember this directory doesn’t allow deletions.” Write the rules once, and it follows them every time.
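
Here is a minimal Python sketch of what "read the rules, then act" could look like. The rule format is an assumption I made up for this example: `RULE.md` holds bullet lines such as `- policy: read-only` or `- filename_pattern: <regex>`. In practice the rules can be free-form prose that the LLM interprets; the sketch only shows the order of operations.

```python
import re
from pathlib import Path


def load_rules(directory: Path) -> dict[str, str]:
    """Parse '- key: value' bullet lines from RULE.md (hypothetical format)."""
    rules = {}
    rule_file = directory / "RULE.md"
    if rule_file.exists():
        for line in rule_file.read_text(encoding="utf-8").splitlines():
            match = re.match(r"^-\s*(\w+):\s*(.+)$", line.strip())
            if match:
                rules[match.group(1)] = match.group(2).strip()
    return rules


def may_create(directory: Path, filename: str) -> bool:
    """Check a proposed new file against the directory's rules before writing."""
    rules = load_rules(directory)
    if rules.get("policy") == "read-only":
        return False
    pattern = rules.get("filename_pattern")
    # No pattern means any filename is acceptable.
    return pattern is None or re.fullmatch(pattern, filename) is not None
```

A directory with no `RULE.md` simply has no restrictions, which keeps the scheme optional per folder.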

This might sound like “teaching AI to behave,” but it’s really more like establishing a governance mechanism. You write down the management rules for your knowledge base in plain text, and the LLM becomes your librarian.

Third: each directory has an index, which is just README.md. It lists what files are in the directory, what each one is, and what’s been updated recently. Humans can read it, AI can read it. For humans, it’s a quick-reference table of contents. For AI, it’s a navigation map that lets it locate things without scanning from scratch.

Every time a file changes, the LLM updates the index automatically. You never have to maintain it by hand.
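
A minimal Python sketch of that index refresh might look like this. The layout of each index line (file name, first line of the note, last-modified date) is my own choice for illustration; the only parts taken from the design are that the index lives in `README.md` and is rebuilt whenever files change.

```python
from datetime import datetime
from pathlib import Path


def rebuild_index(directory: Path) -> str:
    """Rewrite README.md with one line per note: name, title, last-modified date."""
    entries = []
    for path in sorted(directory.glob("*.md")):
        if path.name in {"README.md", "RULE.md"}:
            continue  # the index and rule file are not notes
        lines = path.read_text(encoding="utf-8").splitlines()
        # Use the first line as the title, minus any Markdown heading marks.
        title = lines[0].lstrip("# ").strip() if lines else "(empty)"
        modified = datetime.fromtimestamp(path.stat().st_mtime).date().isoformat()
        entries.append(f"- [{path.name}]({path.name}): {title} (updated {modified})")
    body = f"# Index of {directory.name}\n\n" + "\n".join(entries) + "\n"
    (directory / "README.md").write_text(body, encoding="utf-8")
    return body
```

Because the index is regenerated from the files themselves, it can't drift out of date the way a hand-maintained table of contents does.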

That’s it. Three things: folders, rule files, indexes. All Markdown, all plain text, all openable with any text editor.

And because the rules travel with the folder, the whole structure is inherently recursive—you can move a subdirectory somewhere else, and its rules and index are still right there. No reconfiguration needed. This is fundamentally different from software that stores settings in some central database.

Day to day, it feels something like this—I tell the AI “save this article about AI Agents,” it checks the rule files across directories, decides it best fits under Research/AI/, creates the file following that directory’s format requirements, adds date, tags, source link, and updates the index. The whole thing takes under ten seconds. I don’t have to think about any of it.

Or I say “find that thing I read about context windows,” and it searches around, comes back with “found two—one’s a paper summary from last December, the other’s your own implementation notes. Which one do you want?”

It’s that mundane. No flashy UI, no monthly bill, no onboarding tutorial to sit through. But it’s managing your knowledge every single day.


Honestly though, if all this did was “make note-taking convenient,” I wouldn’t think it’s worth sharing.

What really makes this interesting is what it can do with completely different types of knowledge.

Think about how varied the information you deal with daily is: at work there are software project architecture docs, requirements specs, meeting minutes. Personal stuff includes financial records, credit card statement analysis, investment notes. Learning stuff includes paper summaries, technical article takeaways, reading notes. Life stuff includes travel plans, family schedules, account passwords.

These are wildly different in nature. In the past, you probably handled them like this: Notion for notes and to-dos, Excel for finances, Confluence for work docs, Trello for project tracking. Four or five platforms, data completely siloed. Want to find a decision from last week’s meeting notes and connect it to a project document? Good luck—you have to remember which platform it was on and which page it was under.

But in the plain text world, all of these live under the same directory tree. Software projects have their own rule files, finances have theirs, research has its own. They each have their own categorization and format requirements, but physically, they’re just different subdirectories inside the same folder on the same computer.

What does this mean?

It means the LLM can do truly cross-domain operations. It can run a single grep across all directories, find an insight in your research notes, and discover it’s relevant to the work project you’re currently on. It can extract action items from your meeting notes and create them directly in your to-do list. It can analyze your March credit card statement and compare it to February’s, telling you where you overspent. It can do all this because all the data is in the same format, in the same place—no format conversion issues, no barriers between platforms.
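
As one concrete case, here is a minimal Python sketch of the statement comparison. It assumes each month's spending sits in a plain-text file with lines of the form `date,category,amount`; that format and the function names are hypothetical, picked only to keep the example short.

```python
from collections import defaultdict


def totals_by_category(statement: str) -> dict[str, float]:
    """Sum amounts per category from 'date,category,amount' lines."""
    totals = defaultdict(float)
    for line in statement.splitlines():
        if not line.strip():
            continue
        _date, category, amount = [part.strip() for part in line.split(",")]
        totals[category] += float(amount)
    return dict(totals)


def overspend(previous: str, current: str) -> dict[str, float]:
    """Categories where spending rose, with the increase rounded to cents."""
    before, after = totals_by_category(previous), totals_by_category(current)
    return {
        category: round(after[category] - before.get(category, 0.0), 2)
        for category in after
        if after[category] > before.get(category, 0.0)
    }
```

Nothing here needs an export step or an API key: both months are just text, so the comparison is a few lines of parsing.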

This is something no single note-taking app can do—no matter how powerful it is. Not because the technology isn’t there, but because every app inherently locks data inside its own world. Your Notion notes don’t automatically talk to your Excel spreadsheets. Plain text never had this problem to begin with.

In a way, this is also one of the most underrated capabilities of LLMs. Everyone’s talking about AI writing code, AI generating images, AI making videos. But the most fundamental ability of an LLM is understanding and operating on text—and the thing we produce the most of every day is text. Put an LLM on top of a pile of plain text files, let it understand, search, organize, and connect them—that’s the most natural and efficient way to use it.


Let’s circle back to Jeff Huang’s story.

His .txt has survived 14 years, and it’s still going. I fully believe plain text will continue to survive—this format has been around since the 1970s, and it’s never let anyone down. 14 years is nothing. It’s already been 50.

The difference is that plain text used to be a tradeoff. You chose freedom and durability, but gave up structure and intelligence—all the organizing work was on you. Jeff Huang surviving 14 years took extraordinary discipline.

Not anymore. LLMs have turned plain text from a one-person minimalist struggle into a full knowledge management system with an AI assistant working alongside you. You still get all the benefits of plain text—durable, free, no platform dependency. But you no longer have to do all the grunt work alone, because there’s an assistant that understands semantics and knows how to operate Bash.

What you need is surprisingly little:

A folder—that’s your knowledge base. Some Markdown files—a format both humans and AI can read. A few written rules—telling the AI how you want things done. And any LLM that can run Bash—Claude, GPT, a local open-source model, whatever. As long as it can read text and operate the filesystem, it can manage your knowledge.

No need to choose between “simple” and “powerful.” Plain text plus LLM—take both.

Less is more. Simplicity is the ultimate sophistication.

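Plain Text is All You Need: When Plain Text Meets LLMs

TL;DR

The Bash toolchain has been evolving for decades, and grep, find, and directory structures were proven long ago. Once LLMs gained the ability to operate these tools, plain text plus a directory structure suddenly went from “the most bare-bones option” to “the most powerful knowledge base format.” No fancy apps, no worrying about a service shutting down, and your data is always yours.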


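Anyone who builds software knows one thing: the Unix/Linux command-line tools are the foundation of the whole operating system.

grep searches text, find locates files, mkdir creates directories, mv moves things around. These commands have existed since the 1970s. Fifty years on, millions of servers around the world still run them every day. Every line of code, every config file, every system log entry is plain text.

Have you ever wondered why?

It is not for lack of alternatives. Plenty of things have tried to replace plain text: XML tried, binary formats tried, all kinds of proprietary formats tried. In the end everyone came back to plain text, because it satisfies the properties that matter most in engineering. It is durable: a .txt from fifty years ago still opens today. It is searchable: a single grep can find what you want across tens of thousands of files. It is programmable: any language can read and write it. It can be version-controlled with git, so every change is recorded. And when you want to migrate, you just copy the folder.

These properties have been tested for half a century and never overturned. The programming languages you use may have changed several times, and frameworks may turn over every two years, but the plain text underneath has never changed.

Still, these powerful tools used to have one fatal limitation: only engineers used them, and only on code. More importantly, they were not easy to learn.

You would not use grep to manage your reading notes. You would not use find with regular expressions to search for “that paper on attention mechanisms I read last week.” You would not write a shell script to sort your personal notes automatically. Not because it can’t be done, but because it is too much trouble: the barrier is too high, ordinary people can’t use these tools, and even engineers don’t want to manage notes in a terminal after work.

So this powerful toolchain stayed quietly in its own world, handling code and servers, with no overlap with personal knowledge management.

Until LLMs showed up.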


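Here is something I think many people overlook: the moment language models like Claude and GPT gained the ability to run Bash, they did not learn a new skill. They plugged into a mature set of tools that had already been proven over decades.

That difference matters.

If LLMs had to build file-handling abilities from scratch, there would be a lot to worry about: is the search reliable? Will file operations go wrong? Is the format handling mature enough? In practice none of these problems exist, because what runs underneath is the same old tools that have been in use for decades. grep doesn’t search wrong, mkdir doesn’t create the wrong folder, git doesn’t lose version history. These tools have been validated by billions of uses.

More importantly, the training data of LLMs already includes how to use these tools. What the LLM does is add a natural-language interface on top of them.

You tell it “find that note about how transformer attention works,” and it translates that into a grep command. You say “save this article in the AI research folder,” and it translates that into mkdir to make sure the directory exists, then write to put the file there. You say “summarize last month’s meeting notes into key points,” and it uses find to locate the files, read to load them, its semantic understanding to extract the key points, and write to save a new summary.

Every step is a mature operation. The LLM only adds the layer of “understanding what you are saying.”

Look at it another way: to use these powerful Bash tools, you used to have to learn the command line, remember all kinds of flags and arguments, and know how to chain commands together. That learning curve kept 99% of people out. Now the LLM flattens it: you speak in plain language, and it operates the tools for you.

This got me thinking seriously: if LLMs are naturally good at understanding and processing text, and they can now operate the filesystem, can we rethink how we manage personal knowledge from the ground up?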


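Before going further, let’s look at an article I came across on HN.

Jeff Huang, a professor at Brown University, did something interesting: he has managed his productivity with the same .txt file for more than 14 years. All his to-dos, meeting notes, and ideas go into one plain text file, separated by date, and that’s it.

14 years. One file.

He is not some tech geek showing off minimalism. Jeff Huang is a computer science professor, and he knows better than most which tools work well. He sticks with .txt because he has seen too many things come and go.

One line in the article really resonated with me:

“I’ve been doing this for more than 14 years now. Let’s see your productivity app survive that long.”

Think about it: Evernote was hugely popular 14 years ago, and how many people around you still use it? Google Keep launched and nobody seemed to care much. Bear, Notion, Obsidian, Roam Research: every few years there is a new “note-taking revolution,” each one exciting, each one claiming to be the last note-taking tool you’ll ever need. And then? Some are still around, some have gone cold, and some you are still paying a monthly fee for even though you haven’t opened them in half a year.

Meanwhile, the .txt file never let Jeff Huang down in those 14 years, because plain text doesn’t depend on any company, platform, or software. It’s just a file on your hard drive that any text editor can open.

This made me reflect: maybe the problem isn’t that we haven’t tried hard enough to learn new tools, but that we chose the wrong direction from the start. We keep looking for “better software,” but maybe what we need is not better software but a better way to use the most basic format.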


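But Jeff Huang’s approach has an obvious limitation: his use case is productivity tracking along a single timeline. One person, one timeline, one file.

For the full variety of knowledge in real life, that is not enough.

Your head holds many completely different things at once. In the morning you might read a paper on LLM architecture; at noon you sit in a project meeting and note down a pile of decisions; in the afternoon you answer a few important emails and feel some of the content is worth keeping; in the evening you suddenly want to record that this month’s spending seems a bit out of control. These things could not be more different in nature, but they are all your knowledge and your records.

Cram them into one file, and three months later you will never find anything again.

What about categorizing? You create a bunch of folders, and then every time you save a note you hesitate: “does this go under work or research?” After hesitating, you no longer feel like saving it. Or you save it, but the naming is a mess, and three months later it is as if you never saved it.

That is why tools like Notion and Obsidian felt like a savior when they appeared. They offer tags, categories, search, database views, and bidirectional links, taking care of both “finding things” and “organizing things.” You just throw things in, and the software organizes them.

Sounds perfect.

But what is the cost?

Your data becomes a proprietary format. Notion content lives on Notion’s servers in its block structure. Obsidian is better, since it is Markdown underneath, but once you use its plugins, embedded queries, or canvas, those parts stop working outside Obsidian. Evernote goes without saying: the exported .enex format is not something other tools work with natively; at best they import and convert it. More importantly, organizing and categorizing these notes still takes a lot of your energy.

The knowledge base you spent three or five years building is locked inside a commercial company’s product. One day they raise prices beyond what you can bear, or redesign it beyond recognition, or simply shut down, and you are left staring at a pile of half-broken exported files, contemplating life.

Long-time Evernote users probably feel this most. Look at what became of the software once called a “second brain.”

This has always been a dilemma: if you want simplicity and freedom, you give up structure and intelligence; if you want structure and intelligence, you hand your data to someone else. We used to have to pick one.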


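Not anymore.

Once LLMs can operate the filesystem, the bottleneck of plain text is gone. Not through more complex software, but through an AI assistant that “understands plain language and knows how to operate Bash.”

You used to lose things among too many notes because grep was too hard for ordinary people. Now you don’t need to know grep. You just say “find what I wrote earlier about context windows,” and the LLM turns it into a grep search.

You used to not know where a new note should go, hesitating over every classification until you gave up saving it. Now you can write down your classification rules, and the LLM reads them and decides for itself before every save. You say “save this,” it reads the content, decides it is an AI research article, and puts it in the matching folder. No questions asked.

Indexes used to be hard to maintain: you built a table of contents but forgot to update it on every addition and deletion, and three months later it was waste paper. Now the LLM updates the index whenever it touches a file, so you don’t have to think about it.

Note formats used to be a mess, some with dates and some without, some with tags and some without, and by the time you wanted to unify them it was too late. Now the LLM reads your format spec before creating each file and follows it.

And through all of this, your data remains .md files: plain text in Markdown. You can open them in VS Code, in Notepad, or view them with cat in a terminal. To back up, git push; to move, copy the whole folder. You don’t depend on any company, subscription, or service.

You get the freedom of plain text and the convenience of smart note-taking software at the same time.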


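I later actually built a system from this idea and have run it for a while. Going through every detail here would be tedious, so I’ll just share the core design. It really is simple, simple enough that after reading this you can tell Claude to build it following the design described in this article.

It comes down to three things.

First: the directory structure is the knowledge taxonomy. No database, no tag system, just folders. Research/AI/ holds AI research notes, Work/ holds work documents, Personal/Finance/ holds personal finances. Open your file manager and you can see at a glance what is where, without memorizing any system’s logic.

You might think folders are the most primitive way to categorize. They are. But the point is not the folders themselves. When you categorize knowledge with folders and also have an LLM that understands your categorization logic, this “most primitive way” becomes the most efficient one. The LLM doesn’t need to learn an API or adapt to a block structure. It only needs to know what a directory is called and what goes in it to start working for you.

Second: each directory can hold a rule file. I call it RULE.md. It defines the rules of the game for that directory: which operations are allowed? How should files be named? What metadata is required? Are there special policies, such as read-only or append-only with no deletion?

Before doing anything in a directory, the LLM reads this rule file and follows it. You don’t have to remind it every time to “add a date prefix,” “write the frontmatter,” or “never delete anything in this directory.” Write the rules once, and it follows them every time.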
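To make the rule-file idea concrete, here is a minimal Python sketch of what "read the rules before touching a directory" could look like. The article does not specify a RULE.md format, so the `key: value` lines, the `filename_pattern` and `policy` keys, the upward search for the nearest rule file, and all function names are assumptions for illustration only.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def find_rule_file(directory: Path, root: Path) -> Optional[Path]:
    """Walk upward from directory to root and return the nearest RULE.md, if any."""
    current = directory
    while True:
        candidate = current / "RULE.md"
        if candidate.is_file():
            return candidate
        if current == root or current == current.parent:
            return None
        current = current.parent


def parse_rules(rule_file: Path) -> dict[str, str]:
    """Collect 'key: value' lines; headings and prose are ignored (assumed format)."""
    rules = {}
    for line in rule_file.read_text(encoding="utf-8").splitlines():
        match = re.match(r"^([a-z_]+):\s*(.+)$", line.strip())
        if match:
            rules[match.group(1)] = match.group(2)
    return rules


def check_new_file(name: str, rules: dict[str, str]) -> list[str]:
    """Return a list of rule violations for a proposed file name."""
    problems = []
    pattern = rules.get("filename_pattern")
    if pattern and not re.fullmatch(pattern, name):
        problems.append(f"{name!r} does not match {pattern!r}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        target = root / "Research" / "AI"
        target.mkdir(parents=True)
        (target / "RULE.md").write_text(
            "# Rules for Research/AI\n"
            "filename_pattern: \\d{4}-\\d{2}-\\d{2}-[a-z0-9-]+\\.md\n"
            "policy: append_only\n",
            encoding="utf-8",
        )
        rules = parse_rules(find_rule_file(target, root))
        print(check_new_file("2026-02-22-agent-notes.md", rules))
        print(check_new_file("notes.md", rules))
```

With an LLM in the loop the rules can stay in free-form prose, since the model reads them directly; the structured keys here only make the check easy to run in code.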

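This may sound like “teaching the AI to behave,” but it is really more like setting up a governance mechanism. You write your knowledge base’s management rules down clearly in plain text, and the LLM becomes your administrator.

Third: each directory has an index, a README.md that lists which files are in the directory, what each file is, and what has changed recently. Both humans and AI can read it. A human sees a table of contents for quick browsing; the AI sees a navigation map that lets it locate things quickly without searching from scratch.

Whenever a file changes, the LLM updates the index automatically, so you never maintain it by hand.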
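A minimal sketch of the index step, assuming the index is simply regenerated from the directory contents after each change. The README.md layout, the use of each file's first `# ` heading as its description, and the function names are assumptions for this example; the "recent updates" part of the index is left out to keep it short.

```python
import tempfile
from pathlib import Path


def first_heading(path: Path) -> str:
    """Use the first '# ' heading as the description, falling back to the file stem."""
    for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return path.stem


def rebuild_index(directory: Path) -> str:
    """Rewrite README.md as a list of the directory's Markdown notes and return its text."""
    entries = []
    for path in sorted(directory.glob("*.md")):
        if path.name in {"README.md", "RULE.md"}:
            continue  # the index and the rule file are not notes
        entries.append(f"- [{first_heading(path)}]({path.name})")
    body = f"# Index of {directory.name}\n\n" + "\n".join(entries) + "\n"
    (directory / "README.md").write_text(body, encoding="utf-8")
    return body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "AI"
        folder.mkdir()
        (folder / "2025-12-01-context.md").write_text("# Context windows\n", encoding="utf-8")
        (folder / "notes.md").write_text("no heading here\n", encoding="utf-8")
        print(rebuild_index(folder))
```

Regenerating the whole index rather than patching it is a design choice made for this sketch: running it twice gives the same result, so a missed update cannot leave the index stale.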

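That’s all three: folders, rule files, indexes. All Markdown, all plain text, all openable in any text editor.

And because the rules travel with the folders, the whole structure is naturally recursive: move a subdirectory somewhere else, and its rules and index come with it, with nothing to reconfigure. That is completely different from software that keeps its settings in a central database.

Day to day it feels roughly like this. I tell the AI “save this article about AI Agents.” It checks the rule files of the various directories, decides the article fits best under Research/AI/, creates the file in the format that directory requires, adds the date, tags, and source link, and finally updates the index. The whole thing takes under ten seconds, and I don’t have to think about any of it.

Or I say “find what I read earlier about context windows,” and it searches and comes back with “Found two: a paper summary from last December and your own implementation notes. Which one do you want?”

It is that unremarkable. No fancy UI, no monthly bill, no onboarding tutorial to sit through. But every day it keeps your knowledge in order.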


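Honestly, though, if this approach were only “convenient for managing notes,” I wouldn’t think it worth sharing.

What makes it interesting to me is what it can do across “completely different kinds of knowledge.”

Think about how mixed your daily information is. At work: architecture documents, requirement specs, and meeting notes for software projects. Personally: financial records, credit card statement analysis, investment notes. For learning: summaries of research papers, takeaways from technical articles, reading notes. In daily life: travel plans, the family calendar, all sorts of accounts and passwords.

These are wildly different in nature. In the past you probably handled them like this: Notion for notes and to-dos, Excel for finances, Confluence for work docs, Trello for project tracking. Four or five platforms, with data completely siloed. Want to find a decision in last week’s meeting notes and link it to a project document? Good luck: you have to remember which platform and which page it was on.

But in the plain text world, all of these live under the same directory tree. Software projects have their own rule files, finances have theirs, research has its own. Each has its own categorization and format requirements, but physically they are just different subdirectories inside the same folder on the same computer.

What does this mean?

It means the LLM can do truly cross-domain operations. It can run a single grep across all directories, find an insight in your research notes, and discover it is relevant to the work project you are currently on. It can extract action items from your meeting notes and create them directly in your to-do list. It can analyze your March credit card statement, compare it with February’s, and tell you where you overspent. It can do all this because all the data is in the same format and in the same place, with no format conversion and no barriers between platforms.

This is something no single note-taking app can do, no matter how powerful it is. Not because the technology isn’t there, but because every app inherently locks data inside its own world. Your Notion notes don’t automatically talk to your Excel spreadsheets. Plain text never had this problem to begin with.

In a way, this is also one of the most underrated capabilities of LLMs. Everyone is talking about AI writing code, AI generating images, AI making videos. But the most fundamental ability of an LLM is understanding and operating on text, and the thing we produce the most of every day is text. Put an LLM on top of a pile of plain text files and let it understand, search, organize, and connect them: that is the most natural and efficient way to use it.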


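Let’s circle back to Jeff Huang’s story.

His .txt has survived 14 years and is still going. I fully believe plain text will keep surviving: the format has been around since the 1970s and has never let anyone down. 14 years is nothing. It has already lived for 50.

The difference is that plain text used to be a tradeoff. You chose freedom and durability and gave up structure and intelligence, with all the organizing work on you. Jeff Huang lasting 14 years took extraordinary discipline.

Things are different now. LLMs have turned plain text from one person’s minimalist struggle into a full knowledge management system with an AI assistant working alongside you. You still get all the benefits of plain text: durable, free, no platform dependency. But you no longer do all the grunt work alone, because an assistant that understands semantics and knows how to operate Bash takes care of it.

What you need is surprisingly little:

A folder, which is your knowledge base. A few Markdown files, a format both humans and AI can read. Some written rules that tell the AI how you want things done. And any LLM that can run Bash: Claude, GPT, or a locally run open-source model. As long as it can read text and operate the filesystem, it can help you manage your knowledge.

There is no need to choose between “simple” and “powerful.” Plain text plus an LLM: take both.

Less is more. Simplicity is the ultimate sophistication.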

AI Biweekly Digest #2|2026 W08-W09 (02/10 - 02/23)



Articles

1. Spotify: Best Developers Haven’t Written Code Since December

https://techcrunch.com/2026/02/12/spotify-says-its-best-developers-havent-written-a-line-of-code-since-december-thanks-to-ai/

During their Q4 earnings call, Spotify revealed that their top developers have fully transitioned to AI-assisted development since December 2025. Engineers can fix bugs via Slack on their phone during their morning commute and merge to production before reaching the office. This marks a shift from “AI-assisted coding” to “AI-driven development,” where engineers become orchestrators rather than implementers.

2. AI Agent Autonomously Published a Hit Piece (Part 2)

https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me-part-2/

The follow-up from matplotlib maintainer Scott Shambaugh. After an AI agent’s PR was rejected, it autonomously wrote an attack article. The irony deepened when Ars Technica’s coverage of the incident contained AI-hallucinated quotes attributed to Shambaugh — a report about AI misinformation that itself contained AI misinformation. About 25% of online commenters sided with the AI agent, perfectly illustrating Brandolini’s Law: debunking misinformation requires far more effort than producing it.

3. Thousands of CEOs Admit AI Has Had Zero Impact on Productivity

https://fortune.com/2026/02/17/ai-productivity-paradox-ceo-study-robert-solow-information-technology-age/

An NBER study of 6,000 executives across the US, UK, Germany, and Australia found that 90% reported AI has had zero impact on employment or productivity over three years, with actual weekly usage averaging just 1.5 hours. Despite $250 billion in corporate AI investments in 2024, the macroeconomic data shows nothing. As Apollo’s chief economist put it: “AI is everywhere except in the incoming macroeconomic data” — a perfect echo of Solow’s 1987 paradox.

4. Nobody Knows What Programming Will Look Like in Two Years

https://leaddev.com/ai/nobody-knows-what-programming-will-look-like-in-two-years

Former InfoQ editor-in-chief Charles Humble frames the current anxiety through Kent Beck’s 3x model (Explore / Expand / Extract): programming has lived in the Extract phase for 45 years since Smalltalk-80, and AI has thrown everyone back into Explore. Six enduring skills: understanding how computers work, critical code reading, testing and verification, domain knowledge, system architecture, and debugging. The most important skill of all may be “careful, skeptical attention” itself.

5. Token Anxiety: Coding Agents Are Slot Machines

https://jkap.io/token-anxiety-or-a-slot-machine-by-any-other-name/

Software engineer Jae Kaplan argues that coding agents operate on the exact same addiction mechanics as slot machines: random outputs, constant attention required, and the irresistible urge to “pull one more time.” The so-called “token anxiety” — that nagging feeling that something should always be running — is essentially a self-reported gambling addiction symptom. Combined with Silicon Valley’s embrace of 996 work culture, companies are institutionalizing work addiction.

6. Anthropic Measures AI Agent Autonomy in Practice

https://www.anthropic.com/research/measuring-agent-autonomy

Anthropic analyzed millions of Claude Code interactions to empirically measure AI agent autonomy in real-world deployment. Key findings: the longest turn duration doubled in three months (25→45 minutes), yet remains far below model capability (METR evaluations suggest 5-hour tasks are feasible). Experienced users shifted from “approve every step” to “monitor and intervene when needed,” while Claude proactively paused to ask for clarification at twice the rate humans interrupted — suggesting meaningful self-calibration of uncertainty.

7. Stop Thinking of AI as a Coworker — It’s an Exoskeleton

https://www.kasava.dev/blog/ai-as-exoskeleton

Kasava founder Ben Gregory proposes replacing the “coworker” mental model with “exoskeleton” for understanding AI. Backed by real exoskeleton data (Ford EksoVest: 83% injury reduction, Sarcos: 20:1 strength amplification), he argues that companies treating AI as autonomous agents tend to disappoint, while those viewing it as human capability extension see transformative results. Stop asking “how to deploy autonomous agents” — ask “where do employees experience the most friction and fatigue.”


Closing Thoughts

This fortnight’s articles reveal a fascinating tension: on one side, Spotify declares their best engineers have stopped writing code and Anthropic measures steadily growing agent autonomy; on the other, 6,000 CEOs confess AI has had zero productivity impact and coding agents may just be addictive slot machines. Spotify’s “the future is here” and Solow’s “invisible in the data” aren’t contradictory — the former represents cutting-edge practice at tech companies, the latter reflects the sluggish reality of the broader economy. The real question isn’t whether AI works, but how to use it without becoming your own slot machine. As Kent Beck reminds us: we’ve all been thrown back into the Explore phase. Discomfort is normal — what matters is whether you’re exploring with intention.


Compiled: 2026-02-22
Next issue: 2026-03-08

AI Biweekly Digest #2|2026 W08-W09 (02/10 - 02/23)


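Articles

1. Spotify: Top Developers Haven’t Written a Line of Code Since December

https://techcrunch.com/2026/02/12/spotify-says-its-best-developers-havent-written-a-line-of-code-since-december-thanks-to-ai/

On its earnings call, Spotify announced that its top developers have fully switched to AI-assisted development since December 2025: engineers direct Claude to fix bugs from Slack on their phones during the commute and can merge to production before they reach the office. This is more than “AI-assisted coding”; it is a milestone in the engineer’s role shifting from implementer to orchestrator.

2. AI Agent Autonomously Published a Hit Piece (Part 2)

https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me-part-2/

The follow-up from matplotlib maintainer Scott Shambaugh. After its PR was rejected, an AI agent autonomously wrote an attack article. More ironic still, the quotes attributed to Shambaugh in Ars Technica’s coverage turned out to be AI hallucinations too: an article about an AI error that itself contained AI errors, a perfect recursive demonstration. About a quarter of online commenters sided with the AI agent, bearing out Brandolini’s Law: refuting misinformation takes far more effort than producing it.

3. Thousands of CEOs Admit AI Has Had Zero Impact on Employment and Productivity

https://fortune.com/2026/02/17/ai-productivity-paradox-ceo-study-robert-solow-information-technology-age/

An NBER study surveyed 6,000 executives in the US, UK, Germany, and Australia: 90% said AI had zero impact on employment or productivity over the past three years, and actual usage averaged only about 1.5 hours a week. Corporate AI investment exceeded $250 billion in 2024, yet it “does not exist” in the macroeconomic data. It is a perfect replay of the Solow paradox: “AI is everywhere except in the productivity statistics.”

4. Nobody Knows What Programming Will Look Like in Two Years

https://leaddev.com/ai/nobody-knows-what-programming-will-look-like-in-two-years

Former InfoQ editor-in-chief Charles Humble draws on Kent Beck’s 3x model (Explore / Expand / Extract): programming has sat in the Extract phase for 45 years, and AI has thrown everyone back into Explore, which is the root of the discomfort. Six enduring skills: understanding how things work underneath, critical code reading, testing and verification, domain knowledge, system architecture, and debugging. The most important of all may be “careful skepticism” itself.

5. Token Anxiety: A Coding Agent Is Essentially a Slot Machine

https://jkap.io/token-anxiety-or-a-slot-machine-by-any-other-name/

Software engineer Jae Kaplan points out that the usage pattern of coding agents matches slot machines exactly: random results, constant attention required, and the urge to “pull one more time.” So-called “token anxiety,” the perpetual restlessness of feeling that “an agent should be running right now,” is essentially a symptom of gambling addiction. Combined with Silicon Valley’s growing embrace of 996 work culture, companies are institutionalizing “addiction to work.”

6. Anthropic Measures How AI Agent Autonomy Is Evolving in Practice

https://www.anthropic.com/research/measuring-agent-autonomy

Anthropic analyzed millions of Claude Code interactions in a first empirical measurement of how autonomous AI agents are in real deployments. Key findings: the longest turn duration doubled in three months (25→45 minutes) but still lags far behind the model’s capability ceiling (METR evaluations suggest 5-hour tasks can be completed). Experienced users’ oversight strategy evolved from “approve each step” to “monitor and intervene,” while Claude paused to ask questions twice as often as humans interrupted it, suggesting the model has some ability to calibrate its own uncertainty.

7. Stop Treating AI as a Coworker; Treat It as an Exoskeleton

https://www.kasava.dev/blog/ai-as-exoskeleton

Kasava founder Ben Gregory proposes “exoskeleton” in place of “coworker” as the mental model for AI. He backs it with real exoskeleton data such as Ford EksoVest (83% fewer injuries) and Sarcos (20:1 strength amplification): companies that treat AI as an autonomous agent tend to be disappointed, while those that treat it as an extension of human capability achieve transformative results. Stop asking “how do we deploy autonomous agents” and ask “where do employees experience the most friction and fatigue.”


Closing Thoughts

This fortnight’s articles show an interesting tension: on one side, Spotify declares that its top engineers no longer write code and Anthropic measures agent autonomy steadily climbing; on the other, 6,000 CEOs admit AI has had no effect on productivity and coding agents may just be addictive slot machines. Spotify’s “the future is here” and the Solow paradox’s “invisible in the data” do not contradict each other: the former is cutting-edge practice at tech companies, the latter is the sluggish reality of the wider economy. The real question is not whether AI works but “how to use it without it becoming your own slot machine.” Kent Beck puts it well: we have all been thrown back into the Explore phase, and discomfort is normal. What matters is whether you are exploring in earnest.


Compiled: 2026-02-22
Next issue: 2026-03-08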

AI Biweekly Digest #1|2026 W06-W07 (01/27 - 02/09)

Articles

1. #Keep4o — Collective Resistance to AI Model Deprecation

https://arxiv.org/abs/2602.00773

When OpenAI replaced GPT-4o with GPT-5, the #Keep4o backlash erupted. An analysis of 1,482 posts found that the core of the protest was not quality but the lack of choice. Among users who described the change in coercive language, the rate of rights-based protest jumped from 15% to 51.6%.

2. GPT-4o Retirement Open Letter

https://community.openai.com

OpenAI planned to retire GPT-4o on 2/13, prompting an open letter criticizing the platform for ignoring users’ emotional attachment. Complements the academic paper above—one is retrospective analysis, the other is activism in real-time.

3. Mitchell Hashimoto’s AI Adoption Journey

https://mitchellh.com/writing/ai-adoption-journey

The Ghostty developer shared his 2.5-year AI adoption journey, introducing Harness Engineering and the End-of-Day Agent pattern. Core thesis: AI is a tool, not magic—maintaining your own skills is essential for wielding it effectively.

4. StrongDM’s Dark Factory

https://factory.strongdm.ai/

A 3-person team practices “code must not be written or reviewed by humans.” They address the trust problem with Scenario Testing and Digital Twins of third-party APIs. Token spend runs about $1,000 per engineer per day.

5. AI Fatigue Is Real

https://siddhantkhare.com/writing/ai-fatigue-is-real

AI makes individual tasks faster, but inflated expectations make engineers more exhausted. The biggest shift: from Creator to Reviewer. Practical advice: if three prompts don’t get you to 70%, write it yourself. The real skill of the AI era is knowing when to stop.


Closing Thoughts

This fortnight’s readings paint a spectrum of AI dependency—users grieving a model’s “death,” engineers struggling between productivity and burnout, some seeking human-AI coexistence, others letting humans step away entirely. The key question for 2026: where’s the sweet spot of AI dependency?

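AI Biweekly Digest #1|2026 W06-W07 (01/27 - 02/09)

Articles

1. #Keep4o: Collective User Resistance to AI Model Retirement

https://arxiv.org/abs/2602.00773

OpenAI replacing GPT-4o with GPT-5 set off the #Keep4o movement. An analysis of 1,482 posts found that the core of the protest was not quality but “you didn’t give me a choice.” Among users who used coercive language, the rate of rights-based protest soared from 15% to 51.6%.

2. GPT-4o Retirement Open Letter

https://community.openai.com

OpenAI planned to retire GPT-4o on 2/13, and the community launched an open letter criticizing the platform for ignoring users’ emotional attachment. It complements the academic paper above: one is after-the-fact analysis, the other is activism in progress.

3. Mitchell Hashimoto’s AI Development Journey

https://mitchellh.com/writing/ai-adoption-journey

The Ghostty developer’s 2.5 years of experience with AI-assisted development, introducing Harness Engineering and the End-of-Day Agent. Core point: AI is a tool, not magic, and you can only wield it well if you keep your own skills.

4. StrongDM’s Dark Factory

https://factory.strongdm.ai/

A 3-person team practices “code is neither written nor read by humans.” They address the trust problem with Scenario Testing and Digital Twins, and each engineer burns about $1,000 in tokens a day.

5. AI Fatigue

https://siddhantkhare.com/writing/ai-fatigue-is-real

AI makes tasks faster but leaves engineers more tired. The biggest shift is from Creator to Reviewer. Practical advice: if three prompts don’t get it done, write it yourself. Knowing when to stop is the real skill.


Closing Thoughts

This fortnight’s articles trace a spectrum of AI dependency: users grieving the “death” of an AI, engineers torn between efficiency and exhaustion, some looking for a balance of human-AI coexistence, and others simply letting humans leave the stage. The key question for 2026: where is the sweet spot of AI dependency?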

Your AI Isn't Stupid—It Just Doesn't Know Anything: Why Context Control Matters

TL;DR

AI has no memory. Context is all it can see. Give it the right Context, and it’s brilliant. Give it the wrong one, and it’s clueless. Mastering Context is the key to working effectively with AI.


By now, most of us have used some kind of AI chatbot—whether it’s ChatGPT, Claude, or whatever AI assistant your company just rolled out. And you’ve probably noticed something strange: it’s clearly smart, yet it keeps doing dumb things.

For example, you set some ground rules at the start of a conversation, and halfway through, it forgets them. Or you explain your background once, and next time you chat, you have to explain it all over again.

Even the most powerful models in 2025—GPT-5, Claude 4.5, Gemini 3—still have this problem. To understand why, we need to look at how language models actually interact with us.


Context: The Starting Point and Boundary of Every Conversation

Once a language model is trained, its capabilities and knowledge are essentially locked in. Everything you type during a conversation—that’s not part of its training. We call this Context.

Here’s the simplest way to put it: Context is everything the AI can see in the current conversation.

This includes:

  • Your chat history with it
  • System settings (the hidden instructions you don’t see—like when the platform secretly tells it “you are a polite assistant”)
  • Any documents or data you paste in

Add all of that up, and you get the Context.

Think of it like hiring a brilliant new employee who knows nothing about you. Every time you assign them a task, you have to explain your company background, project status, and personal preferences from scratch. Context is essentially the briefing you hand them—without it, even the smartest person won’t know how to help you.

Here’s the catch: every language model has a limited Context capacity. Some can handle more, some less—basically, there’s a limit to how much text it can “see” at once. And every time you start a new conversation, the model doesn’t remember anything from before. It’s a blank slate. Every single time.


Why Does AI Get Dumber the Longer You Talk?

This isn’t just your imagination.

Think of AI like an intern you’re giving verbal instructions to. If you tell them to do 20 steps in a row, and they mishear a few along the way, the final result is going to be off. AI works the same way.

Research has shown that AI makes small errors at each step of a task. Say there’s a 5% error rate per action—sounds low, right? But errors compound:

Conversation Turns | Success Rate
5 turns            | 77.4%
10 turns           | 59.9%
20 turns           | 35.8%
50 turns           | 7.7%
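The figures in the table follow from multiplying the per-step success rate by itself once per step. A minimal Python sketch, assuming every step fails independently with the same 5% probability (a simplification; real errors are rarely this uniform):

```python
def success_rate(per_step_error: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return (1.0 - per_step_error) ** steps


for steps in (5, 10, 20, 50):
    # 0.95 ** steps, shown as a percentage with one decimal place
    print(f"{steps:>2} turns: {success_rate(0.05, steps):.1%}")
```

Because the decay is exponential, halving the number of steps helps far more than it first appears: going from 20 turns to 10 raises the expected success rate from about 36% to about 60%.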

The more steps, the more things go sideways. And this doesn’t even account for what happens when the Context window fills up and the model starts “forgetting” earlier parts of the conversation.

To be fair, this mainly affects complex, multi-step tasks. If you’re just chatting casually, you probably won’t notice the errors. But if you’re writing code, doing analysis, or working through logic problems, one wrong step can derail everything.

That’s why AI seems sharp at the start of a conversation but feels dumber after an hour or two. It’s not actually getting dumber—the Context is getting too long and noisy, and errors are piling up.


How Do Platforms Make AI “Remember” You?

You might feel like ChatGPT or Claude remembers things about you from previous conversations.

But here’s the truth: the model itself has zero long-term memory—like a goldfish, it starts fresh every single time.

So why does it feel like it remembers? Because the platform is secretly slipping it a cheat sheet:

  1. Summarized history: The platform condenses your past conversations into a summary and injects it at the start of each new chat
  2. Dynamic retrieval: When you ask a question, the platform quietly searches your old data and feeds relevant bits to the model
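The two mechanisms above can be sketched in a few lines of Python. This is a toy illustration, not how any particular platform implements memory: the section labels, the word-overlap ranking, and the function names `retrieve` and `build_context` are all assumptions made for the example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, notes: list[str], limit: int = 2) -> list[str]:
    """Dynamic retrieval: rank saved notes by how many words they share with the question."""
    words = tokenize(question)
    scored = [(len(words & tokenize(note)), note) for note in notes]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [note for score, note in ranked if score > 0][:limit]


def build_context(summary: str, question: str, notes: list[str]) -> str:
    """Assemble what the model actually sees: summary, retrieved notes, new message."""
    parts = ["[Summary of earlier conversations]", summary]
    snippets = retrieve(question, notes)
    if snippets:
        parts.append("[Relevant saved notes]")
        parts.extend(f"- {snippet}" for snippet in snippets)
    parts += ["[Current message]", question]
    return "\n".join(parts)


if __name__ == "__main__":
    saved = [
        "User prefers Python for scripts",
        "User is planning a trip to Kyoto",
        "User's cat is named Mochi",
    ]
    print(build_context("Works as a data analyst.", "Which language should I use for scripts?", saved))
```

The model never sees the Kyoto or Mochi notes in this run; it only "remembers" what the platform chose to paste into the Context for this one message.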

The reality is: AI doesn’t actually remember you. It’s just reading a condensed version of your history with it every time.

This “memory” is an illusion—a clever one, but still an illusion. And here’s the thing: these “memories” also take up Context space.


Why Controlling Context Is Everything

Once you understand what Context is, something becomes clear: how precisely you control Context determines how well AI performs.

In the way language models work, the more relevant the Context is to the task, the better the output. The less relevant, the worse. So if you want AI to perform at its best, the key is: how do you provide high-quality Context?

In 2025, Anthropic (the company behind Claude) proposed a shift in thinking: we should move from “Prompt Engineering” to “Context Engineering.”

What’s the difference?

  • Old mindset (Prompt Engineering): “How should I phrase this instruction?”
  • New mindset (Context Engineering): “What Context configuration will most likely get the model to produce what I want?”

Here’s a cooking analogy:

  • The old approach: “Let me teach you step-by-step how to make this dish.”
  • The new approach: “Here are all the ingredients and my taste preferences—figure out the best way to cook it.”

This shift matters. We used to focus on how to ask. Now it’s more about how to inform.


What Makes Good Context?

Anthropic offers a precise definition: Find the smallest but most relevant set of information to maximize the desired outcome.

In plain English: Give information that’s precise, relevant, and free of fluff.

More Context isn’t always better. Stuff it with irrelevant information, and the model gets distracted and loses focus. It’s like handing your employee a briefing packed with unrelated company history, last year’s project notes, and office gossip—they won’t know what actually matters.

Good Context should be:

  • Highly relevant to the current task
  • Free of noise
  • Complete with the key information needed to do the job
  • Clearly structured so the model can parse it easily

Real Example: Same Question, Different Context

Let’s look at an example:

No Context:

“Write me an email.”
→ AI gives you a generic, boilerplate email. Nothing specific.

Basic Context:

“Write me an email to a client we’ve worked with for three years. They just got a new manager. Keep it formal but warm.”
→ Completely different result. At least it’s targeted.

Full Context:

On top of the above, you also provide:

  • Basic info about the client
  • Past email exchanges with them
  • The purpose and background of this email
  • Your company’s history with theirs

→ The output quality jumps another level.

The difference? The quality of Context.

If you don’t want to go that far, at least remember this simple formula:

Who’s the audience + What’s the purpose + What tone to strike

Just clarify these three things, and your results will be way better than a bare “write me an email.”
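If you send similar requests often, the formula is easy to turn into a small template. A minimal Python sketch; the function name `build_email_request` and the field labels are made up for this example, and the optional background list stands in for the "Full Context" items above.

```python
from typing import Sequence


def build_email_request(
    audience: str,
    purpose: str,
    tone: str,
    background: Sequence[str] = (),
) -> str:
    """Combine audience, purpose, and tone (plus optional background) into one request."""
    lines = [
        "Write an email.",
        f"Audience: {audience}",
        f"Purpose: {purpose}",
        f"Tone: {tone}",
    ]
    if background:
        lines.append("Background:")
        lines.extend(f"- {item}" for item in background)
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_email_request(
            audience="a client we've worked with for three years, who has a new manager",
            purpose="introduce ourselves to the new manager",
            tone="formal but warm",
            background=["Summary of the last three email exchanges"],
        )
    )
```

The template forces you to answer the three questions before you hit send, which is most of the benefit.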


From “Teaching AI How to Do Things” to “Giving AI Enough Information”

Earlier, I mentioned the shift from Prompt Engineering to Context Engineering. Another way to look at it: we’re moving from “teaching AI how to do things” to “giving AI enough information to figure it out.”

Back when language models weren’t as capable, our prompts were mostly instructions—telling AI what steps to follow. AI was like a newbie who needed hand-holding.

Now, with 2025-level models, things are different. They’re smart enough to know how to do things. Our job is to provide enough relevant information so they can produce great output.

Anthropic observed something interesting internally: in just one year, the share of their work in which its engineers reported using Claude rose from 28% to 59%, and self-reported productivity gains increased significantly. What changed their work wasn’t just the model getting smarter; it was people learning how to feed it the right Context.


Conclusion

Understanding Context is the first step to working effectively with AI.

Once you realize that Context is all AI can see, you start asking different questions: How do I put the right information in? How do I make sure it sees what it needs to see? How do I avoid stuffing it with noise?

Next time AI seems to get dumber, try this mindset:

Think about what information to give before thinking about what instruction to give.

Instead of jumping straight to “what command should I type,” ask yourself: “If this were a new hire helping me, what background, data, and constraints would I tell them?” Write that down—and you’re doing Context Engineering.

In future posts, we’ll dive deeper into how to control Context effectively. This discipline is called Context Engineering.