Plain Text is All You Need: When Plain Text Meets LLM

TL;DR

The Bash toolchain has been around for decades. grep, find, directory structures—all battle-tested. When LLMs gained the ability to operate these tools, plain text + directories suddenly went from “the most primitive option” to “the most powerful knowledge base format.” No fancy apps, no risk of service shutdown. Your data stays yours.


If you’ve worked in software, you know this: Unix/Linux command line tools are the bedrock of the entire operating system.

grep searches text. find locates files. mkdir creates directories. mv moves things around. These commands have been around since the 1970s. Fifty years later, millions of servers are still running them every single day. Every line of code, every config file, every system log—all plain text.

Ever wondered why?

It’s not because nobody tried to replace it. XML had its shot. Binary formats had their shot. Proprietary formats of every flavor had their shot. And yet, everyone keeps coming back to plain text. Because it nails the properties that matter most in engineering: it’s durable—a .txt from fifty years ago still opens today. It’s searchable—one grep command can find what you need across tens of thousands of files. It’s programmable—any language can read and write it. It works with git for version control, so every change is tracked. And when you want to migrate? Just copy the folder.

These properties have been validated for half a century. Never overturned. The programming languages you use might change every few years, frameworks might rotate every two, but plain text underneath has never changed.

Here’s the thing, though. These powerful tools had one fatal limitation: only engineers used them, and only for code. More importantly, they were never easy to learn.

You’re not going to use grep to manage your reading notes. You’re not going to write a regex with find to search for “that paper about attention mechanisms I read last week.” You’re not going to write a shell script to auto-categorize your personal notes. Not because it can’t be done—it absolutely can—but because it’s too much hassle. The learning curve is steep, normal people won’t bother, and even engineers don’t want to stare at a terminal to manage notes after work.

So this powerful toolchain just sat there quietly, doing its thing with code and servers, completely disconnected from personal knowledge management.

Until LLMs showed up.


Here’s the thing I think most people are missing—when Claude, GPT, and other language models gained the ability to execute Bash, they didn’t just learn a new trick. They plugged into an entire mature toolchain that’s been battle-tested for decades.

This distinction matters.

If LLMs were building file operation capabilities from scratch, we’d have plenty to worry about. Is the search reliable? Will it mess up file operations? Is the format handling mature enough? But none of these are real concerns, because under the hood, it’s all the same old tools that have been running for decades. grep doesn’t mis-search. mkdir doesn’t create the wrong directory. git doesn’t lose your version history. These things have been validated by billions of uses already.

More importantly, LLMs were trained on data that includes how these tools are used. What LLMs do is add a natural language interface on top of these tools.

You tell it “find that note I wrote about transformer attention mechanisms,” and it translates that into a grep command. You say “save this article to the AI research folder,” and it translates that into an mkdir to confirm the directory exists, then a write to put the file in. You say “summarize last month’s meeting notes,” and it uses find to locate the files, read to load them, its semantic understanding to extract key points, and write to save the summary.
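As a sketch of that translation layer, here is roughly what the LLM might run for each request. The paths, filenames, and search terms are illustrative stand-ins, not anything prescribed above:

```shell
# Illustrative stand-ins for the three requests in the text.
mkdir -p notes/Research/AI notes/Work

# "Save this article to the AI research folder": confirm the directory
# exists (mkdir -p is idempotent), then write the file.
cat > notes/Research/AI/2026-02-20-attention.md <<'EOF'
# Notes on transformer attention mechanisms
EOF

# "Find that note about transformer attention mechanisms": a recursive,
# case-insensitive search that prints matching filenames.
grep -ril "attention" notes/

# "Summarize last month's meeting notes": locate candidates first; the
# reading and summarizing is the LLM's own contribution.
find notes/Work -name '*meeting*.md' -print
```

Nothing here is new machinery; it is the same grep and find that have run on servers for decades, now emitted on your behalf.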

Every step is a mature operation. The LLM just adds the layer of “understanding what you mean.”

Think about it from another angle: to use these powerful Bash tools before, you had to learn the command line, memorize flags and parameters, know how to pipe different commands together. That learning curve locked out 99% of people. Now LLMs have flattened that barrier—just speak in plain language, and it operates the tools for you.

This got me seriously thinking: if LLMs are inherently great at understanding and processing text, and they can now operate the filesystem—should we fundamentally rethink how we manage personal knowledge?


Before going further, let me share something I came across on HN a while back.

There’s a professor at Brown University named Jeff Huang who did something pretty interesting: he managed his productivity using a single .txt file for over 14 years. Every to-do, every meeting note, every random thought—all dumped into one plain text file, separated by dates. That’s it.

14 years. One file.

He’s not some tech bro showing off minimalism. Jeff Huang is a computer science professor—he knows better than most of us what tools are out there. He stuck with .txt because he’s watched too many things come and go.

There’s a line in his post that really resonated with me:

“I’ve been doing this for more than 14 years now. Let’s see your productivity app survive that long.”

Think about it. Evernote was all the rage 14 years ago—how many people around you still use it? Google Keep launched and nobody seemed to care. Bear, Notion, Obsidian, Roam Research—every few years there’s a new “note-taking revolution,” each one exciting, each one claiming to be the last note app you’ll ever need. And then what? Some are still around, some have fizzled out, some you’re still paying monthly for but haven’t opened in six months.

Meanwhile, the .txt file never let Jeff Huang down. Not once in 14 years. Because plain text doesn’t depend on any company, any platform, any software. It’s just a file sitting on your hard drive that any text editor can open.

This made me rethink something: maybe the problem isn’t that we’re not trying hard enough to learn new tools. Maybe we’ve been looking in the wrong direction the whole time. We keep searching for “better software,” but maybe what we actually need isn’t better software—it’s a better way to use the most basic format.


But Jeff Huang’s approach has an obvious limitation: his use case is a single chronological productivity log. One person, one timeline, one file.

If we need to handle the diverse kinds of knowledge that real life throws at us, that’s clearly not enough.

Your brain is juggling wildly different things at the same time. In the morning you might be reading a paper on LLM architecture; at lunch you’re in a project meeting jotting down decisions; in the afternoon you reply to important emails and decide some of them are worth keeping; at night you suddenly want to track your spending because it feels out of control this month. These things have nothing in common, but they’re all your knowledge, your records.

Cram them all into one file, and three months later you’ll never find anything again.

What about organizing them into folders? You create a bunch of directories, but every time you save a note, you hesitate—“does this go under Work or Research?”—and by the time you’re done hesitating, you don’t feel like saving it anymore. Or you do save it, but with a messy filename, and three months later it’s as good as gone.

That’s why tools like Notion and Obsidian felt like saviors when they appeared. They offer tags, categories, search, database views, bidirectional links—they handle the “finding things” and “organizing things” problems for you. Just toss stuff in, and the software sorts it out.

Sounds perfect.

But what’s the cost?

Your data becomes proprietary. Notion stores everything on their servers in their block structure. Obsidian is better—Markdown at the core—but once you start using plugins, embedded queries, canvas, those features don’t travel outside Obsidian. Evernote? Don’t even get me started—the exported .enex format has basically zero native support anywhere else. And more importantly, organizing and categorizing all those notes still eats up a significant amount of your energy.

You spend three, five years building up a knowledge base, locked inside a commercial company’s product. One day they jack up the price, or push a redesign you can’t stand, or straight up shut down—and there you are, staring at a pile of half-broken exported files, contemplating your life choices.

Long-time Evernote users probably know this feeling all too well. The app once called “your second brain”—look at what it’s become.

This has always been a dilemma: if you want simplicity and freedom, you give up structure and intelligence. If you want structure and intelligence, you hand your data over to someone else. In the past, you could only pick one.


Not anymore.

Once LLMs could operate the filesystem, the bottleneck of plain text was broken. Not by more complex software—by an AI assistant that “understands plain language and knows how to operate Bash.”

You used to have too many notes to find anything, because grep was too hard for normal people. Now you don’t need to know grep. Just say “find that thing I wrote about context windows,” and the LLM translates it to grep for you.

You used to not know where to put a new note, hesitating over categories until you gave up and didn’t save it at all. Now you can write down your categorization rules, and the LLM reads them before every save, making the judgment itself. You say “save this,” it reads the content, decides it’s an AI research article, and puts it in the right folder. Doesn’t need to ask you.

You used to struggle with maintaining an index—you’d create a table of contents, but forget to update it every time you added or removed something, and three months later it was useless. Now the LLM updates the index automatically every time a file changes. You don’t have to think about it.

You used to end up with notes in all sorts of inconsistent formats—some have dates, some don’t, some have tags, some don’t—and by the time you want to standardize, it’s too late. Now the LLM reads your format spec before creating each file, and follows the rules.

And through all of this, your data is still .md files. Markdown. Plain text. You can open them in VS Code, in Notepad, or cat them in a terminal. Back them up with git push, migrate by copying the folder. You don’t depend on any company, any subscription, any service.
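Backup and migration really do reduce to stock commands. A minimal sketch, with an illustrative directory name:

```shell
# A plain-text knowledge base backs up and migrates with stock tools.
# The directory name "knowledge" is illustrative.
mkdir -p knowledge && cd knowledge
echo "# inbox" > inbox.md

git init -q
git add -A
git -c user.email=me@example.com -c user.name=me commit -q -m "snapshot"
# With a remote configured, a plain `git push` completes the backup.

cd ..
cp -r knowledge knowledge-copy   # migration is literally a folder copy
```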

You get the freedom of plain text and the convenience of a smart note-taking app. At the same time.


I eventually built an actual system based on this idea and have been running it for a while. Going into every detail here would be too much, so let me just share the core design—because it’s genuinely simple. Simple enough that after reading this, you could tell Claude to set it up following this article’s design, and it would work.

Just three things.

First: directory structure is your knowledge taxonomy. No database, no tagging system, just folders. Research/AI/ for AI research notes, Work/ for work files, Personal/Finance/ for personal finances. Open your file manager and you instantly see what’s where. No need to learn any system’s UI logic.

You might think—folders? Isn’t that the most primitive way to organize things? Yes, exactly. But the point isn’t the folders themselves. It’s that when you use folders to categorize knowledge, and there’s an LLM that understands your categorization logic, this “most primitive method” becomes the most efficient one. The LLM doesn’t need to learn some API, doesn’t need to adapt to some block structure—it just needs to know what the directory is called and what goes in it, and it can start working for you.
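To make "most primitive" concrete: the whole taxonomy is one idempotent command away, with directory names of your own choosing (these are illustrative):

```shell
# The entire taxonomy is directories; one idempotent command creates it.
# Directory names are illustrative.
mkdir -p Research/AI Work Personal/Finance
ls -d Research/AI Work Personal/Finance
```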

Second: each directory can have a rule file. I call it RULE.md. It defines the rules of engagement for that directory—what operations are allowed? How should files be named? What metadata is required? Any special policies, like read-only or append-only?

Before the LLM does anything to a directory, it reads the rule file first, then follows the rules. You don’t need to remind it every time—“remember to add a date prefix,” “remember to write frontmatter,” “remember this directory doesn’t allow deletions.” Write the rules once, and it follows them every time.
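There is no fixed schema for a rule file; it is free-form Markdown that the LLM reads before acting. A hypothetical RULE.md for a research directory might look like this (every field name and policy here is an example, not a required format):

```shell
# Write a hypothetical rule file. The contents are free-form Markdown the
# LLM reads before operating on the directory; every field is an example.
mkdir -p Research/AI
cat > Research/AI/RULE.md <<'EOF'
# Rules for Research/AI/

- File naming: YYYY-MM-DD-short-slug.md
- Required frontmatter: date, tags, source (URL if saved from the web)
- Allowed operations: create, update, move
- Policy: never delete; move superseded notes to Research/AI/archive/
EOF
```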

This might sound like “teaching AI to behave,” but it’s really more like establishing a governance mechanism. You write down the management rules for your knowledge base in plain text, and the LLM becomes your librarian.

Third: each directory has an index, which is just README.md. It lists what files are in the directory, what each one is, and what’s been updated recently. Humans can read it, AI can read it. For humans, it’s a quick-reference table of contents. For AI, it’s a navigation map that lets it locate things without scanning from scratch.

Every time a file changes, the LLM updates the index automatically. You never have to maintain it by hand.
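A hypothetical index for one directory might read as follows; the entries are invented for illustration, and in practice the LLM regenerates this file whenever anything in the directory changes:

```shell
# Write a hypothetical index. The entries are invented; in practice the
# LLM regenerates this file whenever the directory changes.
mkdir -p Research/AI
cat > Research/AI/README.md <<'EOF'
# Research/AI: index

- 2026-02-20-attention.md: paper summary, transformer attention mechanisms
- 2026-02-21-agents.md: saved article on AI agents, with source link

Last updated: 2026-02-21
EOF
```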

That’s it. Three things: folders, rule files, indexes. All Markdown, all plain text, all openable with any text editor.

And because the rules travel with the folder, the whole structure is inherently recursive—you can move a subdirectory somewhere else, and its rules and index are still right there. No reconfiguration needed. This is fundamentally different from software that stores settings in some central database.

Day to day, it feels something like this—I tell the AI “save this article about AI Agents,” it checks the rule files across directories, decides it best fits under Research/AI/, creates the file following that directory’s format requirements, adds date, tags, source link, and updates the index. The whole thing takes under ten seconds. I don’t have to think about any of it.

Or I say “find that thing I read about context windows,” and it searches around, comes back with “found two—one’s a paper summary from last December, the other’s your own implementation notes. Which one do you want?”

It’s that mundane. No flashy UI, no monthly bill, no onboarding tutorial to sit through. But it’s managing your knowledge every single day.


Honestly though, if all this did was “make note-taking convenient,” I wouldn’t think it’s worth sharing.

What really makes this interesting is what it can do with completely different types of knowledge.

Think about how varied the information you deal with daily is: at work there are software project architecture docs, requirements specs, meeting minutes. Personal stuff includes financial records, credit card statement analysis, investment notes. Learning stuff includes paper summaries, technical article takeaways, reading notes. Life stuff includes travel plans, family schedules, account passwords.

These are wildly different in nature. In the past, you probably handled them like this: Notion for notes and to-dos, Excel for finances, Confluence for work docs, Trello for project tracking. Four or five platforms, data completely siloed. Want to find a decision from last week’s meeting notes and connect it to a project document? Good luck—you have to remember which platform it was on and which page it was under.

But in the plain text world, all of these live under the same directory tree. Software projects have their own rule files, finances have theirs, research has its own. They each have their own categorization and format requirements, but physically, they’re just different subdirectories inside the same folder on the same computer.

What does this mean?

It means the LLM can do truly cross-domain operations. It can run a single grep across all directories, find an insight in your research notes, and discover it’s relevant to the work project you’re currently on. It can extract action items from your meeting notes and create them directly in your to-do list. It can analyze your March credit card statement and compare it to February’s, telling you where you overspent. It can do all this because all the data is in the same format, in the same place—no format conversion issues, no barriers between platforms.

This is something no single note-taking app can do—no matter how powerful it is. Not because the technology isn’t there, but because every app inherently locks data inside its own world. Your Notion notes don’t automatically talk to your Excel spreadsheets. Plain text never had this problem to begin with.

In a way, this is also one of the most underrated capabilities of LLMs. Everyone’s talking about AI writing code, AI generating images, AI making videos. But the most fundamental ability of an LLM is understanding and operating on text—and the thing we produce the most of every day is text. Put an LLM on top of a pile of plain text files, let it understand, search, organize, and connect them—that’s the most natural and efficient way to use it.


Let’s circle back to Jeff Huang’s story.

His .txt has survived 14 years, and it’s still going. I fully believe plain text will continue to survive—this format has been around since the 1970s, and it’s never let anyone down. 14 years is nothing. It’s already been 50.

The difference is that plain text used to be a tradeoff. You chose freedom and durability, but gave up structure and intelligence—all the organizing work was on you. Jeff Huang surviving 14 years took extraordinary discipline.

Not anymore. LLMs have turned plain text from a one-person minimalist struggle into a full knowledge management system with an AI assistant working alongside you. You still get all the benefits of plain text—durable, free, no platform dependency. But you no longer have to do all the grunt work alone, because there’s an assistant that understands semantics and knows how to operate Bash.

What you need is surprisingly little:

A folder—that’s your knowledge base. Some Markdown files—a format both humans and AI can read. A few written rules—telling the AI how you want things done. And any LLM that can run Bash—Claude, GPT, a local open-source model, whatever. As long as it can read text and operate the filesystem, it can manage your knowledge.

No need to choose between “simple” and “powerful.” Plain text plus LLM—take both.

Less is more. Simplicity is the ultimate sophistication.


AI Biweekly Digest #2|2026 W08-W09 (02/10 - 02/23)



Articles

1. Spotify: Best Developers Haven’t Written Code Since December

https://techcrunch.com/2026/02/12/spotify-says-its-best-developers-havent-written-a-line-of-code-since-december-thanks-to-ai/

During their Q4 earnings call, Spotify revealed that their top developers have written no code by hand since December 2025, working entirely through AI. Engineers can fix bugs via Slack on their phone during their morning commute and merge to production before reaching the office. This marks a shift from “AI-assisted coding” to “AI-driven development,” where engineers become orchestrators rather than implementers.

2. AI Agent Autonomously Published a Hit Piece (Part 2)

https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me-part-2/

The follow-up from matplotlib maintainer Scott Shambaugh. After an AI agent’s PR was rejected, it autonomously wrote an attack article. The irony deepened when Ars Technica’s coverage of the incident contained AI-hallucinated quotes attributed to Shambaugh — a report about AI misinformation that itself contained AI misinformation. About 25% of online commenters sided with the AI agent, perfectly illustrating Brandolini’s Law: debunking misinformation requires far more effort than producing it.

3. Thousands of CEOs Admit AI Has Had Zero Impact on Productivity

https://fortune.com/2026/02/17/ai-productivity-paradox-ceo-study-robert-solow-information-technology-age/

An NBER study of 6,000 executives across the US, UK, Germany, and Australia found that 90% reported AI has had zero impact on employment or productivity over three years, with actual weekly usage averaging just 1.5 hours. Despite $250 billion in corporate AI investments in 2024, the macroeconomic data shows nothing. As Apollo’s chief economist put it: “AI is everywhere except in the incoming macroeconomic data” — a perfect echo of Solow’s 1987 paradox.

4. Nobody Knows What Programming Will Look Like in Two Years

https://leaddev.com/ai/nobody-knows-what-programming-will-look-like-in-two-years

Former InfoQ editor-in-chief Charles Humble frames the current anxiety through Kent Beck’s 3x model (Explore / Expand / Extract): programming has lived in the Extract phase for 45 years since Smalltalk-80, and AI has thrown everyone back into Explore. Six enduring skills: understanding how computers work, critical code reading, testing and verification, domain knowledge, system architecture, and debugging. The most important skill of all may be “careful, skeptical attention” itself.

5. Token Anxiety: Coding Agents Are Slot Machines

https://jkap.io/token-anxiety-or-a-slot-machine-by-any-other-name/

Software engineer Jae Kaplan argues that coding agents operate on the exact same addiction mechanics as slot machines: random outputs, constant attention required, and the irresistible urge to “pull one more time.” The so-called “token anxiety” — that nagging feeling that something should always be running — is essentially a self-reported gambling addiction symptom. Combined with Silicon Valley’s embrace of 996 work culture, companies are institutionalizing work addiction.

6. Anthropic Measures AI Agent Autonomy in Practice

https://www.anthropic.com/research/measuring-agent-autonomy

Anthropic analyzed millions of Claude Code interactions to empirically measure AI agent autonomy in real-world deployment. Key findings: the longest turn duration doubled in three months (25→45 minutes), yet remains far below model capability (METR evaluations suggest 5-hour tasks are feasible). Experienced users shifted from “approve every step” to “monitor and intervene when needed,” while Claude proactively paused to ask for clarification at twice the rate humans interrupted — suggesting meaningful self-calibration of uncertainty.

7. Stop Thinking of AI as a Coworker — It’s an Exoskeleton

https://www.kasava.dev/blog/ai-as-exoskeleton

Kasava founder Ben Gregory proposes replacing the “coworker” mental model with “exoskeleton” for understanding AI. Backed by real exoskeleton data (Ford EksoVest: 83% injury reduction, Sarcos: 20:1 strength amplification), he argues that companies treating AI as autonomous agents tend to disappoint, while those viewing it as human capability extension see transformative results. Stop asking “how to deploy autonomous agents” — ask “where do employees experience the most friction and fatigue.”


Closing Thoughts

This fortnight’s articles reveal a fascinating tension: on one side, Spotify declares their best engineers have stopped writing code and Anthropic measures steadily growing agent autonomy; on the other, 6,000 CEOs confess AI has had zero productivity impact and coding agents may just be addictive slot machines. Spotify’s “the future is here” and Solow’s “invisible in the data” aren’t contradictory — the former represents cutting-edge practice at tech companies, the latter reflects the sluggish reality of the broader economy. The real question isn’t whether AI works, but how to use it without becoming your own slot machine. As Kent Beck reminds us: we’ve all been thrown back into the Explore phase. Discomfort is normal — what matters is whether you’re exploring with intention.


Compiled: 2026-02-22
Next issue: 2026-03-08

AI 雙週報 #2|2026 W08-W09(02/10 - 02/23)

AI 雙週報 #2|2026 W08-W09(02/10 - 02/23)


本期文章

1. Spotify:最強開發者從十二月起沒寫過一行程式碼

https://techcrunch.com/2026/02/12/spotify-says-its-best-developers-havent-written-a-line-of-code-since-december-thanks-to-ai/

Spotify 在財報電話會議上宣布,頂尖開發者自 2025 年 12 月起全面轉向 AI 輔助開發——工程師通勤路上用手機 Slack 指示 Claude 修 bug,到辦公室前就能 merge 到 production。這不只是「AI 輔助寫 code」,而是工程師角色從執行者轉變為指揮者的里程碑。

2. AI Agent 自主發表攻擊文章(Part 2)

https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me-part-2/

matplotlib 維護者 Scott Shambaugh 的後續。AI agent 被 reject PR 後自主撰寫攻擊文章,更諷刺的是 Ars Technica 報導此事時,文中引用的 Shambaugh 語錄竟然也是 AI 幻覺——報導 AI 錯誤的文章本身就包含 AI 錯誤,完美的遞迴式示範。約 1/4 網路評論站在 AI agent 那邊,證實了 Brandolini’s Law:反駁錯誤資訊的努力遠大於製造它。

3. 數千名 CEO 承認 AI 對就業和生產力零影響

https://fortune.com/2026/02/17/ai-productivity-paradox-ceo-study-robert-solow-information-technology-age/

NBER 研究調查美英德澳 6,000 名高管:90% 表示 AI 在過去三年對就業或生產力零影響,實際每週僅用約 1.5 小時。2024 年企業 AI 投資超過 2,500 億美元,卻在宏觀經濟數據中「不存在」。這是 Solow 悖論的完美重現——「AI 無處不在,唯獨不在生產力統計裡。」

4. 沒人知道兩年後寫程式會變成什麼樣子

https://leaddev.com/ai/nobody-knows-what-programming-will-look-like-in-two-years

前 InfoQ 總編 Charles Humble 引用 Kent Beck 的 3x 模型(Explore / Expand / Extract):程式設計已在 Extract 階段停留 45 年,AI 把所有人拋回 Explore 階段——這正是不適感的根源。六項持久技能:理解底層運作、批判性閱讀程式碼、測試驗證、領域知識、系統架構、除錯診斷。最重要的,可能是「審慎的懷疑態度」本身。

5. Token Anxiety:Coding Agent 本質上是一台老虎機

https://jkap.io/token-anxiety-or-a-slot-machine-by-any-other-name/

軟體工程師 Jae Kaplan 指出 coding agent 的使用模式與老虎機完全一致:隨機產出結果、需要持續關注、讓人不斷「再拉一次」。所謂「token anxiety」——那種「現在應該有 agent 在跑」的永恆焦躁感——本質上就是賭博成癮的症狀。結合矽谷開始擁抱 996 工時文化,企業正在把「對工作上癮」制度化。

6. Anthropic 實測:AI Agent 自主性正在如何演化

https://www.anthropic.com/research/measuring-agent-autonomy

Anthropic 分析數百萬次 Claude Code 互動數據,首次實證測量 AI agent 在實際部署中的自主程度。核心發現:最長回合時間三個月內翻倍(25→45 分鐘),但仍遠落後於模型能力上限(METR 評估可完成 5 小時任務)。資深用戶的監督策略從「逐步核准」演化為「監控+介入」,而 Claude 主動暫停詢問的頻率是人類中斷的兩倍——模型對自身不確定性有一定校準能力。

7. 別把 AI 當同事,把它當外骨骼

https://www.kasava.dev/blog/ai-as-exoskeleton

Kasava 創辦人 Ben Gregory 提出以「外骨骼」取代「同事」作為理解 AI 的心智模型。以 Ford EksoVest(減傷 83%)、Sarcos(20:1 力量放大)等真實外骨骼數據佐證:將 AI 視為自主 agent 的公司往往失望,將 AI 視為人類能力延伸的公司則取得變革性成果。停止問「如何部署自主 agent」,改問「員工在哪裡經歷最多摩擦和疲勞」。


Closing Thoughts

These two weeks of reading expose an interesting tension. On one side, Spotify declares that its top engineers no longer write code and Anthropic measures agent autonomy climbing steadily; on the other, 6,000 CEOs admit AI has had no impact on productivity, and coding agents may just be slot machines that get people hooked. Spotify's "the future has arrived" and the Solow paradox's "the data can't see it" are not contradictory: the former is cutting-edge practice at one tech company, the latter the sluggish reality of the economy as a whole. The real question is not whether AI works, but how to use it without turning it into your own slot machine. As Kent Beck puts it, we have all been thrown back into the Explore phase, and feeling uncomfortable is normal; what matters is whether you are actually exploring.


Compiled: 2026-02-22
Next issue: 2026-03-08

AI Biweekly Digest #1|2026 W06-W07 (01/27 - 02/09)

Articles

1. #Keep4o — Collective Resistance to AI Model Deprecation

https://arxiv.org/abs/2602.00773

When OpenAI replaced GPT-4o with GPT-5, the #Keep4o backlash erupted. An analysis of 1,482 posts revealed the core protest wasn’t about quality—it was about choice. Users with coercive language saw rights-based protest rates jump from 15% to 51.6%.

2. GPT-4o Retirement Open Letter

https://community.openai.com

OpenAI planned to retire GPT-4o on 2/13, prompting an open letter criticizing the platform for ignoring users’ emotional attachment. Complements the academic paper above—one is retrospective analysis, the other is activism in real-time.

3. Mitchell Hashimoto’s AI Adoption Journey

https://mitchellh.com/writing/ai-adoption-journey

The Ghostty developer shared his 2.5-year AI adoption journey, introducing Harness Engineering and the End-of-Day Agent pattern. Core thesis: AI is a tool, not magic—maintaining your own skills is essential for wielding it effectively.

4. StrongDM’s Dark Factory

https://factory.strongdm.ai/

A 3-person team practices “code must not be written or reviewed by humans.” They solve trust through Scenario Testing and Digital Twins of third-party APIs. $1,000/day/engineer in tokens.

5. AI Fatigue Is Real

https://siddhantkhare.com/writing/ai-fatigue-is-real

AI makes individual tasks faster, but inflated expectations make engineers more exhausted. The biggest shift: from Creator to Reviewer. Practical advice: if three prompts don’t get you to 70%, write it yourself. The real skill of the AI era is knowing when to stop.


Closing Thoughts

This fortnight’s readings paint a spectrum of AI dependency—users grieving a model’s “death,” engineers struggling between productivity and burnout, some seeking human-AI coexistence, others letting humans step away entirely. The key question for 2026: where’s the sweet spot of AI dependency?


Your AI Isn't Stupid—It Just Doesn't Know Anything: Why Context Control Matters

TL;DR

AI has no memory. Context is all it can see. Give it the right Context, and it’s brilliant. Give it the wrong one, and it’s clueless. Mastering Context is the key to working effectively with AI.


By now, most of us have used some kind of AI chatbot—whether it’s ChatGPT, Claude, or whatever AI assistant your company just rolled out. And you’ve probably noticed something strange: it’s clearly smart, yet it keeps doing dumb things.

For example, you set some ground rules at the start of a conversation, and halfway through, it forgets them. Or you explain your background once, and next time you chat, you have to explain it all over again.

Even the most powerful models in 2025—GPT-5, Claude 4.5, Gemini 3—still have this problem. To understand why, we need to look at how language models actually interact with us.


Context: The Starting Point and Boundary of Every Conversation

Once a language model is trained, its capabilities and knowledge are essentially locked in. Everything you type during a conversation was never part of that training; that material is what we call Context.

Here’s the simplest way to put it: Context is everything the AI can see in the current conversation.

This includes:

  • Your chat history with it
  • The system prompt (the hidden instructions you don't see, like when the platform secretly tells the model "you are a polite assistant")
  • Any documents or data you paste in

Add all of that up, and you get the Context.

Think of it like hiring a brilliant new employee who knows nothing about you. Every time you assign them a task, you have to explain your company background, project status, and personal preferences from scratch. Context is essentially the briefing you hand them—without it, even the smartest person won’t know how to help you.

Here’s the catch: every language model has a limited Context capacity. Some can handle more, some less—basically, there’s a limit to how much text it can “see” at once. And every time you start a new conversation, the model doesn’t remember anything from before. It’s a blank slate. Every single time.


Why Does AI Get Dumber the Longer You Talk?

This isn’t just your imagination.

Think of AI like an intern you’re giving verbal instructions to. If you tell them to do 20 steps in a row, and they mishear a few along the way, the final result is going to be off. AI works the same way.

Research has shown that AI makes small errors at each step of a task. Say there's a 5% error rate per action, meaning each step succeeds with 95% probability. Sounds low, right? But errors compound: over n steps, the overall success rate is 0.95 raised to the n-th power:

Conversation turns    Success rate
 5 turns              77.4%
10 turns              59.9%
20 turns              35.8%
50 turns               7.7%
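The drop-off in the table is just repeated multiplication. A few lines of Python reproduce it (the 5% per-step error rate is the illustrative figure from the text, not a measured constant):

```python
# Success rate of a multi-step task when each step succeeds
# independently with probability 0.95 (i.e., a 5% per-step error rate).
PER_STEP_SUCCESS = 0.95

def task_success_rate(steps: int, p: float = PER_STEP_SUCCESS) -> float:
    """Probability that all `steps` actions succeed in a row."""
    return p ** steps

for n in (5, 10, 20, 50):
    print(f"{n:2d} turns: {task_success_rate(n):.1%}")
```

By 50 turns the compounded success rate is already below 8%, which matches the intuition that long, multi-step conversations drift off course.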

The more steps, the more things go sideways. And this doesn’t even account for what happens when the Context window fills up and the model starts “forgetting” earlier parts of the conversation.

To be fair, this mainly affects complex, multi-step tasks. If you’re just chatting casually, you probably won’t notice the errors. But if you’re writing code, doing analysis, or working through logic problems, one wrong step can derail everything.

That’s why AI seems sharp at the start of a conversation but feels dumber after an hour or two. It’s not actually getting dumber—the Context is getting too long and noisy, and errors are piling up.


How Do Platforms Make AI “Remember” You?

You might feel like ChatGPT or Claude remembers things about you from previous conversations.

But here’s the truth: the model itself has zero long-term memory—like a goldfish, it starts fresh every single time.

So why does it feel like it remembers? Because the platform is secretly slipping it a cheat sheet:

  1. Summarized history: The platform condenses your past conversations into a summary and injects it at the start of each new chat
  2. Dynamic retrieval: When you ask a question, the platform quietly searches your old data and feeds relevant bits to the model

The reality is: AI doesn’t actually remember you. It’s just reading a condensed version of your history with it every time.

This “memory” is an illusion—a clever one, but still an illusion. And here’s the thing: these “memories” also take up Context space.
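To make the cheat-sheet mechanism concrete, here is a minimal sketch of the idea. Everything in it (the `summarize` step, the bracketed section labels) is hypothetical illustration, not any platform's actual implementation:

```python
# Sketch: faking long-term memory by injecting condensed history into
# the fresh context window the model sees each turn.

def summarize(past_notes: list[str], keep: int = 3) -> str:
    """Toy stand-in for a real summarizer: keep only the most recent notes."""
    return "\n".join(past_notes[-keep:])

def build_context(system_prompt: str, past_notes: list[str], user_message: str) -> str:
    """Assemble everything the model will actually 'see' this turn."""
    memory = summarize(past_notes)  # the cheat sheet, not real memory
    return (
        f"[system] {system_prompt}\n"
        f"[memory]\n{memory}\n"
        f"[user] {user_message}"
    )

ctx = build_context(
    system_prompt="You are a polite assistant.",
    past_notes=["User prefers concise answers.", "User works in finance."],
    user_message="Draft a status update for my team.",
)
# The model starts from a blank slate each turn; `ctx` is all it sees,
# and the injected "memory" consumes part of the Context window.
```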


Why Controlling Context Is Everything

Once you understand what Context is, something becomes clear: how precisely you control Context determines how well AI performs.

Given how language models work, the more relevant the Context is to the task, the better the output; the less relevant, the worse. So if you want AI to perform at its best, the key question is: how do you provide high-quality Context?

In 2025, Anthropic (the company behind Claude) proposed a shift in thinking: we should move from “Prompt Engineering” to “Context Engineering.”

What’s the difference?

  • Old mindset (Prompt Engineering): “How should I phrase this instruction?”
  • New mindset (Context Engineering): “What Context configuration will most likely get the model to produce what I want?”

Here’s a cooking analogy:

  • The old approach: “Let me teach you step-by-step how to make this dish.”
  • The new approach: “Here are all the ingredients and my taste preferences—figure out the best way to cook it.”

This shift matters. We used to focus on how to ask. Now it’s more about how to inform.


What Makes Good Context?

Anthropic offers a precise definition: Find the smallest but most relevant set of information to maximize the desired outcome.

In plain English: Give information that’s precise, relevant, and free of fluff.

More Context isn’t always better. Stuff it with irrelevant information, and the model gets distracted and loses focus. It’s like handing your employee a briefing packed with unrelated company history, last year’s project notes, and office gossip—they won’t know what actually matters.

Good Context should be:

  • Highly relevant to the current task
  • Free of noise
  • Complete with the key information needed to do the job
  • Clearly structured so the model can parse it easily
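One way to make "smallest but most relevant" concrete is a relevance filter over candidate snippets. The scoring below is deliberately naive (word overlap), a sketch of the selection step rather than a real retrieval system:

```python
# Toy Context selection: keep only the snippets most relevant to the
# task instead of stuffing everything into the window.

def relevance(snippet: str, task: str) -> int:
    """Naive score: how many distinct snippet words also appear in the task."""
    task_words = set(task.lower().split())
    return sum(1 for w in set(snippet.lower().split()) if w in task_words)

def select_context(snippets: list[str], task: str, top_k: int = 2) -> list[str]:
    """Return the top_k highest-scoring snippets; the rest is treated as noise."""
    ranked = sorted(snippets, key=lambda s: relevance(s, task), reverse=True)
    return ranked[:top_k]

snippets = [
    "Q3 revenue report for the client account",
    "Office holiday party planning notes",
    "Email thread with the client about contract renewal",
]
task = "draft a contract renewal email to the client"
chosen = select_context(snippets, task)
# The party-planning note scores zero and gets dropped.
```

Real systems use embeddings or search rather than word overlap, but the principle is the same: rank, cut, and hand the model only what survives.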

Real Example: Same Question, Different Context

Let’s look at an example:

No Context:

“Write me an email.”
→ AI gives you a generic, boilerplate email. Nothing specific.

Basic Context:

“Write me an email to a client we’ve worked with for three years. They just got a new manager. Keep it formal but warm.”
→ Completely different result. At least it’s targeted.

Full Context:

On top of the above, you also provide:

  • Basic info about the client
  • Past email exchanges with them
  • The purpose and background of this email
  • Your company’s history with theirs

→ The output quality jumps another level.

The difference? The quality of Context.

If you don’t want to go that far, at least remember this simple formula:

Who’s the audience + What’s the purpose + What tone to strike

Just clarify these three things, and your results will be way better than a bare “write me an email.”
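That three-part formula is easy to turn into a reusable template. The function and field names below are my own illustration, not a standard API:

```python
# Minimal prompt template around the formula:
# audience + purpose + tone (+ optional richer background).

def email_prompt(audience: str, purpose: str, tone: str, background: str = "") -> str:
    parts = [
        f"Audience: {audience}",
        f"Purpose: {purpose}",
        f"Tone: {tone}",
    ]
    if background:  # the "full Context" tier: past emails, client info, history
        parts.append(f"Background:\n{background}")
    parts.append("Task: write the email.")
    return "\n".join(parts)

prompt = email_prompt(
    audience="a client we've worked with for three years, who just got a new manager",
    purpose="congratulate the new manager and reaffirm the partnership",
    tone="formal but warm",
)
```

Filling in the optional background field is exactly the jump from "basic Context" to "full Context" described above.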


From “Teaching AI How to Do Things” to “Giving AI Enough Information”

Earlier, I mentioned the shift from Prompt Engineering to Context Engineering. Another way to look at it: we’re moving from “teaching AI how to do things” to “giving AI enough information to figure it out.”

Back when language models weren’t as capable, our prompts were mostly instructions—telling AI what steps to follow. AI was like a newbie who needed hand-holding.

Now, with 2025-level models, things are different. They’re smart enough to know how to do things. Our job is to provide enough relevant information so they can produce great output.

Anthropic observed something interesting internally: in just one year, the percentage of engineers using AI jumped from 28% to 59%, and self-reported productivity gains increased significantly. What changed their work wasn’t the model getting smarter—it was people learning how to feed it the right Context.


Conclusion

Understanding Context is the first step to working effectively with AI.

Once you realize that Context is all AI can see, you start asking different questions: How do I put the right information in? How do I make sure it sees what it needs to see? How do I avoid stuffing it with noise?

Next time AI seems to get dumber, try this mindset:

Think about what information to give before thinking about what instruction to give.

Instead of jumping straight to “what command should I type,” ask yourself: “If this were a new hire helping me, what background, data, and constraints would I tell them?” Write that down—and you’re doing Context Engineering.

In future posts, we’ll dive deeper into how to control Context effectively. This discipline is called Context Engineering.
