The LLM Training Corpus is a Small Subset of All Knowledge
LLMs are trained on what's been written down publicly. Most knowledge hasn't been. The way to access the rest is by getting your hands dirty solving real problems in the world.
Want to get notified about new posts? Join the mailing list and follow on X/Twitter.
Most people don’t understand this, but the LLM training corpus is a small subset of all knowledge.
It’s just the stuff that’s been written down publicly. Most knowledge hasn’t been.
LLMs basically just read everything online and then cargo-cult it (which, don’t get me wrong, can be incredibly useful).
There is so much knowledge you can scrape from the world that doesn’t exist anywhere online or in any book.
The way you scrape that knowledge is by getting your hands dirty solving messy problems in the real world.