The LLM Training Corpus is a Small Subset of All Knowledge
Want to get notified about new posts? Join the mailing list and follow on X/Twitter.
Most people don’t understand this, but the LLM training corpus is a small subset of all knowledge.
It’s just the stuff that’s been written down publicly. Most knowledge hasn’t been.
LLMs basically just read everything online and then cargo-cult it (which, don’t get me wrong, can be incredibly useful).
There is so much knowledge you can scrape from the world that doesn’t exist anywhere online or in any book.
The way you scrape that knowledge is by getting your hands dirty solving messy problems in the real world.