Addressing any remaining LLM skepticism
Another essay extolling the virtues of AI... and addressing the practical concerns.
“The future is here — it’s just not evenly distributed.” — William Gibson
This essay is addressed to anyone still skeptical of what modern AI will be capable of. If you’ve already drunk the Kool-Aid and agree with the tweet below, you may not need to read on.
Over the past 4 months, I’ve been experimenting with programming tools and services that use LLMs at their core, for both personal and business uses.
Last weekend I went to a huge GPT-4/LLM hackathon in the Bay Area. The energy and talent there were amazing. Researchers and founders from companies like OpenAI, Anthropic, and HuggingFace were there, and they didn’t mince words about how fast progress was being made. Sure, I get it, these people aren’t just drinking the Kool-Aid; they’re the ones selling it. That doesn’t take away from the fact that the advancements being made are truly groundbreaking.
At the hackathon, I saw numerous projects that showcased the potential of LLMs. There were tools that allowed you to generate entire apps from short descriptions, launch scalable server infrastructure from a text prompt, create virtual assistants, build financial models, and even communicate with robot dogs. In only a few hours, I wrote the code to create a daily, personalized podcast that summarizes what’s most important to you (and it won third place!).
The possibilities are seemingly endless, and we’re only just scratching the surface.
Objections?
Some say model improvement will plateau. I say it doesn’t matter. The existing models, with no improvement to the underlying architecture, are enough for enormous, paradigm-shifting change. Even if all the researchers at OpenAI, Google, Anthropic, and everywhere else hang up their hats and retire, today’s model versions are enough, so long as they’re maintained.
But but but… what about hallucinations!?
It doesn’t matter. All the issues with current LLMs can be overcome to something like 98% accuracy with minor supplementation. Microsoft gets this. Anything you think is an issue with LLMs isn’t.1
We’ve gotten too used to computers as calculators or spreadsheets. The truth is, they don’t need to be 100% accurate. Nearly all of the work they’ll do isn’t life or death — it just needs to be better and cheaper than an average overworked human.
And it can get very specialized. Instead of one person doing a job with 3-4 types of tasks, you can have 5 LLM-based functions: 4 of them fine-tuned to a very specific job, and a 5th that does nothing but verify the accuracy of the other 4.
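To make that concrete, here’s a minimal sketch of the pattern. Everything in it is hypothetical: the task prompts are invented, and call_llm is a stub for whatever completion API you use.

```python
# Sketch of the "4 specialists + 1 verifier" pattern. The prompts are
# invented examples, and call_llm is a stub for your completion API.

def call_llm(system_prompt: str, user_input: str) -> str:
    """Send one prompt to the LLM of your choice and return its reply."""
    raise NotImplementedError  # wire up to OpenAI, Anthropic, etc.

SPECIALISTS = {
    "categorize":  "Classify this support ticket as billing, hardware, or software.",
    "summarize":   "Summarize this support ticket in two sentences.",
    "extract":     "Extract the order ID, device serial, and customer name as JSON.",
    "draft_reply": "Draft a polite reply to the customer in this ticket.",
}

VERIFIER = ("You are a reviewer. Given a task and an answer, reply PASS if the "
            "answer is accurate and complete, otherwise FAIL plus a reason.")

def run_task(task: str, ticket: str) -> str:
    answer = call_llm(SPECIALISTS[task], ticket)
    verdict = call_llm(VERIFIER, f"Task: {SPECIALISTS[task]}\nAnswer: {answer}")
    if verdict.startswith("FAIL"):
        # One retry, folding the reviewer's objection back into the prompt.
        answer = call_llm(f"{SPECIALISTS[task]}\nA reviewer objected: {verdict}", ticket)
    return answer
```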
ChatGPT is impressive, but it’s the TOY version. It’s completely generalized, fine-tuned only to chat and to not say things that will give OpenAI bad press.2
What about all the compute needed!?
This might be a bigger problem, but still… doesn’t matter in the long run.
Specialized models will be compressed and optimized for specific use cases, some on custom chipsets designed to do nothing but LLM inference.
Apple will put optimized LLMs on iPhones and Macs that can run 80% of the most common tasks, trained to know when to outsource the remaining 20% to a cloud API.
Some general observations
“Out of the box” LLMs can be amazing, but they’re very limited in business applications. The moment you start hooking them up to things like long-term memory and tools, they become scary useful.
LLMs aren’t just for communicating, they’re for reasoning. If you present information in the right way, these models can absolutely reason and problem solve. Some things that help:
Giving enough context on the problem — either through directly providing all relevant details, or via context injection from searching some corpus of knowledge.
Zero- or one-shot solutions are unlikely. The best uses I’ve seen are chains of multiple prompts, supplemented with other functions and data. You can even use multiple different models (each can be better or worse at certain things) at each step along the way.
See this footnote for a case study on how small changes to how an LLM is queried can have large effects.3
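To illustrate, here’s a minimal two-step chain sketched against the OpenAI chat API (the pre-1.0 SDK, matching the models of the time). The models are real; the bug report and knowledge-base excerpt are invented placeholders.

```python
import openai  # the pre-1.0 SDK, matching the models of the time

openai.api_key = "sk-..."  # your key here

def ask(model: str, prompt: str) -> str:
    resp = openai.ChatCompletion.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

# Hypothetical inputs; in practice these come from your ticketing system
# and a search over your knowledge base.
raw_report = "kiosk 12 froze twice during checkout yesterday"
kb_excerpts = "Checkout freezes are usually the scanner driver; see KB-114."

# Step 1: a cheap, fast model turns a messy report into a precise spec.
spec = ask("gpt-3.5-turbo",
           f"Rewrite this bug report as a precise problem statement:\n{raw_report}")

# Step 2: a stronger model reasons over the spec plus injected context.
fix = ask("gpt-4",
          f"Context:\n{kb_excerpts}\n\nProblem:\n{spec}\n\nPropose a fix, step by step.")
print(fix)
```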
Searching the web, as Bing Chat does, is just one trivial example of the tools an LLM can use. The ChatGPT plugins released last week, although not yet available to the public, finally show how powerful this can be. Giving the model tools it can use is no different from giving a human tools. A few examples I’ve observed:
“Ask a human” — Give the model the ability to ask a human for something it needs to find the answer. In our support bot, for example, if a piece of information is missing, it can ask a human on the Slack channel. The answer is then fed back into the prompt chain.
WolframAlpha — With the world’s most advanced calculator, no more complaining about GPTs getting math wrong.
Use company APIs — Connect the model to any internal APIs so that it has up-to-date status on your system.
Zapier automations — Use anything you can connect to in Zapier. Like “Summarize my last email and send it as a text to Sarah.”
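To make the idea concrete, here’s a toy version of a tool loop. This isn’t any particular product’s protocol, just one way to let a model request a tool and see the result; both tool implementations are stubs.

```python
# A toy tool-use loop: the model either answers or emits a line like
# "TOOL: <name> | <input>"; we run the tool and feed the result back.

TOOLS = {
    "calculator": lambda expr: str(eval(expr)),        # stand-in for WolframAlpha; never eval untrusted input in production
    "ask_human":  lambda q: input(f"[Slack] {q}\n> "), # stand-in for a Slack bot
}

def run_with_tools(ask_llm, question: str, max_steps: int = 5) -> str:
    prompt = ("Answer the question. If you need a tool, reply exactly as "
              f"'TOOL: <{'|'.join(TOOLS)}> | <input>'.\n\nQuestion: {question}")
    for _ in range(max_steps):
        reply = ask_llm(prompt)
        if not reply.startswith("TOOL:"):
            return reply  # the model answered directly
        name, tool_input = reply[len("TOOL:"):].split("|", 1)
        result = TOOLS[name.strip()](tool_input.strip())
        prompt += f"\n{reply}\nRESULT: {result}\nContinue."
    return reply
```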
An “inevitable turn in human history”
From James Pethokoukis’ summary of the Goldman Sachs report on generative AI:
Here are the main takeaways from GS economists Joseph Briggs and Devesh Kodnani, which I shall subsequently dive into:
Generative AI could raise annual US labor productivity growth by just under 1½ percentage points over a 10-year period following widespread business adoption.
Generative AI could eventually increase annual global GDP by 7 percent, equal to an almost $7 trillion increase in annual global GDP over a 10-year period.
Generative AI will be disruptive to jobs: “We find that roughly two-thirds of current jobs are exposed to some degree of AI automation, and that generative AI could substitute up to one-fourth of current work.”
AI investment could approach 1 percent of US GDP by 2030 if it increases at the pace of software investment in the 1990s. (That said, US and global private investment in AI totaled $53 billion and $94 billion in 2021, a fivefold increase in real terms from five years prior.)
See, even the vampire squid is getting in on the hype.
Predicting 10 years out is hard, especially for something moving this fast. But in the spirit of the Bill Gates quote — “We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten” — I’d bet on Goldman’s predictions being too conservative. I think generative AI will boost productivity and GDP per capita by a lot more.
Now about the jobs: two-thirds of jobs being exposed to AI automation sounds about right. Basically any job that currently involves a computer or talking over the phone. My gardeners are safe. Substituting a quarter of all workers is another question.
I believe that in the long run, there will be more than enough jobs to go around. The nature of many of those jobs will be very different. And the transition to the new world may be painful for many.
The truth is, it’s possible to believe this will be a paradigm-shifting epoch, and still have no idea how it will play out — both good and bad.
As Tyler Cowen put it:

But since we are not used to living in moving history, and indeed most of us are psychologically unable to truly imagine living in moving history, all these new AI developments pose a great conundrum. We don’t know how to respond psychologically, or for that matter substantively. And just about all of the responses I am seeing I interpret as “copes,” whether from the optimists, the pessimists, or the extreme pessimists (e.g., Eliezer). No matter how positive or negative the overall calculus of cost and benefit, AI is very likely to overturn most of our apple carts, most of all for the so-called chattering classes.
The reality is that no one at the beginning of the printing press had any real idea of the changes it would bring. No one at the beginning of the fossil fuel era had much of an idea of the changes it would bring. No one is good at predicting the longer-term or even medium-term outcomes of these radical technological changes (we can do the short term, albeit imperfectly). No one. Not you, not Eliezer, not Sam Altman, and not your next door neighbor.
To the remaining AI skeptics, I say this: Don't be afraid of the AI revolution; embrace it. Understand that AI is a tool, not a replacement for human ingenuity. Yes, some jobs may change or disappear, but new ones will emerge — ones that require human creativity, empathy, and collaboration. By working together with AI, we can accomplish things that were once thought impossible.
Now that you’ve made it this far, what do you think?
1. One example re: hallucinations. If you ground the model in reality via context injection (inserting known-true data with sources into prompts), hallucinations become nearly nonexistent. This is what Bing Chat does with web searches. Still not 100%, but enough for most use cases.
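To make “grounding” concrete, here’s a tiny sketch of the prompt-assembly step. The snippets and URLs are invented placeholders standing in for real search results.

```python
# Grounding sketch: pin the model to known-true, cited snippets and tell
# it to refuse rather than guess. The snippets and URLs are placeholders;
# real ones would come from a search over your own data.

snippets = [
    ("Kiosks auto-update nightly at 2am local time.", "kb.example.com/updates"),
    ("Firmware 4.2 fixed the scanner timeout bug.", "kb.example.com/release-4-2"),
]

facts = "\n".join(f"- {text} (source: {url})" for text, url in snippets)
prompt = (
    "Answer using ONLY the facts below, and cite a source for each claim. "
    "If the facts are insufficient, say \"I don't know.\"\n\n"
    f"Facts:\n{facts}\n\nQuestion: Why did the scanner stop timing out?"
)
```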
Take coding as another example. I’ve seen multiple takes that basically amount to “I tried to have ChatGPT generate code, and it had an obvious error half the time. Coding jobs are fine!”
Ok, what happens when you hook it up to an interpreter and then feed the results of that back into the prompt? (A function that takes less than an hour to set up with their APIs.) 99% accurate.
It messed up a reference to a library because its base knowledge is outdated? Now what happens when you allow it to search the web for documentation?
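The loop itself can be this simple. This is a sketch: ask_llm is a placeholder for your completion call, and the retry count and timeout are arbitrary choices.

```python
import subprocess, sys, tempfile

def run_until_it_works(ask_llm, task: str, max_attempts: int = 3) -> str:
    """Generate a script, run it, and feed any traceback into the next attempt."""
    prompt = f"Write a complete Python script that does the following:\n{task}"
    code = ""
    for _ in range(max_attempts):
        code = ask_llm(prompt)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, timeout=30)
        if result.returncode == 0:
            return code  # ran cleanly; hand it back
        # Feed the error back, just like a human pasting a traceback.
        prompt = (f"This script:\n{code}\n\nfailed with:\n{result.stderr}\n"
                  "Return the full corrected script.")
    return code
```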
As soon as you start to customize these models to do more specific jobs, you start to quickly understand how powerful the existing versions already are.
3. This is a short case study on how small changes to how an LLM is queried can have large effects.
Goal: When Mashgin’s support team had a technical issue they couldn’t solve, I wanted an LLM to search our knowledge base and provide relevant information to help them solve it. This tool combines the following:
Base LLM model (first OpenAI’s text-davinci-003, then gpt-3.5-turbo)
Basic prompt engineering to elicit a proper response
Context injection: uses OpenAI embeddings to search our knowledge base, then includes the relevant sections in the prompt
After implementing this, the results were… ok. Sometimes it provided useful information, many times not, and on occasion it would hallucinate.
After rethinking the problem, the following changes were made:
Prompt the model to generate the top 3 questions, answerable from Mashgin’s knowledge base, that are needed to solve the problem.
Use context injection to “ask” the knowledge base these 3 questions, getting answers for each.
Summarize the 3 answers.
This new communication flow drastically improved results. Before, maybe 40% of the results were truly useful. After the change I’d say 80-90% were useful. With further tinkering and cleaning the input data up more (garbage in → garbage out), I think it could get up to 95%+.
The point here is that this improvement required no underlying change in the model — only iterations with prompt chaining.
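For the curious, here’s a minimal sketch of what that three-step flow could look like. This is my reconstruction, not Mashgin’s actual code; search_kb is a stub standing in for the embeddings search.

```python
import openai  # pre-1.0 SDK, matching the models named above

def chat(prompt: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def search_kb(question: str) -> str:
    """Return the knowledge-base sections nearest to the question.
    Stub: in practice, embed the question (openai.Embedding.create) and
    compare against precomputed vectors for each KB section."""
    raise NotImplementedError

def support_answer(problem: str) -> str:
    # Step 1: turn one fuzzy problem into three precise, searchable questions.
    q_text = chat("List the top 3 questions, one per line, that our internal "
                  f"knowledge base would need to answer to solve this issue:\n{problem}")
    questions = [q for q in q_text.splitlines() if q.strip()][:3]

    # Step 2: "ask" the knowledge base each question via embedding search.
    answers = [chat(f"Using only this context:\n{search_kb(q)}\n\nAnswer: {q}")
               for q in questions]

    # Step 3: summarize the three answers into one recommendation.
    return chat("Summarize these findings into advice for a support agent:\n\n"
                + "\n\n".join(answers))
```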