Take the Iterative Path

FutureBlind Podcast

How SpaceX innovates by moving fast and blowing things up.

Max Olson

Sep 27, 2022

One of the greatest business successes over the last 20 years has been SpaceX’s rise to dominance. SpaceX now launches more rockets to orbit than any other company (or nation) in the world. They seem to move fast on every level, out-executing and out-innovating everyone in the industry.

Their story has been rightfully told as one of engineering brilliance and determination.

But at its core, the key to their success is much simpler.

There’s a clue in this NASA report on the Commercial Crew Program:

SpaceX and Boeing have very different philosophies in terms of how they develop hardware. SpaceX focuses on rapidly iterating through a build-test-learn approach that drives modifications toward design maturity. Boeing utilizes a well-established systems engineering methodology targeted at an initial investment in engineering studies and analysis to mature the system design prior to building and testing the hardware. Each approach has advantages and disadvantages.

This is the heart of why SpaceX won. They take an iterative path.

Taking the determinate path

Let’s talk about the Boeing philosophy first, which is the most common approach taken by other traditional aerospace companies. “There are basically two approaches to building complex systems like rockets: linear and iterative design,” Eric Berger writes in the book “Liftoff” about the early history of SpaceX:

The linear method begins with an initial goal, and moves through developing requirements to meet that goal, followed by numerous qualification tests of subsystems before assembling them into the major pieces of the rocket, such as its structures, propulsion, and avionics. With linear design, years are spent engineering a project before development begins. This is because it is difficult, time-consuming, and expensive to modify a design and requirements after beginning to build hardware.

I call this the “determinate path” — in trying to accomplish a goal, the path to get there is planned and fixed in advance.

In project management this method is called waterfall, an “approach that emphasizes a linear progression from beginning to end of a project. This methodology, often used by engineers, is front-loaded to rely on careful planning, detailed documentation, and consecutive execution.”

Spend a lot of time scoping and planning carefully upfront, then move progressively forward step-by-step. This is the “measure twice, cut once” approach.

You may be familiar with it as it’s very common in organizations everywhere.

There can be many reasons why this path would be taken:

If from the start you have very clear, unambiguous requirements (from customer, management, etc.)
If you think you can figure out how exactly to build something before building it, you’d probably want to plan it all in advance.
If your fixed costs are high, it can force you to make decisions up front. Take traditional auto manufacturing. A door mold machine might cost $50 or $100M, so you have to figure out what the design of the door will be first. (But this means if later they have a new idea for a better car door, they don’t want to change it because of the sunk costs of the mold machine.)
You have a lot of resources, which makes you think you can just brute force it and overwhelm the problem with money and people. (Many overfunded startups are guilty of this.)

But there is another way . . .

Taking the iterative path

When I think of the most impactful technologies over the last 100 years, nearly all were created by small teams of tinkerers.

Why? It’s easier for these teams to take an iterative path.

Taking this path means rapid prototyping, testing concepts against reality, failing, and adapting. Continuing from the book “Liftoff”:

The iterative approach begins with a goal and almost immediately leaps into concept designs, bench tests, and prototypes. The mantra with this approach is build and test early, find failures, and adapt.

Focus more on building and finding failure modes than making things perfect. Project managers call it “agile”, or at Facebook, “move fast and break things.”

The canonical example of this to me is the Wright brothers, previously bicycle mechanics, building iterations of their airplane design over and over, and failing until they succeeded.

This approach ended up being common in the origin stories of all airplane manufacturers and defense companies (Martin Marietta, Lockheed, Northrop Grumman, etc.), where again you had relatively small teams of self-taught tinkerers building complex machines through a process of iteration, failure, and learning until they succeeded.

How can you reconcile this “fail fast” approach with the care that’s needed to reliably build things where human lives are on the line?

The answer is that these can be two different parts of the organization. Working together, but with different focuses. “[SpaceX is] launching 5 or 6 times a month and on their pads they need operational excellence with zero risk — you know, they’re doing innovation but it’s minimal innovation. Blowing things up on the pad is not a good idea — you want that down to zero because human lives and certainly lots of capital is at risk.” This is Steve Blank on a recent Village Global podcast. He continues:

But on the other hand, they have another part of the company that in fact believes in not only blowing things up on the test pad — because if you’re not doing that you’re not pushing the envelope fast enough — it’s the cycle time of doing that. So they have an agile innovation process.
Now think about that. This is the same company doing two very different things with two different groups of people, two different risk profiles, but more importantly they’re talking to each other. It’s not “here are the smart people, and here are the people turning the crank,” they’re learning from each other. The guys building the raptor engines and Starship need to know where the GFC plugs in and what the right materials and things they need to get right on the next rocket. And the people doing the existing rockets can learn about new materials and incremental upgrades so they are innovating but innovating with minimal risk.

The iterative path is easier to take when you’re nimble and the cost of failure is low. This is why it’s so common in software. But as the previously mentioned companies have shown, it’s also the best approach in hardware and complex, frontier tech.

And just as the traditional aerospace companies have demonstrated, organizations that are very bureaucratic now were almost always more iterative in the past.

The early history of Lockheed’s Skunk Works division, which I believe later served as one of the models for SpaceX’s approach, is informative. Skunk Works was an R&D group created by Kelly Johnson within Lockheed during the war in 1943 when they got the contract to build the P-80 Shooting Star. From a documentary on the birth of Skunk Works:

Lockheed was already swamped in terms of manpower, tooling, and facilities with wartime contracts but this was a blessing in disguise, an opportunity to implement an idea he’d been pestering Robert Gross about for years. Let him round up a small group of talented people: designers, engineers and shop men. Put them under one roof where they could all work closely together and give him complete authority over everything from procurement to flight tests.

Johnson gathered 28 engineers including himself, and 105 “shop men” (I assume this just means workers who can build what the engineers design) and built a small facility out of discarded shipping crates using a circus tent for a roof. He then laid out the original rules that would become the foundation for Skunk Works over the next 30 years:

. . . he’d be responsible for all decisions. Paperwork and red tape would be cut to the minimum. Each engineer would be designer, shop contact, parts chaser, and mechanic, and each would remain within a stone’s throw of the shop at all times. . . . Forcefully reminded that simplicity is the keynote of good design, the designers jumped into their work. But this was a new kind of operation, and instead of moving from stage to stage, the schedule demanded an extraordinary degree of concurrency.

The time from initial concept to delivery of the first P-80 to test pilots would be only 5 months. In fact, nearly all of the early planes coming out of Lockheed took less than 6 months from concept to delivery. Crazy!

Photo from an engineer of the Lockheed A-12 being developed in the 1960s.

Even the famous A-12 (later the SR-71 Blackbird) took less than 4 years from initial idea to roll out. This may seem like a lot when you’re used to super-fast software timelines, but this is 4 years for one of the fastest, most successful aircraft ever built.

The scrappy culture lived on in later Skunk Works projects. This is Ben Rich, who led the division in later years, on their building of the F-117 (this is the Darth-Vader-looking stealth fighter you’ve probably seen before):

On the F-117, we had to get the guy to climb into the cockpit. So I went to the local builders mart, and bought one of these ladders for 50 bucks, and we just used it. . . . We didn’t have to spend thousands of dollars designing it for Mil spec — military specification — and we did simple things like that.

The more you learn about the history of building things, the more you hear stories like this, even with highly complex innovations. The development of the Sidewinder missile is another interesting example: again, small team, rapid iteration, creative solutions to problems.

Why is iteration better?

Taking the iterative path tests your model against reality, getting to the truth as fast as possible.

There are a few major downsides to the linear approach:

Clear specs and requirements from the outset may seem like a good thing. Much of the time, though, they don’t match reality. This is especially true in areas that are pushing the boundaries of innovation.
Over time, “the spec” becomes the most important thing. Here’s Ben Rich again, on one of the requirements for building the SR-71 Blackbird:
Some general insisted that there was a military spec that the SR-71 had to say ‘U.S. Airforce’ and the stars and bars. I said ‘General . . . you’re crazy.’ I said, you know, this has the temperature of an oven. Have you ever taken a piece of metal, painted it and stick it under your broiler? You can’t keep the paint on the metal. He said ‘No, the spec says you gotta say U.S. Air Force on our airplanes.’ I said we’ll develop it. So we spent a million dollars developing a paint that could show red, white and blue, and we put it on the airplane. . . . I mean, who’s going to see you at 90,000 feet?
In this example and many others, politicians start dictating how work should be done, rather than just setting the goal as they should. Conditions for funding become completely removed from the outcome itself, like mandates to use certain suppliers or base employees in certain states.
The technical scope is too large, so that when there’s a problem, it’s hard to find the root cause. When there’s a problem you have to go back to the drawing board, but you may not even be able to do that given the cost to start over.
You become too risk averse, fearing failure. This is pretty simple: if the costs to start something or fail are high, people don’t want to do new things. From the book “Liftoff”:
At most other aerospace companies, no employee wanted to make a mistake, lest it reflect badly on an annual performance review. Musk, by contrast, urged his team to move fast, build things, and break things.
And from an executive of Blue Origin on what they do wrong:
I believe we study a little too much and do too little . . . More test [rather than] more analysis will allow us to progress more quickly, iterate, and eventually succeed.

A good iterative approach creates tight feedback loops, like John Boyd’s classic OODA loop: observe, orient, decide, act. Observe what’s going on, orient yourself to the environment, decide what needs to be done to make progress, act on that decision, and return to observing the results from your action. From the book “Certain to Win” on Boyd’s philosophy:

What does it take to win? Agility — the ability to rapidly change one’s orientation (worldview) in response to what’s happening in the external world.

This was referring to combat. But it’s just as true in business and engineering.

Tight feedback loops lead to a high rate of innovation and adaptation, quickly finding better solutions and what not to do. Speed is a tactical advantage.
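
To make the loop concrete, here is a minimal Python sketch of a build-test-learn cycle. Everything in it is a hypothetical illustration of mine, not anything from Boyd or SpaceX: the names `iterate_toward_target` and `toy_reality`, the target of 42.0, and the rule of correcting by half the measured miss are all assumptions chosen to keep the example small.

```python
from typing import Callable, Tuple


def iterate_toward_target(
    test: Callable[[float], float],
    first_guess: float,
    tolerance: float = 0.01,
    learning_rate: float = 0.5,
    max_cycles: int = 100,
) -> Tuple[float, int]:
    """Run observe-orient-decide-act cycles until the measured miss is small.

    `test` stands in for reality: it returns the signed miss for a design
    (positive means the attempt overshot). Returns the final design value
    and the number of test cycles used.
    """
    design = first_guess
    for cycle in range(1, max_cycles + 1):
        error = test(design)                 # observe: run the test, measure the miss
        if abs(error) <= tolerance:          # orient: is this close enough to stop?
            return design, cycle
        correction = -learning_rate * error  # decide: how far to adjust
        design += correction                 # act: build the next iteration
    return design, max_cycles


def toy_reality(design: float) -> float:
    # Hypothetical stand-in for a physical test; the "right answer" is 42.0,
    # but the loop above never sees that number directly, only the miss.
    return design - 42.0


final_design, cycles = iterate_toward_target(toy_reality, first_guess=0.0)
print(f"settled on {final_design:.3f} after {cycles} test cycles")
```

Each pass through the loop costs one test, so the cheaper and faster a test is, the sooner the error shrinks. In this toy setup the miss halves on every cycle, which is the whole argument for short cycle times in miniature.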

“Innovation per year is what matters. Not innovation absent time. . . . What is your rate of innovation? that matters. And is the rate of innovation, is that accelerating or decelerating?” — Elon Musk

Brian Armstrong, founder of Coinbase, has a good saying that “action produces information”. You can’t predict the future, so just start building. And that’s what SpaceX did. Eric Berger writes that “the engineers designing the Falcon 1 rocket spent much of their time on the factory floor, testing ideas, rather than debating them. Talk less, do more.”

The iterative path in practice

So I’ve convinced you that iteration is best. What does it mean in practice?

Agile is straightforward for software, but not so much hardware.

Historically, linear/waterfall has been easier to do in hardware, especially for large engineering projects. A lot of upfront costs are needed — so spend a lot of time gathering requirements and planning before actually building.

For companies building big, complex, expensive things, it seems like a reasonable assumption that you have to know exactly what you’re doing, plan a lot, and be risk averse. As you know, this isn’t true! You can be fast, nimble, and agile even in megaprojects.

In complex hardware what this means is:

Being hardware rich — having lots of spare parts and backups.
This lets you move quickly, and continue to try things over and over, because you have all these “at bats”. I’d include 3D printing as it allows you to create parts on an ad hoc basis.
Using simulations: move atoms to bits when possible, giving you the freedom to quickly test and have all the benefits of software. If you can simulate what’s happening in the real world with enough accuracy, you can fail as much as you want. This is another area that has changed a lot in the past decade or so. Here is Cliff Berg on SpaceX’s use of simulation:
SpaceX has invested a great deal of effort in automating the process of designing and validating through simulation, and delivering the machinery that they build through automation. They use traditional computer-aided design (CAD) tools such as CATIA, but they also invested in an end-to-end 3D modeling system from which they can view and simulate entire assemblies and automatically print parts. Importantly, the software is fast, even when handling complex assemblies, so that engineers do not have to wait, which encourages a rapid iterative design approach.
And further from a slide in a presentation given by SpaceX Director of Research Adam Lichtl and Lead Software Engineer Stephen Jones:
Why Simulation?
1. Investigate what cannot be measured
2. Reduce need for testing
3. Design optimisation: narrow design space
4. Proactive instead of reactionary design
Constantly testing the whole system. Simulation is great but in the end you have to actually test everything out.
In these complex systems, in the end what really matters is how the whole thing behaves in the real world. This reminded me of how George Mueller, who led NASA’s spaceflight program during the Apollo missions in the 60s, approached building the Saturn rocket:
At a system level you’re much better off testing the system [rather than the individual parts] because in the end that system has to work. And then the only way you find out is if you test it as a system.
Subsystems of the rocket would only be tested if needed.
Utilizing “pathfinders”. In manufacturing, a pathfinder is an early build of something that won’t end up seeing the light of day.
You build a pathfinder to see where problems are, and how it can be done better. You know it will fail or be suboptimal, you’re just looking for how to do it better.
This is similar to a “tracer bullet”. You fire it first with no expectation of hitting the target, watch it, and then adjust your aim.

Doing all of these things, like SpaceX, leads to much faster iteration. This is something many other companies can learn from.

In summary: When a goal is big and complicated, an iterative, fail-fast approach is much better than a linear approach. Determinate paths lead to slower, poorly adapted solutions, whereas iteration finds problems faster with a result that’s better adapted to the real world.

Thanks to Rohit for providing feedback on the draft. All illustrations in this essay are a combination of DALL-E generations and my personal edits.

Intro/outro music: “Cinnabar” by Roger & Brian Eno, from the album Luminous.