
Theory in Practice — OODA, Mapping, Antifragility

Based on a talk presented at Velocity 2016 in Santa Clara, this post tries to show the practical application of concepts like OODA, Wardley Maps, and Antifragility with examples from my day-to-day work at a startup.

Theory

OODA — Observe Orient Decide Act

Observe the situation, i.e. acquire data. Orient to the data, the universe of discourse, the operating environment, what is and isn’t possible, and other actors and their actions. Decide on a course of action. Act on it.

Typically, you hear people saying that we’re supposed to go through the loop [O -> O -> D -> A -> O -> O…] faster than others. Let’s break that down.

  • If we traverse the loop before an adversary acts, then whatever they are acting to achieve may not matter because we have changed the environment in some way that nullifies or dulls the effectiveness of their action. They are acting on an outdated model of the world.
  • If we traverse the loop before they decide, we may short circuit their process and cause them to jump to the start because new data has come in suggesting the model is wrong.
  • If we traverse the loop at this faster tempo continuously, we frustrate their attempt to orient — causing disorientation — changing the environment faster than they can apprehend, much less act.
  • We move further ahead in time. Or to be more exact, they’re falling further backwards: unable to match observations to models, change orientation, have confidence in decisions, or act meaningfully.

This is what Boyd called operating inside someone’s time scale.

Our main means of connecting the components of the loop is via models (and projections of cause/effect based on those models). Observations are tied into and given context via models.
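To make that connective tissue concrete, here's a minimal sketch of a single loop traversal in Python. Everything in it (the model object, its update and projected_value methods, the sensor and action callables) is hypothetical, invented purely to show the shape of the loop, not any real API:

```python
# A single OODA traversal, with the model as the connective tissue.
# All names are hypothetical; this is a sketch, not a real API.

def ooda_step(model, sensors, actions):
    observations = [sense() for sense in sensors]      # Observe: acquire data
    model.update(observations)                         # Orient: tie observations into the model
    choice = max(actions, key=model.projected_value)   # Decide: project cause/effect via the model
    choice()                                           # Act: change the environment
    return model
```

Note that Orient and Decide only ever touch the model, never the world directly. That is exactly what makes a stale model dangerous.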

Another way to think of models is as maps.

Mapping

This is a Wardley, or Value Chain, Map. It’s the most useful model I’ve encountered for building products or businesses. Watch Simon’s OSCON keynotes or read his blog to really dig into the concept.

It starts with a user need at the top. What problem are we solving? How are we going to make someone’s life better?

Then it goes deep, laying out the supply (dependency) chain of components needed to service that need. The further down, the less visible and exposed the component is to our end user. For example, if we’re building a SaaS product, users are never (or should never be) exposed to the systems running the code. This is the Y axis.

The X axis is where it gets interesting. It provides stages of development that components map into. Nearly everything naturally moves from the left to the right over time as invented or discovered things become standardized, well understood, and built by more producers competing for market share, until some eventually become absolute commodities or are provided as utilities. It’s a kind of natural evolution.

  • Genesis [stage 1]: something that’s being discovered/built from scratch
  • Custom Built [stage 2]: built out of existing technologies but highly customized for a specific use case and not generalized to broad use
  • Product [stage 3]: COTS software, something bought from someone else vs self-built
  • Commodity [stage 4]: something that’s effectively fungible, for which there are multiple equivalent providers, that may be provided as a utility

Individual components, regardless of their stage, can be expanded into finer grained production pipelines, marked as something that’s either provided or consumed, and aligned with methodologies like in-house-agile-developed vs outsourced-to-cloud-provider.

Finally, each component can be treated as a piece on the field, moved around as a function of product strategy: attempts at changing the competitive landscape.

For example, open sourcing something to try to commoditize it or create a de facto standard. Or providing something as a utility / platform / API in order to build a moat (that you can also consume) out of the ecosystem that you engender around it.
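To show how mechanical the map's skeleton is, here's a minimal sketch of a map as data, in Python. The fields and the toy components are my own illustration under the definitions above, not any real tool's format:

```python
from dataclasses import dataclass, field
from enum import IntEnum

class Stage(IntEnum):
    GENESIS = 1    # discovered/built from scratch
    CUSTOM = 2     # built from existing tech, customized for one use case
    PRODUCT = 3    # COTS, bought rather than self-built
    COMMODITY = 4  # fungible, multiple equivalent providers, maybe a utility

@dataclass
class Component:
    name: str
    stage: Stage                 # x-axis: evolution
    visibility: float            # y-axis: 1.0 = fully exposed to the end user
    depends_on: list = field(default_factory=list)  # the supply chain beneath it

# A toy SaaS fragment: the user never sees the bottom of the chain.
user_need = Component("user's problem solved", Stage.GENESIS, 1.0, [
    Component("web app", Stage.CUSTOM, 0.8, [
        Component("compute", Stage.COMMODITY, 0.1),  # consumed as a utility
    ]),
])
```

Strategic play is then just deliberate edits to `stage`: open sourcing a component is an attempt to drag it rightward toward COMMODITY.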

[Anti]Fragility

But everything is constantly changing. Which means our map can become stale fast. Which makes us fragile — exposed and unaware — to ever more risks. Black swans.

 

A black swan is only a black swan if you can’t predict it (or assign it a probability). They’re inevitable. As our maps become out of sync with the real world, non-black swans become black swans. It’s possible to be fragile to one kind of black swan but not another. There are activities or patterns that will make us fragile with respect to something. And those that will make us antifragile.

 

There’s no such thing as absolute antifragility. It’s contextual. A severe enough stressor over a short enough time period will destroy anything.

Maps can be made robust (to some scale) through adaptive mechanisms, learning and correcting to match for change in the world.

But beyond some scale, every map is fragile. The world can change faster than, or so severely that, any attempt to update the map fails. Events can get inside a map’s timescale.
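A toy way to see events getting inside a map's timescale, in Python. The numbers are invented; the point is the relationship between how fast the world drifts and how fast the map can adapt:

```python
# A map that adapts by exponential smoothing, tracking a drifting world.
# When drift outpaces the learning rate, the map's lag grows without bound.

def tracking_error(drift_per_step, learning_rate, steps=1000):
    world, map_estimate = 0.0, 0.0
    for _ in range(steps):
        world += drift_per_step                                 # the territory changes
        map_estimate += learning_rate * (world - map_estimate)  # the map adapts
    return world - map_estimate

print(tracking_error(drift_per_step=0.01, learning_rate=0.5))   # ~0.01: robust, small lag
print(tracking_error(drift_per_step=10.0, learning_rate=0.05))  # ~190: the map is useless
```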

Systems can be antifragile (to some scale) through constant stress, breakage, refactoring, rebuilding, adaptation and evolution. This is basically how Netflix works: the chaos army, plus the system-evolution mechanism that is their army of brains iterating on the construction and operation of their systems.


For example, here’s our model of the APIs or services we rely on — smooth and reliable, with clearly defined boundaries and expected behavior. This is also the model that those things have of the APIs and services they rely on. All the way down.

 

But this is how most things actually look. Eventually in the course of operation, the gaps line up in such a way that a minor fault event becomes magnified into systemic failure.

Systems, software, teams, societies — everything eventually crumbles under the weight of its own technical debt.

Which is why we should be refactoring, paying down technical debt, or what I just call “doing maintenance”, all the time at every layer.

Practice

Caveats: My views don’t represent those of my employer or anyone else and a great deal of detail is left out.

Example: mapping at work

I’ll build a map for a new feature SignalFx just released in beta.

Starting with the user need which I describe as “discovering known and unknown unknowns.”

A lot is left out, but generally speaking: on the top left we have the need, immediately connected to that is how that need is served and proceeding out from there is a generalized view of the supply chain of components needed to make it so.

Some things worth noting:

  • We rely on utility or commodity technology and services for all our infrastructure hardware and software, like operating systems, and also middleware — using things like AWS, Linux, Kafka, Cassandra, Elasticsearch, etc. This is standard behavior for a software as a service company.
  • We rely on relatively standard means of getting data into the system, in our case collectd, StatsD, Dropwizard metrics, etc., and a host of plugins and libraries that conform to open APIs and use well known open, or public, protocols.
  • We can see that there’s a lynchpin without which the map would fall apart: the streaming real-time analytics engine.
  • In order to build what was needed to serve that user need we started with, we needed to build many other things: a specialized quantization service, a lossless + real-time message bus, a specialized timeseries database, a high-performance metadata store, a real-time streaming analytics engine, an interactive real-time web-based visualization for streaming data, etc.

Many of the components we built are, if they were generalized, standalone products that others build entire companies on. In this specific case those are all the open source technologies — Kafka, Cassandra, Elasticsearch, etc — that we built our highly customized components out of.

Given all of that, I have one important positive question each day: Given the amount of time I’m going to spend working today, what one thing can I do to move the needle in serving this user need through what we do?

And one negative one, seeking invalidation: Is there any evidence that our map, our hypothesis, our approach, have been invalidated?

  • Is our projected user need real? Will people pay for it? Is it the problem they actually want solved? Do people really not want leverage? Do they not want to be given more power and time through tools? Do they want thinking to be replaced, instead of force-multiplied?
  • Is our lynchpin really the point of leverage and differentiation we believe it to be? Has it become a commodity and we’re just fooling ourselves into thinking we’ve built something novel?
  • Has the territory changed in any way, through macro trends or the actions of players in the ecosystem, such that we need to rework our model?

Example: knowing what’s possible

Imagine we want to build a personal relationship management [PRM] system to meet a need for people to manage their complicated and ever-growing network of contacts.

The top left is where we’re starting from. The y-axis is basically features or sub-capabilities that add up to something. The x-axis is what they add up to: products or capabilities that are in and of themselves valuable. The bar for something belonging in the leading row is being a viable sub-product; everything in the column below it is a feature needed for it. Where the line is for being able to declare that we’ve built a minimum usable product may be different per column, as may the line for what constitutes an MVP.

We have limited time, people, and money. So we can only build so many things at once. Let’s say we can only build one column at a time. We have to get to usability and viability in each column to be able to expand users and business sufficiently to build the next column.

But every single thing we build limits our options for the next thing we build. We can go down and we can go to the right.

We can scrap everything further below and further to the right of the point we’re at today and figure out something else to build from where we are. This is effectively a pivot.

But what we can’t do is go from 3 steps down in the 2nd column [a graph of your contacts that’s auto-generated based on your communications with them over Gmail, Twitter, LinkedIn, Facebook, and Outlook email, and that shows degrees of separation] to, say, a restaurant reservation and point-of-sale SaaS product. There’s no getting from here to there. But you can get from here to a product referral network.
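A sketch of that adjacency argument in Python. The option graph is entirely made up; the point is that pivots are graph reachability, not wishes:

```python
# Pivots as reachability over an invented adjacency graph:
# an edge means "what we've already built makes this a buildable next step".
from collections import deque

adjacent = {
    "contact graph from comms": ["product referral network", "intro routing"],
    "product referral network": ["affiliate analytics"],
    # note: no edges lead from any of these to "restaurant POS SaaS"
}

def reachable(start):
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adjacent.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(reachable("contact graph from comms")))
print("restaurant POS SaaS" in reachable("contact graph from comms"))  # False
```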

Seeking invalidation:

  • Is a personal relationship management service still the best way to serve the user need? Is there a better way?
  • Can we build to that better way from where we are?
  • Have we built a minimum usable product? Is it viable? Can we generate enough business (or funding) from what we’ve built to build the next thing?

Example: hiring for antifragility

The core principle of antifragility, as I see it, is to arrange things such that we get stronger through stress. More or less how muscle growth works.

How do you build that into an organization? How do you decrease brittleness? The only way I’ve ever found is through diversity. Inclusion of, and forced exposure to, different points of view is absolutely necessary if we don’t want to get stuck — stuck in a way of thinking, stuck in a way of dealing with issues, stuck with a pattern of response, stuck in a point of view that makes us blind to threats and opportunities.

Think of it this way. We have to stir the pot in order to not get trapped in a local maximum. Not once, but constantly. Even a hint of homogeneity — whether it’s of people or ideas or practices or anything — is a clear signal that we are fragilizing and becoming brittle.

For my team, here’s what that looks like: no one in my org has a background in tech except for me. My background is both deep and wide, but it’s 90% tech. There’s just a large swath of things I’m blind to. What I’ve got is people who’ve studied art, biology, English lit, who don’t look like me or think like me. Things that I’m fragile to, they’re not. As a group, we’re way stronger than if I was hiring copies of myself.

Seeking invalidation:

  • Is this the right team? Can they do what we need to do right now? Can they do what we need to do in a quarter, in a year, in 5 years?
  • Are they the wrong team, or am I failing at:
      • Helping them get from where they are to where I need them to be?
      • Getting the most out of their perspectives?
      • Creating a safe environment for them to bring their best to the table?

Questions

John Allspaw asked if it’s possible for a person to be antifragile. I don’t think so. I don’t think any given person or component of a system can be antifragile. I think groups and systems can be made antifragile. Complexity can be a symptom of the buildup of antifragility in a system. Beyond some envelope, it’s also a harbinger of collapse.

Peter van Hardenberg asked where I set the bar. Assuming baseline functional competence (can do the job at hand), the next thing I look for is differentness. What do you bring to the table that’s unique from what we already have?


Wrapping up, here are the daily operating principles arising from this study:

Always be refactoring

Diversity has intrinsic value

Territory > Map

Seek invalidation


The above builds on ideas in these previous talks and posts, which follow below.

The original abstract was way too ambitious for a 40 min time slot. The presentation suffered quite a bit from me erratically moving through the material, trying to pack in too many ideas.

the dangers of models

All models are wrong; some are useful.

Disconfirmatory evidence is more important than confirmatory evidence.

Actively seek model invalidation.

Everything was built in some context, or scale. Reading primary sources, or learning how/why a thing was made, is essential to understanding the conditions that held and knowing the bounding scales beyond which something may become unsafe.

This is something I think about a lot. It's true in software, distributed systems, and organizations. Which is the world I breathe in every day at SignalFx.

It began to knit together around OODA:

  • ooda x cloud-- positing how OODA relates to our operating models
  • change the game-- the difference between O--A and -OD- and what we can achieve
  • pacing-- the problem with tunneling on "fast" as a uniform good
  • deliver better-- the real benefit of being faster at the right things
  • ooda redux-- bringing it all together

OODA is just a vehicle for the larger issue of models, biases, and model-based blindness--Taleb's Procrustean Bed. Where we chop off the disconfirmatory evidence that suggests our models are wrong AND manipulate [or manufacture] confirmatory evidence. 

Because if we allowed the wrongness to be true, or if we allowed ourselves to see that differentness works, we'd want/have to change. That hurts.

Our attachment [and self-identification] to particular models and ideas about how things are in the face of evidence to the contrary--even about how we ourselves are--is the source of avoidable disasters like the derivatives-driven financial crisis. Black Swans.

  • Black swans are precisely those events that lie outside our models
  • Data that proves the model wrong is more important than data that proves it right 
  • Black swans are inevitable, because models are, at best, approximations

Antifragility is possible, to some scale. But I don’t believe models can be made antifragile. Systems, however, can.

  • Models that do not change when the thing modeled (turtles all the way down) changes become less and less accurate approximations
  • Models can be made robust [to some scale] through adaptive mechanisms [or, learning] 
  • Systems can be antifragile [to some scale] through constant stress, breakage, refactoring, rebuilding, adaptation and evolution— chaos army + the system-evolution mechanism that is an army of brains iterating on the construction and operation of a system

The way we structure our world is by building models on models. All tables are of shape x and all objects y made to go on tables rely on x being the shape of tables. Some change in x can destroy the property of can-rest-on-table for all y in an instant.

  • Higher level models assume lower level models 
  • Invalidation of a lower level model might invalidate the entire chain of downstream (higher level) models—higher level models can experience catastrophic failures that are unforeseen 
  • Every model is subject to invalidation at the boundaries of a specific scale [proportional to its level of abstraction or below]
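Here's a small Python sketch of that cascade, with a deliberately silly, hypothetical model stack; invalidating the bottom takes out everything above it:

```python
# Models built on models: invalidation propagates upward until a fixpoint.
built_on = {
    "objects rest on tables": ["table tops are flat"],
    "place settings work":    ["objects rest on tables"],
    "dinner party plan":      ["place settings work"],
}

def invalidated_by(broken_model):
    broken = {broken_model}
    changed = True
    while changed:
        changed = False
        for model, foundations in built_on.items():
            if model not in broken and any(f in broken for f in foundations):
                broken.add(model)
                changed = True
    return broken

print(invalidated_by("table tops are flat"))
# one change at the bottom fails the whole chain above it at once
```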

Even models that are accurate in one context or a particular scale become invalid or risky in a different context or scale. What is certain for this minute may not be certain for this year. What is certain for this year may not be certain for this minute. It’s turtles all the way down. If there are enough turtles that we can’t grasp the entire depth of our models, we have been fragilized and are [over]exposed to black swans.

This suggests that we should resist abstractions. Only use them when necessary, and remove [layers of] them whenever possible.

We should resist abstractions.

Rather than relying on models as sources of truth, we should rely on principles or systems of behavior like giving more weight to disconfirmatory evidence and actively seeking model invalidation. 

OODA, like grasping and unlocking affordances, is a process of continuously checking and evaluating the model of the world against the experience of the world. And seeking invalidation is getting to the faults before the faults are exploited [or blow up].

Bringing it all back around to code--I posit that the value of making as many things programmable as possible is the effect on scales.

  • Observation can be instrumented > scaled beyond human capacity
  • Action can be automated > scaled beyond human capacity
  • Orientation and decision can be short-circuited [for known models] > scaled beyond human capacity
  • Time can be reallocated to orienting and deciding in novel contexts > scaling to human capacity

That last part is what matters. We are the best, amongst our many technologies, at understanding and successfully adapting to novel contexts. So we should be optimizing for making sure that's where our time is spent when necessary.
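As a sketch of that allocation in Python (the event kinds and remediations are invented for illustration): short-circuit the loop for known models, and route only the novel to people.

```python
# Known contexts: observe -> act with no human in the loop (machine scale).
# Novel contexts: escalate, because humans adapt to novelty best (human scale).

known_models = {
    "disk_full":   lambda event: print("auto-remediate: expand volume"),
    "cert_expiry": lambda event: print("auto-remediate: rotate certificate"),
}

def handle(kind, event):
    handler = known_models.get(kind)
    if handler:
        handler(event)                         # orientation/decision short-circuited
    else:
        print(f"escalate to a human: {kind}")  # novel context, worth human time

handle("disk_full", {})       # auto-remediate: expand volume
handle("weird_latency", {})   # escalate to a human: weird_latency
```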

Scale problems to human capacity.

ooda redux - digging in and keeping context

Putting together some thoughts from a few posts from 2012 on OODA [one, two, three]. For some reason, the idea had been getting a lot of airtime in pop-tech-culture. Like most things that get pop-ified, the details are glossed over—ideas are worthless without execution; relying on the pop version of an idea will handicap any attempt at its execution. 

I’m not an expert. But, I’d wager that Boyd: The Fighter Pilot Who Changed the Art of War is the best read on the subject outside of source material from the military.

OODA stands for Observe, Orient, Decide, Act. It’s a recasting of the cognition<->action cycle central to any organism’s effort to stay alive, focused on a competitive/combative engagement.

Get the data (observe). 

Figure out what the world looks like, what’s going on in it, our place in it, our adversary’s place in it (orient).

Project courses of action and decide on one (decide).

Do it (act).

The basic premise is that, in order to best an opponent, we have to move at a faster tempo through the loop. Boyd used a more subtle description—operate inside the adversary’s time scale. 

First: If we traverse the loop before the adversary acts, then whatever they are acting to achieve may not matter because we have changed the environment in some way that nullifies or dulls the effectiveness of their action. They are acting on a model of the world that is outdated.

Second: If we traverse the loop before the adversary decides, we may short circuit their process and cause them to jump to the start because new data has come in suggesting the model is wrong.

Third: If we traverse the loop at this faster tempo continuously, we frustrate the adversary’s attempt to orient—causing disorientation—changing the environment faster than the adversary can apprehend and comprehend it, much less act on it. We continue to move further ahead in time while the adversary falls backwards. By operating inside the adversary’s time scale.

Another detail from Boyd—all parts of the loop are not made equal.

Fundamentally, observation and action are physical processes while orientation and decision are mental processes. There are hard limits to the first and no such limits to the second. So, two equally matched adversaries can both conceivably hit equal hard limits on observation and action, but continue outdoing each other on orientation and decision. 

But realistically, adversaries are not equally matched. We don’t observe the same way, using the same means, with the same lens, etc. We don’t act the same way, with the same speed, etc. And being able to collect more data, spend more time orienting, leads to better decisions and actions. Being able to move through different parts of the loop faster, as needed, renders the greatest advantage. Compressing the decision-action sequence gives us a buffer to spend more time observing-orienting. Nailing observation gives us a buffer to spend more time orienting-deciding. We can come up with the best--not the fastest--response and act on it at the optimal--not the fastest--time. Getting a loop or more ahead of our adversary gives us a time buffer for the whole thing. It puts us at a different timescale. It allows us to play a different game, to change the game.

Deliberately selecting pacing, timescale, game—strategic game play.

Ops/devops analogs:

  • Observe - instrumentation, monitoring, data collection, etc.
  • Orient - analytics in all its forms, correlation, visualization, etc.
  • Decide - modeling, scenarios, heuristics, etc.
  • Act - provision, develop, deploy, scale, etc.

Startup analogs:

  • Observe - funnel, feedback, objections, churn, engagement, market intel, competitive intel, etc.
  • Orient - analytics in all its forms, correlation, assigning and extracting meaning from metrics, grasping the market map and territory, etc.
  • Decide -  modeling, scenarios, heuristics, etc.
  • Act - prioritize, kill, build, target, partner, pivot, fundraise, etc.

Those are analogs. It’s worth keeping in mind that OODA was developed for the context of one-to-one, fighter-jet-to-fighter-jet combat and not anything else.
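For the ops analog specifically, here's a hedged sketch of one pass through the loop wired end to end in Python. Every function body is a stand-in for real tooling (monitoring, analytics, runbooks, deploys), with invented metrics and thresholds:

```python
# One pass of an ops OODA loop. Each stage is a placeholder for real systems.

def observe():
    return {"error_rate": 0.07}  # instrumentation / data collection

def orient(metrics, slo={"error_rate": 0.01}):
    # analytics: which metrics are out of bounds relative to the SLO?
    return {k: v for k, v in metrics.items() if v > slo.get(k, float("inf"))}

def decide(anomalies):
    return "rollback" if "error_rate" in anomalies else "no-op"  # heuristic

def act(action):
    print(f"executing: {action}")  # provision / deploy / scale

act(decide(orient(observe())))  # executing: rollback
```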

deliver better

 

There are physical limits to observation and action. Given equally matched adversaries with access to the same data and tools, both will hit absolute limits to how fast they can observe the environment or act on it.

But realistically, adversaries are not equally matched. They gather different amounts + kinds of data. They act slower or faster.

  • Observe - instrumentation, monitoring, data collection, etc.
  • Orient - analytics in all its forms, correlation, visualization, etc.
  • Decide - modeling, scenarios, heuristics, etc.
  • Act - provision, develop, deploy, fail, iterate, etc.

What does cloud speed up? And who has the advantage?

The proximate answer is obvious: operating in cloud models accelerates action. But the real benefit of being faster to act is upstream. It's so you can spend more time figuring out what's going on out there in the world and come up with the best--not the fastest--response and act on it at the optimal--not the fastest--time.

In Wait: The Art and Science of Delay, Frank Partnoy writes:

Because Connors and Evert needed less time to hit a return, they had more time to gather and process information. They saw, they prepared, and finally, only after they had processed as much information as possible, they hit. Their preconscious time management—what their brains did during the “prepare” period—was crucial to their success. Their talent enabled them to stretch out a split-second and pack in a sequence of interpretation and action that would take most of us much longer.

And also:

Sometimes there is a first-mover disadvantage. Sometimes the second mouse gets the cheese.

Which is what I was talking about in [pacing]:

It's not at all clear...that it's uniformly better to be functioning at a faster pace than competitors at all things.

Being fast to provision + market.. is only worth anything if you can use that speed to deliver something better.

Act faster. Observe greater. Orient longer. Decide truer.
Deliver better.

 

 

pacing

In [change the game], I wrote:

With each iteration, you move further ahead in time.

What you iterate and how often, your pacing, is a big question. And not all paces are created equal.

What-- there's pace of: marketing, product, strategy, back office, ecosystem growth, ecosystem cannibalization.

How often-- there's pace relative to: competitors, customers, technology advance, commoditization, ecosystem.

It's not at all clear that all of these things should be the same or that it's uniformly better to be functioning at a faster pace than competitors at all things. For example, consider what happens when your strategy changes too quickly and your products change ever so slowly (this happens more often than you'd think)... constant changes of direction, executives, board members, etc., all the while without anything new being brought to market, any changes being made to the portfolio, any response being made to competition. Dysfunction in the extreme.

It's also not clear that you should be moving at the same speed (or with the same strategy) throughout a particular business, product, or ecosystem cycle.

Eventually, people will realise that activities evolve and you need to use different methods depending upon the state of evolution.

— swardley (@swardley) August 14, 2012

 

change the game

In ooda x cloud, I wrote:

More: compressing Orient and Decide, the time between Observation and Action, enables you to change the operative environment such that the adversary is orienting towards and making decisions based on an outdated model representing a reality that no longer exists.

There are physical limits to observation and action. Given equally matched adversaries with access to the same data and tools, both will hit absolute limits to how fast they can observe the environment or act on it.

For orient and decide, no such limits. These are mental processes. Their analogs: algorithms, analytics, patterns, heuristics, models, etc.

If you orient and decide fast enough, then your action will change the competitive environment before the adversary has decided what to do. They will be drawing conclusions and acting on a model of the past, instead of the present. With each iteration, you move further ahead in time. You can literally..

Change the game.

ooda x cloud

Spending time discussing cloud in terms of mission objectives and operational leverage lately brought up John Boyd and OODA. The theory holds that:


"…to win in battle a pilot needs to operate at a faster tempo than his enemy…he must stay one or two steps ahead of his adversary: he must operate inside his adversary's time scale."

- Boyd: The Fighter Pilot Who Changed the Art of War.

 

More or less: whoever moves faster through the process of Observing the operative environment, Orienting themselves towards it, Deciding on a course of action, and Acting--wins.

More: compressing Orient and Decide, the time between Observation and Action, enables you to change the operative environment such that the adversary is orienting towards and making decisions based on an outdated model representing a reality that no longer exists.

The application to any competitive market is obvious, so instead here's an analogy:

  • Observe - instrumentation, monitoring, data collection, etc.
  • Orient - analytics in all its forms, correlation, visualization, etc.
  • Decide - modeling, scenarios, heuristics, etc.
  • Act - provision, develop, deploy, fail, iterate, etc.

What does cloud speed up? And who has the advantage?