Filtering by Tag: devops

the dangers of models

All models are wrong; some are useful.

Disconfirmatory evidence is more important than confirmatory evidence.

Actively seek model invalidation.

Everything was built in some context, at some scale. Reading primary sources, or learning how and why a thing was made, is essential to understanding the conditions that held and knowing the bounding scales beyond which it may become unsafe.

This is something I think about a lot. It's true in software, distributed systems, and organizations. Which is the world I breathe in every day at SignalFx.

It began to knit together around OODA:

  • ooda x cloud-- positing how OODA relates to our operating models
  • change the game-- the difference between O--A and -OD- and what we can achieve
  • pacing-- the problem with tunneling on "fast" as a uniform good
  • deliver better-- the real benefit of being faster at the right things
  • ooda redux-- bringing it all together

OODA is just a vehicle for the larger issue of models, biases, and model-based blindness--Taleb's Procrustean Bed. Where we chop off the disconfirmatory evidence that suggests our models are wrong AND manipulate [or manufacture] confirmatory evidence. 

Because if we allowed the wrongness to be true, or if we allowed ourselves to see that differentness works, we'd want/have to change. That hurts.

Our attachment [and self-identification] to particular models and ideas about how things are in the face of evidence to the contrary--even about how we ourselves are--is the source of avoidable disasters like the derivatives-driven financial crisis. Black Swans.

  • Black swans are precisely those events that lie outside our models
  • Data that proves the model wrong is more important than data that proves it right 
  • Black swans are inevitable, because models are, at best, approximations

Antifragility is possible, to some scale. But I don’t believe models can be made antifragile. Systems, however, can.

  • Models that do not change when the thing modeled (turtles all the way down) changes become poorer and poorer approximations
  • Models can be made robust [to some scale] through adaptive mechanisms [or, learning] 
  • Systems can be antifragile [to some scale] through constant stress, breakage, refactoring, rebuilding, adaptation and evolution— chaos army + the system-evolution mechanism that is an army of brains iterating on the construction and operation of a system
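That last bullet can be sketched as a toy chaos exercise: constant deliberate breakage met by constant rebuilding. This is an illustration only, not real tooling; WorkerPool and its methods are hypothetical names.

```python
import random

class WorkerPool:
    """Toy service: a pool of workers with a supervisor that replaces failures."""

    def __init__(self, size):
        self.size = size
        self.workers = set(range(size))
        self.next_id = size

    def kill_random(self):
        # Chaos step: deliberately break a running worker.
        victim = random.choice(sorted(self.workers))
        self.workers.discard(victim)

    def heal(self):
        # Adaptive step: the supervisor notices missing capacity and rebuilds it.
        while len(self.workers) < self.size:
            self.workers.add(self.next_id)
            self.next_id += 1

pool = WorkerPool(size=5)
for _ in range(100):
    pool.kill_random()   # constant stress and breakage...
    pool.heal()          # ...met by constant repair and rebuilding

assert len(pool.workers) == pool.size  # capacity survives repeated failure
```

The point is the pairing: the stress alone proves nothing; it's the army of brains iterating on the repair mechanism that makes the system stronger over time.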

The way we structure our world is by building models on models. All tables are of shape x and all objects y made to go on tables rely on x being the shape of tables. Some change in x can destroy the property of can-rest-on-table for all y in an instant.

  • Higher level models assume lower level models 
  • Invalidation of a lower level model might invalidate the entire chain of downstream (higher level) models—higher level models can experience catastrophic failures that are unforeseen 
  • Every model is subject to invalidation at the boundaries of a specific scale [proportional to its level of abstraction or below]
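The tables-and-objects example can be sketched in a few lines of Python. This is a toy with hypothetical names, but it shows how one change at the bottom invalidates everything stacked on top, all at once.

```python
# Lower-level model: an assumption about the world.
TABLE_TOP = "flat"   # "all tables are of shape x"

def can_rest_on_table(obj):
    # Higher-level model: built directly on the lower-level assumption.
    return obj["base"] == TABLE_TOP

# A thousand objects y, all made to go on tables of shape x.
objects = [{"name": f"object-{i}", "base": "flat"} for i in range(1000)]
assert all(can_rest_on_table(o) for o in objects)

# One change in the lower-level model...
TABLE_TOP = "domed"

# ...and every higher-level conclusion fails in an instant.
assert not any(can_rest_on_table(o) for o in objects)
```

Nothing about any individual object changed; the catastrophic failure came entirely from the layer below.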

Even models that are accurate in one context or a particular scale become invalid or risky in a different context or scale. What is certain for this minute may not be certain for this year. What is certain for this year may not be certain for this minute. It’s turtles all the way down. If there are enough turtles that we can’t grasp the entire depth of our models, we have been fragilized and are [over]exposed to black swans.

This suggests that we should resist abstractions. Only use them when necessary, and remove [layers of] them whenever possible.

We should resist abstractions.

Rather than relying on models as sources of truth, we should rely on principles or systems of behavior like giving more weight to disconfirmatory evidence and actively seeking model invalidation. 

OODA, like grasping and unlocking affordances, is a process of continuous checking and evaluation of the model of the world with the experience of the world. And seeking invalidation is getting to the faults before the faults are exploited [or blow up]. 

Bringing it all back around to code--I posit that the value of making as many things programmable as possible is the effect on scales.

  • Observation can be instrumented > scaled beyond human capacity
  • Action can be automated > scaled beyond human capacity
  • Orientation and decision can be short-circuited [for known models] > scaled beyond human capacity
  • Time can be reallocated to orienting and deciding in novel contexts > scaling to human capacity
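A minimal sketch of that short-circuiting, with hypothetical signal names and responses: known observations go straight to automated action, and only novel contexts consume scarce human attention.

```python
# Known models: observations we have already oriented on, paired with
# automated responses. All names here are hypothetical.
KNOWN = {
    "disk_full": "expand_volume",
    "instance_down": "replace_instance",
}

def automated_action(name):
    # Act, scaled beyond human capacity.
    return f"ran {name}"

def escalate_to_human(observation):
    # Novel context: reallocate time to human orientation and decision.
    return f"paged a human to orient on {observation!r}"

def handle(observation):
    if observation in KNOWN:
        # Orient/decide short-circuited: straight from observe to act.
        return automated_action(KNOWN[observation])
    return escalate_to_human(observation)

assert handle("disk_full") == "ran expand_volume"
assert handle("weird_latency_bimodality").startswith("paged a human")
```

The interesting design decision is what earns a place in the known-models table, and how aggressively entries get invalidated when the world shifts under them.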

That last part is what matters. We are the best, amongst our many technologies, at understanding and successfully adapting to novel contexts. So we should be optimizing for making sure that's where our time is spent when necessary.

Scale problems to human capacity.

aws lambda - some words

To get these out of my head so I can stop thinking about them...

At re:Invent last year, Ben Golub was up on stage singing the praises of Docker. The masterminds at AWS had arranged for a solid 20-30min of Docker love-in before making the day 2 technical announcements. Ben said that [one of] Docker's goals was to free developers from having to worry about production and delivery (or something like that, see his keynote). Then Werner comes on stage, describes Lambda, and more or less says that while others are trying to free developers--Lambda actually does that. Pretty amusing.

Lambda will drive some usage away from other AWS services. I've already seen experimentation and real usage start amongst high end AWS users (not just Netflix). You could view it as cannibalization, but it's much smarter. Presumably AWS has figured out how to price Lambda in an accurate way such that the cost of all the underlying and adjacent services consumed is priced in.

Lambda might be a "true" PaaS in the sense of being a pure runtime where you don't have to understand the underlying mechanics or implementation of compute, storage, database, etc etc at all. There are no buildpacks, runtime plugins, etc etc like you have in most PaaSes.

Like Jeff Barr said in his blog post: "You don't have to configure, launch, or monitor EC2 instances. You don't have to install any operating systems or language environments. You don't need to think about scale or fault tolerance and you don't need to request or reserve capacity. A freshly created function is ready and able to handle tens of thousands of requests per hour with absolutely no incremental effort on your part, and on a very cost-effective basis."

Although it has constraints, like only being Node and only allowing up to 1GB of memory consumption per function (last I checked), etc--it's a completely abstracted runtime environment. You give it code and a few variables. It does the rest.

It completely removes Ops. Why DevOps when you can just Dev? It's more like Google App Engine than anything else out there. But GAE won't let you have long-running functions (more than a few secs, last I checked), so in its own limited way Lambda is already a step ahead.

Where a Docker container gives you theoretical portability because your entire app is packaged in a way that's independent of what it's running on (but not really), Lambda locks you in because you have no idea how your code is running or what it takes to run your code. The only thing you could conceivably move to is GAE, but you'd have to rewrite bits and metadata in order to do it. Oh, except that GAE doesn't do Node. So nevermind.

It's brilliant. 

It's also dangerous. If you never learn how the thing below what you are doing--what you are downstream of and rely on--works, then you become intrinsically dependent on the provider of that service. Great when that service is an actual commoditized utility with multiple providers in a competitive marketplace. Miserable when it's a monopoly. Creating that dependence is good gameplay on AWS's part. Not providing equivalent alternatives that conform to the same interfaces is bad gameplay on everyone else's. Becoming hooked is a poor decision on our part, unless we do it with eyes wide open and willingness to do the work of unhooking ourselves in the future.


unicorns and the language of otherness

Because even in the face of overwhelming evidence, people will come up with excuses for why they should not, will not, can not—learn or change.

Presented at Velocity NY 2014.

Transcribed:

This man is albino, which means he has no skin pigmentation.

The red you see is the blood below the skin. His name is Brother Ali. He is a Muslim rapper from Minnesota. That makes him different from all of us, in some way. And in all likelihood, we don’t think like him.

Let’s say that I believe the earth is flat. It’s part of my identity. It’s a strong belief. I have convictions around it, decisions that I’ve made around it. I identify as an earth-is-flatter. My identity is invested in the earth being flat. An attack on the idea is an attack on me. If the idea is wrong then I am wrong. Personally. Not just about that one thing, but about my person.

Let’s say you believe something different. You believe that the earth is round. You’re an earth-is-rounder. That makes you apart from me. Not because you have a different idea, but because you have a different identity. I cannot identify with you. If you’re successful in your belief, then maybe my way isn’t the only way. If you’re more successful than I am, then maybe my way isn’t the best way. If you are successful and then I am less successful, then maybe I’m wrong. But I’m not just wrong about the idea, I am wrong as a person.

But, I don’t have to see that. I don’t have to see anything. I have labeled you as something other than me. I cannot identify with you, thus I do not have to see your success. I can ignore it. I can bury my head in the sand. My ingrained belief creates a bias about you that I have. And I rationalize that bias by calling you something else, by putting a label on you. 

There is a saying by our friend, Brother Ali, that we have a “legacy so ingrained in the way that we think that we no longer need chains to be slaves.” He’s talking about racial biases, but any ingrained way of thinking creates a bias. Biases pile up and compound into a kind of psychological debt. It’s like technical debt: you have to refactor it in order to move on. It will eventually slow you down, bog you down, prevent you from seeing things. Prevent you from noticing things. Prevent you from seeing a thing you might want to learn.

And what’s true of you as an individual is true of us as groups. Teams can have shared biases created by their entrenched ideas and ways of doing things that create a shared psychological debt that prevents them—not just from learning—but from seeing that they should be learning. And while they are not learning, while we are not learning, there are other people who have learned and through their learning have changed the world around us.

I was an analyst at Gartner for a couple of years and I heard this all the time: “These companies are not like us. They do things differently. They have different users. They have different environments. They can do whatever they want. They don’t have the same security concerns we do.” Any litany of excuses that say “we don’t have to learn from them because they are unicorns” and unicorns are different and different people are others. So, eh. It’s ok.

Turns out that unicorns are just people. And as people, they’re just like us. They’ve just made a different set of decisions in a different context in a different environment. We can make different decisions. We can create a new context. We can pay down our psychological debts. We can even declare bankruptcy like people do with economic debt and start over, throwing out ideas and practices. 

Cause the thing is, if we really want to move forward and expand and learn and grow and change for a changing environment—we have to get past the mess of our past decisions. We have to separate our identities, who we are and who we will be, from who we were, what we have done and what we have been. So that when we encounter something different or see change, or see change in others, that is not a threat to our identity and it doesn’t hurt so much to accept change and to do change. 

I don’t want to be a unicorn. I don’t want to be someone who is apart from you, other than you, does not have to be listened to, can be dismissed. And I don’t want to think of anyone else as something special, apart, different, cannot be learned from, to be dismissed, not part of the same humanity that I’m in. 

Cause, in the beginning and in the end, we are all still people. Thus, mainly in essence the same. The fact that we have some simultaneous differences, that have evolved, that don’t cause us to die out there in the world—suggests that the single strongest signal that you have something to learn is the fact that a difference exists. 

..the single strongest signal that you have something to learn is the fact that a difference exists. 

devops appops infraops all the ops

Donnie Berkholz wrote a great post about what’s actually happening as Dev vs Ops becomes DevOps [I know I know, keep your groaning to a minimum].

This is a conversation I had frequently at Gartner. People would ask what kind of person they need to hire to do DevOps. I would respond with “Well you already have developers. You have some Unix admins hanging around? Yeah, get them.” 

I was once an Irix and Solaris admin. At that time, any good admin was dedicated to automating themselves out of work so they could spend most of the day on IRC, playing games, or reading newsgroups. Automating infrastructure and platforms that get more or less treated like a service by devs was once normal. And now it will be again.

Things don’t go away; the lines just move. Devs own their code through the lifecycle of an application (and its constituent services) from dev/test all the way through production and day-to-day operations. Ops (or IT or platform or whatever) own infrastructure through the lifecycle of an application (and its constituent services) from dev/test all the way through production and day-to-day operations.

So they have to work together every step of the way. Iterate together. Where exactly the line resides for any given org changes. For example, our "infrastructure" may only go up to the OS image but not all the way up to the runtime. But someone else's could go up to the runtime or not even as far up as the OS image. Regardless of where the line is, we end up having something that’s more like AppOps (AppDevOps!) and InfraOps (InfraDevOps!). InfraOps provides the infra or platform service that the app is built on. AppOps builds and runs the app on that service. They could be the same person, the same team, different people, different teams, generalists or specialists, in-house or outsourced to a cloud provider—it doesn’t really matter.  


I don't really care about the terms. Neither should you. As many people point out, we end up back at devs and admins. Took bloody long enough. :)

--

P.S. 

@aneel @dberkholz What about opsing all the devs?

— Dan Turkenkopf (@dturkenk) May 27, 2014

Yes. That too. :)

ooda redux - digging in and keeping context

Putting together some thoughts from a few posts from 2012 on OODA [one, two, three]. For some reason, the idea had been getting a lot of airtime in pop-tech-culture. Like most things that get pop-ified, the details are glossed over—ideas are worthless without execution; relying on the pop version of an idea will handicap any attempt at its execution. 

I’m not an expert. But, I’d wager that Boyd: The Fighter Pilot Who Changed the Art of War is the best read on the subject outside of source material from the military.

OODA stands for Observe, Orient, Decide, Act. It’s a recasting of the cognition<->action cycle central to any organism’s effort to stay alive, focused on a competitive/combative engagement.

Get the data (observe). 

Figure out what the world looks like, what’s going on in it, our place in it, our adversary’s place in it (orient).

Project courses of action and decide on one (decide).

Do it (act).

The basic premise is that, in order to best an opponent, we have to move at a faster tempo through the loop. Boyd used a more subtle description—operate inside the adversary’s time scale. 

First: If we traverse the loop before the adversary acts, then whatever they are acting to achieve may not matter, because we have changed the environment in some way that nullifies or dulls the effectiveness of their action. They are acting on a model of the world that is outdated.

Second: If we traverse the loop before the adversary decides, we may short-circuit their process and cause them to jump back to the start, because new data has come in suggesting their model is wrong.

Third: If we traverse the loop at this faster tempo continuously, we frustrate the adversary’s attempt to orient—causing disorientation—changing the environment faster than the adversary can apprehend and comprehend it, much less act on it. We continue to move further ahead in time while the adversary falls backwards. By operating inside the adversary’s time scale.

Another detail from Boyd—all parts of the loop are not made equal.

Fundamentally, observation and action are physical processes while orientation and decision are mental processes. There are hard limits to the first and no such limits to the second. So, two equally matched adversaries can both conceivably hit equal hard limits on observation and action, but continue outdoing each other on orientation and decision. 

But realistically, adversaries are not equally matched. We don’t observe the same way, using the same means, with the same lens, etc. We don’t act the same way, with the same speed, etc. And being able to collect more data, spend more time orienting, leads to better decisions and actions. Being able to move through different parts of the loop faster, as needed, renders the greatest advantage. Compressing the decision-action sequence gives us a buffer to spend more time observing-orienting. Nailing observation gives us a buffer to spend more time orienting-deciding. We can come up with the best--not the fastest--response and act on it at the optimal--not the fastest--time. Getting a loop or more ahead of our adversary gives us a time buffer for the whole thing. It puts us at a different timescale. It allows us to play a different game, to change the game.
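A toy simulation can make the tempo argument concrete. This is an illustrative sketch, not anything from Boyd, and every name in it is hypothetical: the fast player completes a full observe-orient-decide-act cycle every tick, while the slow player needs several ticks between observing and acting, so its model of the world is always stale by the time it acts.

```python
# Toy sketch of "operating inside the adversary's time scale".
world = {"state": 0}

def run(slow_period=3, until=12):
    stale_actions = 0
    snapshot = None          # the slow player's model of the world
    act_at = None
    for t in range(1, until + 1):
        world["state"] += 1  # fast player: observes and acts every tick
        if snapshot is None:
            snapshot = world["state"]   # slow player observes...
            act_at = t + slow_period    # ...and will act several ticks later
        elif t >= act_at:
            if snapshot != world["state"]:
                stale_actions += 1      # the model was outdated before use
            snapshot = None             # jump back to the start of the loop
    return stale_actions

# Every one of the slow player's actions lands on a world that has moved on.
assert run() > 0
```

The slow player isn't doing anything wrong within its own loop; it simply never gets to act on a current model, which is the disorientation effect described above.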

Deliberately selecting pacing, timescale, game—strategic game play.

Ops/devops analogs:

  • Observe - instrumentation, monitoring, data collection, etc.
  • Orient - analytics in all its forms, correlation, visualization, etc.
  • Decide - modeling, scenarios, heuristics, etc.
  • Act - provision, develop, deploy, scale, etc.

Startup analogs:

  • Observe - funnel, feedback, objections, churn, engagement, market intel, competitive intel, etc.
  • Orient - analytics in all its forms, correlation, assigning and extracting meaning from metrics, grasping the market map and territory, etc.
  • Decide - modeling, scenarios, heuristics, etc.
  • Act - prioritize, kill, build, target, partner, pivot, fundraise, etc.

Those are analogs. It’s worth keeping in mind that OODA was developed for the context of one-on-one, fighter-jet-to-fighter-jet combat and not anything else.