Personal Thoughts on Physical Limits, Complexity, City Planning, and Community


*You can find all articles on Peak Complexity by going to the Complexity Category page at https://cityuntangled.com/category/complexity/.

Reader Warning: Technical discussions involving computers, organizations, and societal collapse

As technologies become more sophisticated and governments become more granular, e.g. more departments, are our lives always getting better? Or is there a peak complexity at which point adding more complexity is not beneficial but harmful? The concept of peak complexity has been in the back of my mind for a long time, and I feel people need to talk about it more. This article is going to get more technical than others as it gets into computer systems and structures of organizations. I will do my best to lay it out for people from various backgrounds.

My brief description of peak complexity: As a society increases its complexity to solve problems, it gets to a peak where the added complexity has limited benefits and maintaining the complexity is neither socially nor physically sustainable. The American anthropologist Joseph Tainter did not come up with the term, but his influential book from 1988, “The Collapse of Complex Societies,” seems to have been the catalyst for the term.

I have a copy of the book but have only skimmed it, unfortunately. It’s a bit too dense for me. In the book, Tainter explores how advanced past civilizations such as the Mayans and the Romans collapsed due to unsustainable complexity. Near the end, he offers a general theory that can be applied to modern civilizations as well. On page 194, he lists a set of four concepts around collapse:

“Four concepts lead to understanding collapse, the first three of which are the underpinnings of the fourth. These are:

  1. human societies are problem-solving organizations;
  2. sociopolitical systems require energy for their maintenance;
  3. increased complexity carries with it increased costs per capita; and
  4. investment in sociopolitical complexity as a problem-solving response often reaches a point of declining marginal returns.”

“The Collapse of Complex Societies” is still in print. As of December 2024, Thriftbooks.com is one of the few non-conglomerate bookstores with copies for sale.

My understanding of the theory – I might be butchering it – is as follows:

  1. As a society (you can call it civilization) develops, it tends to add more technological and organizational complexity.
  2. The added complexity has both marginal benefits and marginal costs. There are real resource costs such as more materials, energy, human labor, and specialized training.
  3. As the complexity increases, the amount of marginal benefit decreases. In modern terms, the Return-on-Investment (ROI) shrinks. For example, if you spend twice as much on healthcare as you did previously, it does not mean average life expectancy doubles.
  4. At some point, as resources run out or people struggle under the complexity, the society would involuntarily start shedding some of the complexity. This involuntary shedding is collapse.

Tainter has his share of critics, but for me, based on my anecdotal experiences and what I see around the world, his theory does not seem that far off. If you were watching the news or traveled by air in July of 2024, you might be aware of the global computer outage that caused significant airline delays. The culprit of the outage was not Microsoft, as many people initially suspected, but CrowdStrike, a computer security company whose software runs on many Microsoft Windows computers. One of their routine updates turned out to be defective and caused those computers to crash. The estimated loss is $5.4 billion according to one news source. The irony of a computer security company causing a global outage, which they are normally paid to prevent, cannot be overstated.

The story struck a personal chord for me because the IT company I worked for experienced a nationwide computer outage. The outage occurred on employee computers due to an internal software update. The root cause was not in the update code, but in the logic of the database that managed the update process. (A database is a computer system for storing, retrieving, and editing large numbers of rows of tabulated data.)

The following is a very simplified summary of the database logic. You might be able to pick out something odd from the steps:

  1. When a new software update is available, the server (the central computer responsible for handling communications with many user-end computers) sends out a unique expiring download link for the update to each employee computer. This is to prevent the code from being leaked to outsiders. The hope is that by the time a link gets leaked, it will have expired, so outsiders will not be able to download the update code.
  2. As the server sends out the link, it also creates a row entry in the database with the link and the expiration date and time.
  3. When an employee computer initiates the update download, the server checks the database to make sure that the link has not yet expired. If it has not expired, the computer is allowed to download the update.
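The three steps above can be sketched in a few lines of Python. This is only an illustration of the logic as I described it, not the company's actual code: the names `LinkRow`, `issue_links`, and `may_download`, the example URL, and the dates are all made up, and a plain Python list stands in for the database table.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LinkRow:
    link: str             # unique download link sent to one computer
    expires_at: datetime  # moment after which the link is no longer valid


def issue_links(table, num_computers, now, lifetime):
    """Steps 1 and 2: create one expiring link per computer and record it."""
    links = []
    for _ in range(num_computers):
        # Hypothetical URL; the random token makes each link unique.
        link = "https://updates.example.invalid/" + secrets.token_urlsafe(16)
        table.append(LinkRow(link, now + lifetime))
        links.append(link)
    return links


def may_download(table, link, now):
    """Step 3: scan the table row by row and check that the link is still valid."""
    for row in table:
        if row.link == link:
            return now < row.expires_at
    return False


table = []
start = datetime(2024, 1, 1, 9, 0)  # arbitrary example time
links = issue_links(table, 3, start, timedelta(hours=1))

print(may_download(table, links[0], start + timedelta(minutes=30)))  # still valid
print(may_download(table, links[0], start + timedelta(hours=2)))     # expired
print(len(table))  # rows are still in the table after they expire
```

Nothing in this sketch ever removes a row from `table`, which mirrors the steps as listed.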

You might have noticed something missing in the above steps. What happens to the expired download links? Once a link expires, shouldn’t the system delete it? Why would it stay? And that is the core of the issue.

Imagine a scenario where the company has 1,000 employee computers. Every time there is an update, it creates 1,000 unique download links in the database. When it is time for a computer to download the update, the database checks up to 1,000 rows for a match. Usually, it will find a match well before checking 1,000 rows, but 1,000 rows is the worst-case scenario. Now multiply that by 1,000 computers doing the checking. That means that the first software update adds 1,000 new rows and causes up to a total of 1 million database row checks. Because the expired rows are never deleted, the table grows by another 1,000 rows with every update, and so does the worst case for each check. At some point, after enough update versions, there will be so many database row checks being done at the same time that the system will overload.
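To make the arithmetic concrete, here is a small back-of-the-envelope calculation in Python. It assumes the simplest possible lookup, where the database checks rows one at a time and expired rows are never removed. The function name `worst_case_row_checks` is my own, and a real database may search far more efficiently than this.

```python
def worst_case_row_checks(num_computers, updates_so_far):
    """Upper bound on row checks for one update rollout.

    Assumes every update added one row per computer, no row was ever
    deleted, and each lookup may have to scan the whole table.
    """
    rows_in_table = num_computers * updates_so_far
    return num_computers * rows_in_table


for updates in (1, 10, 100):
    rows = 1_000 * updates
    checks = worst_case_row_checks(1_000, updates)
    print(f"after {updates} update(s): {rows:,} rows, up to {checks:,} checks")
# after 1 update(s): 1,000 rows, up to 1,000,000 checks
# after 10 update(s): 10,000 rows, up to 10,000,000 checks
# after 100 update(s): 100,000 rows, up to 100,000,000 checks
```

The number of computers stays the same, but the worst-case work per rollout grows with every update that came before it.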

And one morning, that is what happened. The database crashed from too many checks from all those employee computers. It took at least half a day of frantic work by programmers to deploy a fix, which was mostly deleting the expired links and rebooting the system. They later introduced a “garbage collection” logic to periodically check the database for expired links and delete them.
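A cleanup pass of that kind can be very small. The sketch below is an assumption about what such logic might look like, not the fix the programmers actually wrote: `purge_expired` is a made-up name, each row is a simple `(link, expires_at)` pair, and how often the real system ran its cleanup is not something I know.

```python
from datetime import datetime, timedelta


def purge_expired(table, now):
    """Return only the rows whose links are still valid at time `now`."""
    return [(link, expires_at) for link, expires_at in table if expires_at > now]


now = datetime(2024, 1, 1, 12, 0)  # arbitrary example time
table = [
    ("old-link", now - timedelta(days=30)),  # expired a month ago
    ("new-link", now + timedelta(hours=1)),  # still valid
]

table = purge_expired(table, now)
print([link for link, _ in table])  # ['new-link']
```

Run on a schedule, a pass like this keeps the table roughly the size of one update's worth of links instead of letting it grow forever.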

When you go to a corporate IT workshop or even a routine weekly meeting, the jargon that managers often throw out is “scalability.” It’s the concept that the system is able to handle increased loads without a linear increase in equipment or staffing. To achieve that, companies often centralize the system by having a small number of staff manage the software centrally. The pitfall is that the system is now in the control of a few people. And if there is a single screw-up, it spreads through the whole system.

The CrowdStrike global outage and the much smaller national outage I experienced show that in complex systems, flaws in a few central components or processes can bring down the whole system. And when these systemic issues occur, the costs are not just financial or technical. Imagine a family whose vacation was ruined by the CrowdStrike crash – “A Ruined Surprise Trip to Disneyland.” In the next part, I will go more into the real human costs of excess complexity and our brains’ bias toward additive solutions that add more complexity.
