Core Idea
“The network is reliable” is the first and most critical of the Fallacies of Distributed Computing—the false assumption that network calls between services will always succeed. In reality, networks fail constantly due to hardware faults, misconfigurations, packet loss, timeouts, and infrastructure issues, making network unreliability the foundational challenge of distributed architecture.
What Is the “Network Is Reliable” Fallacy?
The “network is reliable” fallacy is the assumption that when a service makes a network call to another service, that call will always complete successfully:
- This assumption is natural when working within a single process (where method calls virtually never fail)
- But it becomes catastrophically wrong when applied to distributed systems where network boundaries exist
In monolithic applications:
- When one module calls another module’s function, the call either succeeds (returns a result) or fails immediately with a clear exception
- The failure modes are limited: out of memory, null pointer, logic error—all deterministic and usually reproducible
In distributed systems, a network call introduces an entirely new category of failures:
- Transient network errors
- Service unavailability
- Timeout ambiguity: did the request succeed but the response was lost? (see the sketch after this list)
- Partial failures
- Cascading failures across multiple services
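The timeout case is the trickiest of these: the caller cannot distinguish “the request never arrived” from “it was processed but the response was lost.” The following TypeScript sketch (assuming Node 18+’s built-in fetch and a hypothetical internal payments endpoint, neither of which comes from the source) makes that ambiguity explicit by reporting a timed-out call as “unknown” rather than “failed”:

```typescript
// Illustrative sketch only: the URL, timeout budget, and endpoint behaviour are assumptions.
type CallOutcome = "succeeded" | "failed" | "unknown";

async function chargePayment(orderId: string): Promise<CallOutcome> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 2000); // assumed 2 s budget

  try {
    const res = await fetch("https://payments.internal/charge", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ orderId }),
      signal: controller.signal,
    });
    return res.ok ? "succeeded" : "failed";
  } catch (err) {
    // A refused connection is a clear failure, but a timeout is ambiguous:
    // the payment service may have charged the card and the response was lost.
    if ((err as { name?: string }).name === "AbortError") {
      return "unknown";
    }
    return "failed";
  } finally {
    clearTimeout(timer);
  }
}
```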
Network failures happen constantly in production systems:
- A hardware switch can fail
- A fiber optic cable can be accidentally cut during construction
- A misconfigured firewall rule can silently drop packets
- A congested network can delay packets past timeout thresholds
- A DNS lookup can fail
- A load balancer can route traffic to a dead instance
- Cloud provider infrastructure can experience outages
- Each of these scenarios breaks the “network is reliable” assumption
The fallacy is particularly insidious:
- Networks often appear reliable during development and testing—local networks are fast and stable, test environments have minimal load, and latency is low
- It’s only in production, under real-world conditions with scale, geographic distribution, and operational complexity, that network unreliability becomes apparent
- This leads to systems that work perfectly in testing but fail unpredictably in production
Addressing this fallacy requires explicit architectural patterns (a combined sketch follows this list):
- Retry logic handles transient failures
- Circuit breakers prevent cascading failures by stopping calls to failing services
- Timeouts prevent indefinite blocking
- Idempotency ensures retries don’t cause duplicate operations
- Health checks detect failed instances
- These patterns add complexity, operational overhead, and development effort—costs that architects must accept when choosing distributed architectures
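As a rough illustration of how several of these patterns combine, here is a hedged TypeScript sketch of a retry loop with a per-attempt timeout, exponential backoff with jitter, and a reused idempotency key. The “Idempotency-Key” header, the timeout, and the backoff numbers are illustrative assumptions, not prescriptions from the book:

```typescript
import { randomUUID } from "node:crypto";

// A single idempotency key is generated per logical operation and reused on every retry,
// so a downstream service that honours an "Idempotency-Key" header (an assumption here)
// can deduplicate requests it has already processed.
async function postWithRetry(url: string, body: unknown, maxAttempts = 3): Promise<Response> {
  const idempotencyKey = randomUUID();

  for (let attempt = 1; ; attempt++) {
    try {
      const res = await fetch(url, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "Idempotency-Key": idempotencyKey,
        },
        body: JSON.stringify(body),
        signal: AbortSignal.timeout(2_000), // per-attempt timeout: never block indefinitely
      });
      if (res.ok) return res;
      if (res.status < 500) return res; // 4xx: a retry will not help
      throw new Error(`upstream returned ${res.status}`);
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up after the final attempt
      // Exponential backoff with jitter so retries do not hammer a struggling service.
      const delayMs = 200 * 2 ** (attempt - 1) + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Reusing the same key across retries is what keeps the ambiguous-timeout case from turning into a duplicate charge, assuming the receiving service actually deduplicates on it.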
Why This Matters
This fallacy is foundational because it affects every other aspect of distributed system design:
- If you assume the network is reliable, you won’t build in retry logic, and your system will fail when transient network issues occur
- You won’t implement circuit breakers, so one failing service will cascade failures across your entire system (a minimal breaker sketch follows this list)
- You won’t design for idempotency, so retries will corrupt data by creating duplicate orders or duplicate charges
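To make the circuit-breaker idea concrete, here is a deliberately minimal sketch of one as a small state machine. The failure threshold and reset interval are illustrative assumptions, and real implementations (or libraries) also limit concurrent trial calls in the half-open state:

```typescript
// Minimal circuit-breaker sketch; thresholds are illustrative, not recommendations.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly failureThreshold = 5,
    private readonly resetAfterMs = 30_000,
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.openedAt !== null) {
      const elapsed = Date.now() - this.openedAt;
      if (elapsed < this.resetAfterMs) {
        // Open: fail fast instead of piling more load onto a broken dependency.
        throw new Error("circuit open: failing fast");
      }
      // Half-open: allow a trial call through to probe for recovery.
    }
    try {
      const result = await fn();
      this.failures = 0;      // success closes the breaker again
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.failureThreshold) {
        this.openedAt = Date.now(); // too many consecutive failures: open the breaker
      }
      throw err;
    }
  }
}
```

A caller wraps each outbound call, for example `breaker.call(() => fetch(url))`, so a dependency that keeps failing stops consuming threads and timeout budget upstream.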
Understanding this fallacy forces architects to make honest trade-off decisions:
- The benefits of distributed architectures (independent deployability, scalability, fault isolation) come at the cost of handling network unreliability
- If you cannot justify the effort and complexity of building resilient distributed systems, you should reconsider whether distribution is appropriate for your use case
This is why the decision to move from monolith to microservices is not a matter of following trends:
- It’s a deliberate decision to trade monolithic simplicity for distributed scalability, while accepting the engineering burden of handling network unreliability through:
- Retry policies
- Circuit breakers
- Timeouts
- Distributed tracing (see the propagation sketch after this list)
- Comprehensive monitoring
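Of these, distributed tracing is often the least familiar, but at its core it is just consistent propagation of a correlation identifier. The sketch below passes a W3C Trace Context “traceparent” header on an outbound call so a failure can be matched to logs in every service the request touched; the inventory URL, timeout, and logging style are illustrative assumptions:

```typescript
import { randomBytes } from "node:crypto";

// Build a W3C Trace Context traceparent value: version-traceId-spanId-flags.
function newTraceparent(): string {
  const traceId = randomBytes(16).toString("hex"); // 32 hex chars
  const spanId = randomBytes(8).toString("hex");   // 16 hex chars
  return `00-${traceId}-${spanId}-01`;
}

async function callInventory(sku: string, traceparent = newTraceparent()): Promise<Response> {
  const res = await fetch(`https://inventory.internal/items/${sku}`, {
    headers: { traceparent },          // downstream services log and forward this id
    signal: AbortSignal.timeout(2000), // never wait indefinitely
  });
  if (!res.ok) {
    // The trace id ties this failure to logs in every service the request touched.
    console.error(`inventory call failed (${res.status}) traceparent=${traceparent}`);
  }
  return res;
}
```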
Related Concepts
- Fallacies-of-Distributed-Computing — The complete set of eight fallacies this belongs to
- Monolithic-vs-Distributed-Architectures — The architectural decision this fallacy most directly impacts
- Fallacy-Latency-Is-Zero — Related fallacy about network performance assumptions
- Architecture-Characteristics-Categories — Reliability and availability characteristics affected by this fallacy
- Trade-Offs-and-Least-Worst-Architecture — This fallacy exemplifies why distributed architectures involve trade-offs
- Client-Server-Architecture — The simplest distributed pattern where this fallacy first appears
Sources
- Richards, Mark and Neal Ford (2020). Fundamentals of Software Architecture: An Engineering Approach. O’Reilly Media. ISBN: 978-1-492-04345-4.
- Chapter 9: Foundations
- Discusses the Fallacies of Distributed Computing and their architectural implications
- Available: https://www.oreilly.com/library/view/fundamentals-of-software/9781492043447/
- Deutsch, Peter (1994-1997). “The Eight Fallacies of Distributed Computing.” Originally articulated at Sun Microsystems.
- First fallacy in the original list
- Identified through observing repeated distributed system failures in production
- Widely referenced in distributed systems literature
Note
This content was drafted with assistance from AI tools for research, organization, and initial content generation. All final content has been reviewed, fact-checked, and edited by the author to ensure accuracy and alignment with the author’s intentions and perspective.