Introduction

If you’ve ever tried to coordinate a complex, multi-step process across a dozen microservices, you know the pain of distributed state management. Suddenly, you’re writing custom database polling loops, implementing exponential backoff, and debugging race conditions at 2 AM. When developers ask me why Cadence is still a go-to solution for distributed workflows, I usually point to its ability to make complex, distributed failures look like ordinary, local code exceptions.

In this FAQ-style guide, I’ve compiled the most common questions I get from engineering teams about adopting Cadence. From fault tolerance to debugging, let’s look at why this orchestration engine, originally developed by Uber, might be exactly what your backend needs.

1. What makes Cadence different from traditional job queues?

Traditional job queues (like RabbitMQ, SQS, or Celery) are fantastic for async task processing, but they don’t track the state of a multi-step process for you. If a task requires five sequential steps across three different services and step four fails, a standard queue leaves you responsible for figuring out how to roll back steps one through three, or how to pause and resume later.

Cadence, on the other hand, is a stateful orchestration engine. It persists the entire history of your workflow. If a worker node crashes mid-execution, Cadence simply assigns the workflow to a new worker, replays the event history, and resumes execution exactly where it left off. It treats state as a first-class citizen.
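
To make the replay idea concrete, here is a toy model in plain Java. It is only a sketch of the general technique, not Cadence’s implementation: `History`, `Context`, and the `crashAfterStep` switch are names I made up for illustration. Completed activity results go into an append-only list, and a "new worker" re-runs the same workflow code against that list, so recorded steps return their stored results instead of executing again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Toy model of replay-based recovery; none of these names are Cadence APIs. */
final class ReplaySketch {

    /** Append-only log of completed activity results, standing in for the workflow history. */
    static final class History {
        final List<String> results = new ArrayList<>();
    }

    /** Hypothetical execution context handed to the workflow code. */
    static final class Context {
        private final History history;
        private int cursor = 0;          // position in the history during replay
        int activitiesActuallyRun = 0;   // counts real executions on this worker

        Context(History history) {
            this.history = history;
        }

        /** Returns the recorded result if one exists; otherwise runs the activity and records it. */
        String activity(String name, Function<String, String> body) {
            if (cursor < history.results.size()) {
                return history.results.get(cursor++);   // replay: no side effect
            }
            String result = body.apply(name);
            activitiesActuallyRun++;
            history.results.add(result);
            cursor++;
            return result;
        }
    }

    /** Deterministic "workflow" of three steps; crashAfterStep simulates a worker dying. */
    static String workflow(Context ctx, int crashAfterStep) {
        String a = ctx.activity("reserve", n -> n + ":ok");
        if (crashAfterStep == 1) throw new IllegalStateException("worker crashed");
        String b = ctx.activity("charge", n -> n + ":ok");
        if (crashAfterStep == 2) throw new IllegalStateException("worker crashed");
        String c = ctx.activity("ship", n -> n + ":ok");
        return a + "," + b + "," + c;
    }

    public static void main(String[] args) {
        History history = new History();
        try {
            workflow(new Context(history), 2);        // first worker dies after step 2
        } catch (IllegalStateException e) {
            System.out.println("first worker: " + e.getMessage());
        }
        Context second = new Context(history);        // new worker, same history
        System.out.println(workflow(second, 0));      // prints reserve:ok,charge:ok,ship:ok
        System.out.println("executed on second worker: " + second.activitiesActuallyRun); // 1
    }
}
```

The second worker only executes the `ship` step; the first two results come straight from the recorded history. This only works because the workflow function is deterministic, so replaying it asks for the same activities in the same order.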

2. Why Cadence for distributed workflows instead of Airflow?

This is arguably the most common question I hear from teams choosing a workflow orchestration tool. Apache Airflow is brilliant for data engineering pipelines (ETL) that run on a cron-style schedule. However, Airflow workflows are defined as Directed Acyclic Graphs (DAGs) in Python, and they aren’t built for millions of concurrent, sub-second transactional workflows.

Cadence is built for microservice orchestration. You write your workflows in standard Java or Go code (no static DAG definitions!). You can use `if/else` statements, `for` loops, and the language’s normal error handling, such as `try/catch` blocks in Java. It is designed to handle business logic flows, like user onboarding, order fulfillment, or payment processing, at massive, real-time scale.
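
As a rough illustration of what "business logic as ordinary code" can look like, the sketch below uses `try/catch` and a loop to undo completed steps when a later one fails. It is plain Java with no Cadence dependency: the step names and the `paymentFails` switch are hypothetical, and in a real workflow each step would be a call on an activity stub rather than a log entry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Plain-Java sketch of compensation logic; the step names are hypothetical. */
final class OrderFlowSketch {

    /** Runs the order steps and returns a log of what happened. */
    static List<String> fulfillOrder(boolean paymentFails) {
        List<String> log = new ArrayList<>();
        Deque<Runnable> compensations = new ArrayDeque<>(); // undo actions, newest first
        try {
            log.add("reserve inventory");
            compensations.push(() -> log.add("release inventory"));

            if (paymentFails) {
                throw new IllegalStateException("payment declined");
            }
            log.add("charge card");
            compensations.push(() -> log.add("refund card"));

            log.add("ship order");
        } catch (IllegalStateException e) {
            // Undo completed steps in reverse order of completion.
            for (Runnable undo : compensations) {
                undo.run();
            }
            log.add("failed: " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(fulfillOrder(false)); // [reserve inventory, charge card, ship order]
        System.out.println(fulfillOrder(true));  // [reserve inventory, release inventory, failed: payment declined]
    }
}
```

The point is the shape of the code: the rollback path sits right next to the happy path in one method, instead of being spread across queue consumers and status tables.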

3. How does Cadence handle fault tolerance?

Cadence achieves fault tolerance through its event-sourcing architecture. Your workflow code doesn’t have to run from start to finish in a single process. Instead, every time the workflow schedules an activity (like calling an external API), Cadence writes an event to its history database (usually Cassandra or MySQL). After a crash, that history is enough to rebuild the workflow’s state on another worker.

If you are building fault-tolerant microservices, Cadence is a cheat code. You simply specify a `RetryOptions` policy for an activity. If the activity fails, for example because of a network timeout, Cadence waits and retries it according to your exponential backoff settings, so your workflow code doesn’t need a hand-written retry loop.
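
To get a feel for what an exponential backoff policy implies in practice, here is a small, standalone calculator. It is an illustrative sketch and not part of the Cadence client: the parameter names (`initialInterval`, `backoffCoefficient`, `maximumInterval`, `maximumAttempts`) are my assumptions about the kind of settings such a policy exposes, and I’m assuming `maximumAttempts` counts the first try.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Illustrative backoff calculator; not part of the Cadence client. */
final class BackoffSketch {

    /**
     * Delays before retry 1, 2, ... for a policy with the given settings.
     * maximumAttempts is assumed to include the first try, so the result
     * holds maximumAttempts - 1 delays.
     */
    static List<Duration> delays(Duration initialInterval, double backoffCoefficient,
                                 Duration maximumInterval, int maximumAttempts) {
        List<Duration> out = new ArrayList<>();
        double millis = initialInterval.toMillis();
        for (int retry = 1; retry < maximumAttempts; retry++) {
            // Grow the delay geometrically, but never beyond the configured cap.
            long capped = Math.min((long) millis, maximumInterval.toMillis());
            out.add(Duration.ofMillis(capped));
            millis *= backoffCoefficient;
        }
        return out;
    }

    public static void main(String[] args) {
        // 1s initial delay, doubling each time, capped at 10s, 6 attempts in total.
        System.out.println(delays(Duration.ofSeconds(1), 2.0, Duration.ofSeconds(10), 6));
        // prints [PT1S, PT2S, PT4S, PT8S, PT10S]
    }
}
```

Running the numbers like this is a quick sanity check before you pick a policy: with these settings, a persistently failing activity gives up after roughly 25 seconds of waiting.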

4. Is the Cadence Web UI actually useful for debugging?

Absolutely. One of the biggest challenges in distributed systems is observability. When a multi-step process fails, finding out where it failed usually involves grepping through distributed logs in Datadog or Splunk.

The Cadence Web UI gives you a visual dashboard of every workflow execution. You can see the exact input parameters, the output of every completed activity, the current pending activities, and the stack trace of any failures. It drastically reduces the mean time to resolution (MTTR) during an incident.

[Figure: Cadence Web UI displaying a workflow execution history with completed and pending activities]

5. What does writing a Cadence workflow actually look like?

It looks like standard, synchronous code. You don’t have to string together complex callbacks or promises. Here is a simplified example in Java of a workflow orchestrating a user signup process:

```java
import com.uber.cadence.workflow.Workflow;
import com.uber.cadence.workflow.WorkflowMethod;

// EmailActivities and BillingActivities are activity interfaces defined elsewhere.
// Their timeouts and retry settings are assumed to be configured separately.
public interface UserSignupWorkflow {
    @WorkflowMethod
    void registerUser(String email);
}

public class UserSignupWorkflowImpl implements UserSignupWorkflow {
    private final EmailActivities emailActivities = Workflow.newActivityStub(EmailActivities.class);
    private final BillingActivities billingActivities = Workflow.newActivityStub(BillingActivities.class);

    @Override
    public void registerUser(String email) {
        // These look synchronous, but are actually distributed activities
        emailActivities.sendWelcomeEmail(email);
        billingActivities.createCustomerAccount(email);
    }
}
```

If `createCustomerAccount` fails, Cadence will retry it automatically based on your retry configuration, while the workflow’s state simply “sleeps” safely in the database. For a deeper dive into the code, check out my complete Cadence workflow tutorial.

6. Cadence vs. Temporal: Which should I choose today?

It’s impossible to talk about Cadence without mentioning Temporal. Temporal is a fork of Cadence, created by the original Cadence creators. While Temporal has gained massive popularity and offers a managed cloud service, Cadence remains an open-source project that is actively maintained and used at scale by companies like Uber, DoorDash, and Instacart.

If you want a managed SaaS solution or are starting fresh with TypeScript/Python, Temporal is usually the better choice today. However, if you have deep expertise in Cassandra/Kafka, operate strictly on-premises, or want to contribute to the open-source project itself, Cadence remains a highly resilient, battle-tested option.

Final Verdict

In my experience, moving from ad-hoc choreography (services calling services via HTTP/Kafka) to formal orchestration with Cadence is a paradigm shift. You trade the complexity of distributed state management for the simplicity of sequential code. If your application handles complex, long-running business processes that absolutely cannot be lost to a server crash, Cadence is a proven foundation to build your system on.

Frequently Asked Questions

What is Cadence?

Cadence is an open-source, fault-oblivious, stateful workflow orchestration engine originally developed by Uber. It allows developers to write complex distributed business logic as straightforward, sequential code.

What languages does Cadence support?

Cadence provides official client SDKs for Java and Go. There are also community-supported libraries for Python and Ruby, though Java and Go are the most robust and heavily used in production.

How is Cadence different from Temporal?

Temporal is a fork of Cadence created by its original lead engineers. While they share the same core architecture, Temporal operates as an independent company offering a managed cloud service and expanded language SDKs (like TypeScript), whereas Cadence remains an open-source project heavily used by Uber.

What databases can I use with Cadence?

Cadence requires a persistent datastore to maintain workflow state and event history. It officially supports Apache Cassandra, MySQL, and PostgreSQL.

Can Cadence workflows run for months?

Yes. Cadence is designed for long-running workflows. A workflow can sleep for days, weeks, or even months waiting for an external signal or a timer to fire, consuming minimal resources while asleep.
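
One way to see why a sleeping workflow can be so cheap is to think of the timer as a stored record rather than a blocked thread. The toy below sketches that idea in plain Java. It is an illustration under my own assumptions, not Cadence’s implementation: `DurableTimerSketch`, `sleep`, and `due` are hypothetical names, and a `TreeMap` stands in for an indexed database table.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy durable-timer table; an illustration, not Cadence's implementation. */
final class DurableTimerSketch {
    // Fire time -> workflow ids. The TreeMap stands in for an indexed database table.
    private final TreeMap<Instant, List<String>> timers = new TreeMap<>();

    /** "Sleeping" is just writing a record; no thread waits. */
    void sleep(String workflowId, Instant now, Duration howLong) {
        timers.computeIfAbsent(now.plus(howLong), k -> new ArrayList<>()).add(workflowId);
    }

    /** Returns and removes the workflows whose timers have fired by 'now'. */
    List<String> due(Instant now) {
        List<String> ready = new ArrayList<>();
        Map<Instant, List<String>> fired = timers.headMap(now, true);
        for (List<String> ids : fired.values()) {
            ready.addAll(ids);
        }
        fired.clear();   // headMap is a live view, so this deletes the fired records
        return ready;
    }

    public static void main(String[] args) {
        DurableTimerSketch store = new DurableTimerSketch();
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        store.sleep("trial-reminder", start, Duration.ofDays(30));
        store.sleep("cart-abandoned", start, Duration.ofHours(1));
        System.out.println(store.due(start.plus(Duration.ofHours(2))));  // [cart-abandoned]
        System.out.println(store.due(start.plus(Duration.ofDays(31))));  // [trial-reminder]
    }
}
```

Between `sleep` and `due`, the only thing that exists for each workflow is a row in the table, which is why a 30-day wait costs about the same as a one-hour wait.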

Do I need a separate cluster to run Cadence?

Yes. Cadence is a distributed system made up of several core services (Frontend, History, Matching, Worker) and requires a backing database. Advanced visibility features additionally need an indexing engine like Elasticsearch. It is typically deployed on Kubernetes.

Does Cadence replace Kafka?

No. Kafka is an event streaming platform, while Cadence is a workflow orchestrator. In fact, they are often used together: Cadence handles the state and orchestration of a multi-step process, while Kafka carries asynchronous messages between services, including the ones that Cadence’s activities call.

What happens to a Cadence workflow if the worker server crashes?

Cadence stores every state change in its event history database. If a worker crashes, the Cadence server detects the timeout and assigns the workflow to a healthy worker, which replays the event history to restore the state and then resumes execution where it left off.