how to write a dag
a directed acyclic graph is a dict.
dag = {
"ingest": [("validate", transform)],
"validate": [("enrich", enrich), ("notify", notify)],
"enrich": [("publish", publish)],
"notify": [(None, alert)],
"publish": [(None, None)],
}
command maps to a list of (next_command, handler). if next_command is None, it’s terminal. walk it recursively. you’re done.
def walk(dag, cmd):
for next_cmd, fn in dag.get(cmd, []):
if fn:
fn()
if next_cmd:
walk(dag, next_cmd)
that’s it. that’s the post.
no it isn’t
oh yeah? let’s do some leetcode.
797. all paths from source to target. they give you an adjacency list and ask you to walk every path from node 0 to the last node. it is the exact same structure.
def allPathsSourceTarget(graph):
res = []
def dfs(node, path):
if node == len(graph) - 1:
res.append(path[:])
return
for nei in graph[node]:
dfs(nei, path + [nei])
dfs(0, [0])
return res
210. course schedule ii. topological sort. if your dag has dependencies you need ordering. this is kahn’s algorithm, which is just “repeatedly pull nodes with no incoming edges.”
def findOrder(numCourses, prerequisites):
graph = defaultdict(list)
indegree = [0] * numCourses
for a, b in prerequisites:
graph[b].append(a)
indegree[a] += 1
queue = deque(i for i in range(numCourses) if indegree[i] == 0)
order = []
while queue:
node = queue.popleft()
order.append(node)
for nei in graph[node]:
indegree[nei] -= 1
if indegree[nei] == 0:
queue.append(nei)
return order if len(order) == numCourses else []
207. course schedule. same thing but you just need cycle detection. if your topo sort doesn’t visit every node, you have a cycle. the solution above already handles it. check the length.
three problems solved with the same data structure. an adjacency list. which is a dict. i keep seeing teams reach for temporal, airflow, finite state machines, prefect, dagster. the whole zoo. for problems that are fundamentally a recursion of: “run thing A, then run thing B.”
the stuff you practiced for six months to get hired. the stuff you swore you’d never need on the job. you need it on the job.
we need orchestration.
the failure mode is always the same. someone introduces the orchestrator. it works. then someone needs to debug a failed workflow at 2am and they’re reading temporal’s event sourcing history format through a UI that was clearly designed by someone who hates sleep. or they’re staring at an airflow worker that silently ate a task because the executor pool was misconfigured three deploys ago. the complexity of the orchestrator becomes the problem the orchestrator was supposed to solve.
you need a queue.
where they’re actually useful
i’ll be fair. there are genuine cases:
combinatorial fan-out. when your DAG shape isn’t known until runtime. you’re generating N tasks based on data and need to track completion of all of them before proceeding. dynamic DAGs are painful to hand-roll on a queue. airflow’s dynamic task mapping or temporal’s child workflows genuinely help here.
checkpoint and resume over long-running workflows. if a single workflow takes hours and you need to resume from step 47 of 200 after a deploy, temporal’s replay model is exactly right. rebuilding that yourself on SQS is just writing a worse version of temporal.
regulated audit trails. if compliance requires you to prove exactly what ran, when, with what inputs, and what the result was. the orchestrator’s event log is the audit trail. building that into every queue consumer is a lot of undifferentiated work.
those are real. but they’re also rare. most of the time you’re not doing any of that. most of the time you’re chaining three services together and the dict would have been fine.
start your interview with the naive solution
the adjacency list is the fundamental unit of workflow. everything else is abstraction over it. sometimes the abstraction earns its keep. usually it doesn’t.
before you pip install airflow and bring 47 transitive dependencies and the additional complexity of managing extra infrastructure into your life, and onto your networking stack, and onto your CI, try the dict. walk it. put it on a queue. go home early.
pick the boring thing. ship the work.