lumænaut

Question of the Week (Cassidoo).
#2: Symlink Resolution

Following the breadcrumbs in a fake filesystem

Color ASCII art: RGB-lit keyboard in outer space with a soft yellow nebula in the background.
Illustration in ASCII art style: an RGB-lit keyboard in outer space with a soft yellow nebula behind it.

Symlinks, cycles, and the art of resolution

A huge thank you to Cassidy Williams and her fantastic Cassidoo newsletter for this week's brain teaser. The problem feels like navigating a labyrinth of shortcuts: given a dictionary that represents a file system (paths as keys, values that are either null for a real file or a string pointing to another path), write a function that resolves a path to its real destination. If a symlink chain forms a cycle, you return null. Simple on the surface, but full of subtle traps.

In this post we will build from the ground up: a naive recursive approach, then an iterative solution with cycle detection, and finally a clean optimal version using a visited set. Along the way we will explore the mathematics of functional graphs, discuss time and space complexity, and produce Python code that you can actually use. And yes, we will give Cassidy the credit she deserves.

The problem statement (as given)

"You are given a file system represented as an object where keys are absolute paths and values are either null (real file/directory) or a string (a symlink pointing to another path). Write a function that resolves a given path to its real destination, following symlinks along the way. If a symlink chain forms a cycle, return null."

Example (in JavaScript object notation, but we'll use Python dicts):


const fs = {
  "/a": "/b",
  "/b": "/c",
  "/c": null,
  "/loop1": "/loop2",
  "/loop2": "/loop1",
  "/real": null,
  "/alias": "/real",
};

resolvePath(fs, "/a");      // "/c"
resolvePath(fs, "/alias");  // "/real"
resolvePath(fs, "/loop1");  // null
resolvePath(fs, "/real");   // "/real"
          

Our job: implement resolve_path(fs, path) in Python.

Rewriting in simpler terms

Imagine a treasure map where each location either is the treasure (null / real file) or contains a note saying "go to this other location". Your task: follow the notes until you either reach the treasure or realize you are walking in circles. If you find the treasure, return its name. If you detect a loop (you come back to a place you have already visited), return None (which is Python's null).

The input is a dictionary (hash map) where keys are strings (paths like "/a") and values are either None (meaning "I am a real file") or another string (meaning "I am a symlink, go there"). The function should be pure: it does not modify the dictionary, it only reads it.

From naive to optimal: three approaches

Approach 1: Recursive follower (the simplest but risky)

The most direct translation: if the current path points to a real file (fs[path] is None), return the path. If it points to another path, call the function again on that target. If the path is not in the dictionary at all (edge case), we could return None or raise an error; but given the problem statement we assume all symlinks point to existing keys.

The obvious problem: recursion depth. If you have a chain of 1000 symlinks, Python's recursion limit (usually 1000) will break. Also, cycles cause infinite recursion unless we detect them.


def resolve_path_recursive(fs, path):
    """
    Naive recursive resolution. Vulnerable to stack overflow and infinite loops.
    """
    if path not in fs:
        return None  # Or raise KeyError, but spec assumes valid keys.
    target = fs[path]
    if target is None:
        return path
    # Recursive step
    return resolve_path_recursive(fs, target)
          

This works for the simple cases but fails on cycles (RecursionError) and deep chains (RecursionError again). Not production ready.

Approach 2: Iterative with cycle detection using a visited set

We can transform recursion into a loop. Keep a set of paths we have already seen in the current resolution chain. If we ever encounter a path that is already in the set, that means we are about to repeat ourselves: a cycle exists. Return None. Otherwise, keep walking until we hit a real file (null) or a missing entry.


def resolve_path_iterative(fs, path):
    """
    Iterative resolution with cycle detection using a visited set.
    Returns the real path or None if a cycle is detected.
    """
    seen = set()
    current = path
    while current in fs:
        if current in seen:
            # Cycle detected
            return None
        seen.add(current)
        target = fs[current]
        if target is None:
            # Real file / directory
            return current
        # Otherwise follow symlink
        current = target
    # If we exit the loop because current not in fs, that's an error condition.
    # According to problem spec, we assume all links are valid, but defensively:
    return None
          

This is already much better. No recursion limit, explicit cycle detection, linear time in the length of the symlink chain. But there is a nuance: the problem expects that if the starting path is a real file (points to null), we return that same path. Our code does that.

Approach 3: Optimal with early exit and path tracking (the final form)

The previous solution is already optimal in terms of time complexity: O(L) where L is the length of the chain (or number of unique nodes visited before a cycle). However we can make a small improvement: instead of storing a generic set of visited nodes, we can also store the order of traversal, but we don't need that. The visited set is enough. The code above is clean and fast.

But wait: there is a subtle bug in some edge cases. What if the dictionary contains a key that maps to a string that does not exist as a key? According to the problem statement we assume the input is well-formed, but robust code might handle that by returning None. Also, what about self loops? "/self": "/self" would be caught by the cycle detection immediately.

Let's produce the final, polished version with comments and error handling.


def resolve_path(fs, path):
    """
    Resolves a symlink path to its real destination.
    
    Parameters:
    fs (dict): A dictionary mapping absolute paths to either None (real file)
               or a string (symlink target).
    path (str): The starting path to resolve.
    
    Returns:
    str or None: The real path if found, None if a cycle is detected or the
                 path leads to an invalid target.
                 
    Example:
    >>> fs = {"/a": "/b", "/b": None, "/loop": "/loop"}
    >>> resolve_path(fs, "/a")
    '/b'
    >>> resolve_path(fs, "/loop")
    None
    """
    if path not in fs:
        # Path not in filesystem: treat as missing, could also raise.
        return None
    
    visited = set()
    current = path
    
    while current in fs:
        if current in visited:
            # Cycle detected
            return None
        visited.add(current)
        target = fs[current]
        if target is None:
            # Reached a real file
            return current
        # Follow symlink
        current = target
    
    # If we exit because current is not in fs, it's a broken link.
    return None
          

This is the implementation we will ship. It is concise, efficient, and handles cycles gracefully.

The mathematical model: functional graphs

The filesystem dictionary can be viewed as a partial function f: P → P ∪ {⊥}, where P is the set of path strings and ⊥ represents a real file (null). For each path p that is a symlink, f(p) = q where q is another path. For real files, f(p) = ⊥.

Starting from an initial path p0, we generate a sequence: p0, p1, p2, ... where p_{i+1} = f(p_i) as long as f(p_i) is a string (not ⊥). This sequence either:

  • Terminates at a real file: we encounter some p_k such that f(p_k) = ⊥, and we return p_k.
  • Enters a cycle: there exist i < j such that p_i = p_j, and for all k ≥ i, the sequence repeats. Because f is deterministic, once we revisit a node we are in an infinite loop. We detect this and return None.
  • Leads to a missing key: if some f(p_i) is a string that is not in the domain of f, the chain breaks. Defensively we return None.

This is exactly the structure of a functional graph (a directed graph where each node has out-degree at most 1). Cycle detection in functional graphs is a classic computer science problem. The visited set approach is essentially a "mark and sweep" method: we mark nodes as we traverse, and if we encounter a marked node, we have a cycle.

An alternative mathematical approach is Floyd's cycle-finding algorithm (tortoise and hare), which uses O(1) extra memory instead of O(L). But given that symlink chains are unlikely to be astronomically long, the visited set is simpler and more readable. For completeness, here is how you might implement Floyd's algorithm:


def resolve_path_floyd(fs, path):
    """
    Cycle detection using Floyd's tortoise and hare algorithm.
    Uses O(1) extra space but two pointers.
    """
    def next_node(p):
        # returns the next path or None if real file or missing
        if p not in fs:
            return None
        val = fs[p]
        if val is None:
            return None
        return val
    
    tortoise = path
    hare = path
    
    # Phase 1: detect cycle
    while True:
        tortoise = next_node(tortoise)
        hare = next_node(next_node(hare)) if hare is not None else None
        if hare is None or tortoise is None:
            # No cycle, we reached a real file or broken link
            break
        if tortoise == hare:
            # Cycle detected
            return None
    
    # Phase 2: find the real destination without cycle
    # Actually if no cycle, we just follow until None
    current = path
    visited = set()  # we still need visited to avoid infinite loops? No, because we already know no cycle.
    # But careful: Floyd only tells us there is a cycle, not where it starts if we want the real file.
    # For our purpose, simpler to reuse the iterative visited set. Floyd is overkill.
    # So we stick with the previous version.
    return resolve_path_iterative(fs, path)  # fallback
          

Given the clarity of the visited set, we will keep that as our canonical solution.

Complexity analysis

Let L be the number of distinct nodes visited during resolution (the length of the chain until termination or cycle detection). In the worst case, L could be up to the total number of symlinks in the dictionary (say N). The algorithm:

  • Time complexity: O(L) because each step does constant-time dictionary lookups and set insertions/checks.
  • Space complexity: O(L) for the visited set. In the worst case, L = N, so O(N) additional memory.
If memory is a concern (e.g., billions of symlinks), the Floyd variant would reduce space to O(1), but at the cost of readability. For interview contexts, the visited set is perfectly acceptable and expected.

Testing the solution (with provided examples)

Let's write some test cases to ensure correctness.


def test_resolve_path():
    fs = {
        "/a": "/b",
        "/b": "/c",
        "/c": None,
        "/loop1": "/loop2",
        "/loop2": "/loop1",
        "/real": None,
        "/alias": "/real",
        "/self": "/self",      # self-loop
        "/broken": "/missing", # broken link
    }
    
    assert resolve_path(fs, "/a") == "/c"
    assert resolve_path(fs, "/alias") == "/real"
    assert resolve_path(fs, "/loop1") is None
    assert resolve_path(fs, "/real") == "/real"
    assert resolve_path(fs, "/self") is None
    assert resolve_path(fs, "/broken") is None  # missing target
    assert resolve_path(fs, "/nonexistent") is None
    
    print("All tests passed!")

if __name__ == "__main__":
    test_resolve_path()
          

Running this should output "All tests passed!"

Edge cases and pitfalls

  • Empty dictionary: If fs is empty, any path resolves to None (since path not in fs). That's consistent.
  • Path points to None directly: Should return the path itself (real file). Our code does that because target is None and we return current.
  • Cycle of length 1: A key that maps to itself. Detected immediately when we see current already in visited set.
  • Chain that ends in a missing key: We return None, which is a sensible default. The problem statement didn't specify this scenario, but robust code handles it.
  • Very long chains: No recursion, so stack overflow is impossible.

Why this question is interesting

Cassidy's question tests several fundamental concepts: recursion vs iteration, cycle detection, dictionary lookups, and edge case thinking. It mirrors real-world symlink resolution in operating systems (though those are more complex with permissions and mount points). In interviews, it's a great way to assess whether a candidate reaches for recursion without considering stack limits, and whether they remember to detect cycles.

Moreover, it's a perfect example of a graph traversal problem disguised as a filesystem problem. Once you see the functional graph, the solution becomes clear.

Alternative approaches from the community

I've seen some developers implement this using a while loop with a counter (e.g., limit to 1000 steps) as a poor man's cycle detection. That's fragile. Others use a list to record the order of visitation and check if current in path_list which is O(L) per check, making the whole algorithm O(L^2). The visited set gives O(1) checks.

There is also a two-pointer technique (tortoise and hare) that uses O(1) space. While elegant, it's harder to explain in an interview unless you are prepared. The visited set is the sweet spot.

Putting it all together: the final answer

Here is the complete, ready-to-use Python function with docstring and comments.


def resolve_path(fs, path):
    """
    Resolves a symlink path to its real destination.
    
    The function follows symlinks until it either reaches a real file
    (value is None) or detects a cycle. If a cycle is found, returns None.
    
    Args:
        fs (dict): A dictionary mapping string paths to either None (real file)
                   or a string (symlink target).
        path (str): The starting path to resolve.
    
    Returns:
        str or None: The resolved real path, or None if a cycle is detected
                     or the path leads to an invalid target.
    
    Time complexity: O(L) where L is the length of the symlink chain.
    Space complexity: O(L) for the visited set.
    """
    if path not in fs:
        return None  # path does not exist in the filesystem
    
    visited = set()
    current = path
    
    while current in fs:
        if current in visited:
            # We've been here before -> cycle
            return None
        visited.add(current)
        target = fs[current]
        
        if target is None:
            # Reached a real file/directory
            return current
        # Otherwise, follow the symlink
        current = target
    
    # If we exit the loop because current is not a key in fs,
    # the symlink points to a non-existent path.
    return None
          

Conclusion

We have journeyed from a naive recursive solution through an iterative cycle-aware version to a clean, optimal implementation. The key insights: treat the filesystem as a functional graph, use a set to record visited nodes, and always prefer iteration over recursion for unbounded chains. Cassidy's question reminds us that even seemingly simple problems can hide complexity, and that a methodical approach yields robust code.

Thank you again to Cassidy Williams for the inspiration. If you enjoyed this breakdown, check out her newsletter for more weekly challenges. Happy coding, and may your symlinks never form cycles!

Question of the Week #2 — Cassidoo. Symlink resolution.

References & further reading

Can't get enough from lumænaut? Read this...