The hard question is ``When are two machines equivalent?'' The answer is dependent on the context in which you intend to do the substitution. Intuitively, we strive for some notion of observational equivalence such that I as an external observer (the context) ``cannot tell the difference'' between whether is being used or . If I can't perceive a difference, then for all intents and purposes they are equivalent.
Answering this question is a subject of much debate and decades of research, mostly in the theoretical community on models of concurrent systems. Let me give you an idea of what people might debate about.
At first, you might think it's easy to simply define equivalence in terms of input-output behavior. If I feed the two machines the same input do I get the same output? I ask this question for all possible inputs. If I always get the same output for the same input, they are equivalent; if not, they are different. This notion of equivalence takes a pretty narrow view of what a state machine is. It essentially says that a state machine represents a function and determining equivalence between two state machines boils down to the problem of determining whether they compute the same function.
However, a software system is not nearly as simple. For example, software systems, and hence their state machine models, often have intermediate observable effects on the environment (their context). It does not suffice to just consider the final states of the machines. Thus, many prefer to take the broader view that two machines are equivalent if their behaviors (i.e., the sets of traces) are the same. To determine whether two sets of traces are the same we're going to need a way to determine whether two traces are the same. To determine whether two traces are the same we're going to need a way to determine whether two actions (or states) are the same. To determine whether two actions (or states) are the same we may need to ignore some actions or states variables (e.g., internal ones) and not others. So, you can see that it's not so easy to decide whether two state machines are equivalent; each substructure of a state machine introduces another place for differing opinions.
For example, depending on whether you take an event-based or state-based view of what a trace is you could come up with different answers given two machines. Consider the Light example with off and on states and the flick action. A different Light example with three states, e.g., red, amber, and green, but with the same action set (think of a lever with three positions rather than two) will have the same behavior if you take an event-based view of traces, but different behavior if you take a state-based view.
Even for the same view of traces, there are different notions of equivalence. Suppose in determining whether two behaviors are the same, we need to determine whether two state-based traces are the same. Would you view a trace with ``stuttering'' states as equivalent to one without? Consider the Register example that has read and write actions where after doing two read's in a row, I remain in the same state:
A state-based trace of this execution is:
Is it equivalent to the following?
It may be even harder to answer this question if your model includes infinite traces where a trace might end with an infinite number of stuttering states.