Local Fault and Remote Fault in Ethernet Standards
A common issue network operators troubleshoot is links that are down or intermittently flapping (that is, going down and coming back up erratically). The root cause of these issues tends to be simple in nature, such as a failing optic/transceiver or a bad medium (like a damaged fiber or copper cable). However, identifying the failing component can be a challenge due to a lack of visibility into the issue, resulting in “shotgun troubleshooting” where multiple components are needlessly replaced in an attempt to resolve the issue.
The IEEE 802.3 Ethernet standard has a fault messaging mechanism that offers more visibility into physical layer issues and can aid troubleshooting. These messages are called Local Fault and Remote Fault.
Let’s dig into these faults and understand what they mean!
TL;DR
To very briefly summarize:
- A Local Fault means that the local device stopped receiving valid data or signal from the remote device. The local device then sends a Remote Fault message across the link.
- A Remote Fault is generated by the remote device indicating that the remote device stopped receiving valid data or signal from the local device.
- Neither of these two faults assigns conclusive blame to either device. They only offer insight into the nature of the link’s failure.
- For example, although a Local Fault indicates the local device is not receiving valid data/signal from the remote device, the root cause of the fault could be an issue with the local device’s transceiver or a malfunction with the local device’s PHY layer.
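To make the summary above a bit more concrete, here is a minimal sketch of a lookup that turns a fault type into "which direction of the link is suspect" and "which components are still worth checking". The names (`FaultHint`, `explain_fault`) are my own and purely illustrative; the candidate lists simply restate the points above and are not an exhaustive diagnostic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultHint:
    meaning: str
    suspect_direction: str
    # Neither fault assigns blame, so components on both ends stay listed.
    candidates: tuple


# Hypothetical lookup table; the wording mirrors the TL;DR bullets.
HINTS = {
    "local": FaultHint(
        meaning="local device stopped receiving valid data/signal from the remote device",
        suspect_direction="remote -> local",
        candidates=("remote transceiver", "medium", "local transceiver", "local PHY"),
    ),
    "remote": FaultHint(
        meaning="remote device stopped receiving valid data/signal from the local device",
        suspect_direction="local -> remote",
        candidates=("local transceiver", "medium", "remote transceiver", "remote PHY"),
    ),
}


def explain_fault(kind: str) -> FaultHint:
    """Return the hint for 'local' or 'remote' (case-insensitive)."""
    return HINTS[kind.strip().lower()]


if __name__ == "__main__":
    hint = explain_fault("Local")
    print(hint.suspect_direction)
    print(", ".join(hint.candidates))
```

The point of listing components from both ends is the same as the last bullet: the fault type narrows down the failing direction, not the failing part.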
Topology
Consider the following topology, where we have two Ethernet switches named Switch-1 and Switch-2. Switch-1’s Ethernet1/1 interface connects to Switch-2’s Ethernet1/2 interface. We can see data plane traffic flowing in both directions between the switches.
If we were to zoom into the MAC (Media Access Control) and PHY (physical layer) components of the link between the two switches, we would see the following:
Note: This topology is largely adapted from a similar image in a presentation by Chiwu Ding, Zeng Li, and WB Jiang from the IEEE 802.3ba Task Force on PHY layer monitoring of 100 Gigabit Ethernet links. The original presentation can be found here.
In this topology, we can see that the forwarding engine (typically an ASIC - Application Specific Integrated Circuit) connects to each interface. Each interface consists of:
- MAC (Media Access Control): Responsible for implementing CSMA/CD, Ethernet frame error handling (like FCS checking), and Ethernet frame transmission.
- These responsibilities are not all-inclusive, and some of these responsibilities may be handled by the forwarding engine or the LLC (Logical Link Control) sublayer depending on how Ethernet is implemented on the device.
- RS (Reconciliation Sublayer): Responsible for translating signals between the MAC and PHY layers sent/received through the MII (Media Independent Interface, which is the “link” between the two in the above topology). The RS also generates Remote Fault messages and handles incoming Local Fault messages from the PHY layer.
- PHY (Physical Layer): Responsible for converting the MAC layer’s frames into electrical or optical signals that can be transmitted across the physical medium (like copper or fiber).
- PCS (Physical Coding Sublayer): Responsible for data encoding/decoding (such as the 8b/10b encoding used by Gigabit Ethernet’s 1000BASE-X, the 64b/66b encoding used by 10 Gigabit Ethernet’s 10GBASE-R, etc.), scrambling/descrambling (which exists to keep the signal well-behaved on the wire rather than to obfuscate data), and so on.
- PMA (Physical Medium Attachment): Responsible for framing, octet synchronization/detection, and additional scrambling/descrambling.
- PMD (Physical Medium Dependent): Turns digital signals into analog signals for transmission across the medium, such as copper or fiber. Practically, the PMD is most often implemented as a transceiver/optic (such as SFP, SFP+, QSFP, etc. transceivers).
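As a quick way to keep the ordering of these layers straight, here is a small sketch that models the stack as an ordered list. It is only a mental model of the list above (the function names are mine), not a description of how any particular device is built.

```python
# Order a frame passes through on transmit, top to bottom, per the list above.
TX_PATH = ["MAC", "RS", "PCS", "PMA", "PMD"]

# The MII sits between the RS and the first PHY sublayer (the PCS).
MII_BOUNDARY = ("RS", "PCS")


def tx_path() -> list:
    """Layers traversed when sending toward the medium."""
    return list(TX_PATH)


def rx_path() -> list:
    """Layers traversed when receiving from the medium (the reverse order)."""
    return list(reversed(TX_PATH))


def phy_sublayers() -> list:
    """Everything below the MII belongs to the PHY."""
    return TX_PATH[TX_PATH.index(MII_BOUNDARY[1]):]


if __name__ == "__main__":
    print(" -> ".join(tx_path()))
    print(" -> ".join(rx_path()))
    print(phy_sublayers())
```

Reading the receive path bottom-up is handy for the fault discussion later: a problem noticed in the PHY has to be reported upward through the RS before the MAC hears about it.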
Note: I am by no means an electrical engineer, FPGA developer, or ASIC designer, so my understanding of these layers is likely not as deep as someone who works with these layers on a daily basis. If you are such a person and see any inaccuracies in my descriptions, please let me know!
In this topology, we can see that data plane traffic is flowing between the two switches across two unidirectional channels.
Note: Depending on the specific Ethernet standard and physical medium used, there may be multiple lanes in each direction (such as four 25Gbps lanes per direction for some 100 Gigabit Ethernet standards, or the four 250Mbps twisted pairs of 1000BASE-T, each of which carries traffic in both directions at once), but for simplicity, we’ll consider a single pair of channels.
Local Fault
Let’s say that the medium between both switches is damaged, causing the unidirectional channel for data from Switch-2’s Ethernet1/2 to Switch-1’s Ethernet1/1 to fail.
The PHY layer on Switch-1’s Ethernet1/1 interface will detect that it is no longer receiving valid data from the remote device (Switch-2). The PHY layer will then generate a Local Fault message and send it up toward the MAC layer, where it is received by the RS.
The MAC layer will interpret this message and take appropriate action, such as informing the device’s forwarding engine (and ultimately, the network operating system) that the link is effectively down.
Note: There is usually other logic, such as starting a debounce timer, that the MAC layer or forwarding engine will perform when a Local Fault is detected. This logic tends to be implementation-specific and is not standardized across all devices, so we won’t cover it here.
Remote Fault
At this point, Switch-1’s Ethernet1/1 interface can no longer receive data on this link. However, it may still be able to transmit data across the link, as the PHY layer’s transmitter may still be functional. It’s imperative that Switch-1 find a way to inform Switch-2 that it is no longer receiving valid data from Switch-2 so that Switch-2 does not recklessly continue to send data across the link that will never be received.
This is where the Remote Fault message comes into play. The RS for Switch-1’s Ethernet1/1 interface will generate a Remote Fault message and send it across the link to Switch-2’s Ethernet1/2 interface.
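It may help to see what a fault "message" actually looks like. These are not Ethernet frames. As far as I can tell from Clause 46 of IEEE 802.3 (the 10 Gigabit Ethernet RS and XGMII), both faults are carried as a four-byte "Sequence ordered set": a Sequence control character in lane 0, two zero bytes, and a final byte that says which fault it is. The sketch below encodes and decodes that layout. The constants are my reading of the standard, the function names are hypothetical, and other speeds use wider interfaces, so treat this as an illustration rather than a reference.

```python
from typing import Optional

# Assumption: XGMII Sequence control character, per my reading of Clause 46.
SEQUENCE = 0x9C

# Assumption: lane 3 value identifies the fault type.
FAULT_CODES = {0x01: "Local Fault", 0x02: "Remote Fault"}
_CODE_BY_NAME = {name: code for code, name in FAULT_CODES.items()}


def encode_fault(kind: str) -> bytes:
    """Build the 4-lane column for 'Local Fault' or 'Remote Fault'."""
    return bytes([SEQUENCE, 0x00, 0x00, _CODE_BY_NAME[kind]])


def decode_column(column: bytes) -> Optional[str]:
    """Return the fault name carried by a 4-byte column, or None.

    Simplification: the caller is assumed to have already checked that
    lane 0 is flagged as a control character on the interface.
    """
    if len(column) != 4:
        raise ValueError("expected exactly 4 lanes")
    if column[0] != SEQUENCE or column[1:3] != b"\x00\x00":
        return None
    return FAULT_CODES.get(column[3])


if __name__ == "__main__":
    print(encode_fault("Remote Fault").hex())
    print(decode_column(encode_fault("Local Fault")))
```

Because these ordered sets live below the MAC, they never show up in a packet capture, which is part of why the faults are only visible through logs and controller output.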
When Switch-2’s Ethernet1/2 interface receives the Remote Fault message, it will know that Switch-1 is no longer receiving valid data from Switch-2, indicating that the link is unhealthy. At this point, Switch-2 will generally declare the link/interface as “down”, allowing higher-level protocols to take appropriate action (such as routing protocols reconverging, etc.).
Note: As with a Local Fault, there is usually other logic, such as starting a debounce timer, that the MAC layer or forwarding engine will perform when a Remote Fault is detected. This logic tends to be implementation-specific and is not standardized across all devices, so we won’t cover it here.
Something interesting to note is that Switch-2’s Ethernet1/2 interface will not completely stop transmitting all data out of Ethernet1/2 after receiving a Remote Fault message. Instead, it will continuously generate IDLE symbols (which are essentially “no data” or “line not busy” messages) across the link. This way, if/when the issue with the unidirectional path from Switch-2’s Ethernet1/2 interface to Switch-1’s Ethernet1/1 interface is resolved, the link can quickly return to a healthy state.
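The whole sequence so far boils down to a very small decision table in the RS: what it transmits depends on the fault state it currently sees. Here is a minimal sketch of that table and of the failure walked through above. The enum and function names are my own, and all the timing, counting, and debounce logic of a real implementation is left out.

```python
from enum import Enum


class LinkFault(Enum):
    OK = "ok"
    LOCAL_FAULT = "local fault"
    REMOTE_FAULT = "remote fault"


class Tx(Enum):
    DATA = "MAC data"
    REMOTE_FAULT = "Remote Fault"
    IDLE = "IDLE"


def rs_transmit(fault: LinkFault) -> Tx:
    """What the RS sends toward the far end for a given fault state."""
    if fault is LinkFault.LOCAL_FAULT:
        return Tx.REMOTE_FAULT  # tell the peer we cannot hear it
    if fault is LinkFault.REMOTE_FAULT:
        return Tx.IDLE  # stop sending data, keep the line alive
    return Tx.DATA


def simulate_break() -> dict:
    """Replay the example: the Switch-2 -> Switch-1 channel fails."""
    # Switch-1's PHY stops receiving valid data and reports Local Fault.
    sw1_state = LinkFault.LOCAL_FAULT
    sw1_tx = rs_transmit(sw1_state)
    # Switch-1's transmitter still works, so Switch-2 hears Remote Fault.
    sw2_state = LinkFault.REMOTE_FAULT if sw1_tx is Tx.REMOTE_FAULT else LinkFault.OK
    sw2_tx = rs_transmit(sw2_state)
    return {"Switch-1": (sw1_state, sw1_tx), "Switch-2": (sw2_state, sw2_tx)}


if __name__ == "__main__":
    for name, (state, tx) in simulate_break().items():
        print(f"{name}: sees {state.value}, transmits {tx.value}")
```

Running it shows the asymmetry that makes these faults useful for troubleshooting: Switch-1 reports a Local Fault while Switch-2 reports a Remote Fault, even though there is only one broken channel.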
Note: When you think about it, thanks to IDLE signals, a link between two devices is always running at maximum capacity, even if no “meaningful” data is being transmitted. Put another way, if you are sending 1Gbps of traffic across a 10Gbps link, the link is still running at 10Gbps, but 9Gbps of that is IDLE symbols. The energy-conscious among us may find this a bit wasteful, which is why Energy-Efficient Ethernet (EEE) was introduced to reduce power consumption during periods of low traffic.
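The arithmetic in that note is simple enough to sketch. The helper below (a hypothetical function of my own) treats everything that is not offered traffic as IDLE, which deliberately ignores details like preamble and inter-packet gap overhead, and it does not model EEE.

```python
def idle_share(offered_gbps: float, line_rate_gbps: float) -> float:
    """Fraction of the line rate filled with IDLE rather than traffic.

    Simplification: anything that is not offered traffic counts as IDLE.
    """
    if line_rate_gbps <= 0:
        raise ValueError("line rate must be positive")
    if not 0 <= offered_gbps <= line_rate_gbps:
        raise ValueError("offered traffic must be between 0 and the line rate")
    return (line_rate_gbps - offered_gbps) / line_rate_gbps


if __name__ == "__main__":
    share = idle_share(1, 10)
    print(f"{share:.0%} of the 10Gbps line is IDLE")
```

For the 1Gbps-on-10Gbps example, that works out to nine tenths of the line carrying IDLE.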
Thanks to the magic of oscilloscopes and the skill of those who wield them, an excellent visualization of what an Ethernet link looks like with IDLE codes (both with and without EEE enabled) can be found in Mete Balci’s blog post here.
Conclusion
Ethernet’s Local Fault and Remote Fault messages are simple yet informative tools for troubleshooting physical layer issues in Ethernet networks. They are typically exposed to network operators through the device’s system logs or command line output. An example from Cisco’s IOS-XR network operating system is below, where interface TenGigE0/0/0/15 received a Remote Fault message from the remote device:
RP/0/RP0/CPU0:ios#show controllers TenGigE0/0/0/15
Operational data for interface TenGigE0/0/0/15:
State:
    Administrative state: enabled
    Operational state: Down (Reason: Remote Fault)
    LED state: Yellow On
These messages provide insight into the nature of the link’s failure, allowing network operators to more quickly identify the root cause of the issue and resolve it.
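If you want to pull the fault reason out of output like the example above in a script, a small regular expression is enough. This is a minimal sketch that assumes the exact "Operational state: ... (Reason: ...)" wording shown in that output; other platforms and software versions will format this differently, and the function name is my own.

```python
import re
from typing import Optional, Tuple

# Sample text copied from the example output above.
SAMPLE = """\
Operational data for interface TenGigE0/0/0/15:
State:
    Administrative state: enabled
    Operational state: Down (Reason: Remote Fault)
    LED state: Yellow On
"""

# The "(Reason: ...)" part is optional so a healthy link still matches.
_STATE_RE = re.compile(
    r"Operational state:\s*(?P<state>\w+)(?:\s*\(Reason:\s*(?P<reason>[^)]+)\))?"
)


def operational_state(output: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (state, reason) from the output, or None if no state line exists."""
    match = _STATE_RE.search(output)
    if match is None:
        return None
    return match.group("state"), match.group("reason")


if __name__ == "__main__":
    print(operational_state(SAMPLE))
```

From there, a Local Fault versus Remote Fault reason tells you which direction of the link to look at first.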