What what what?
Seeing that this situation has not improved at all since around 2010 when I first looked into it, I thought it would be worthwhile to write a note on an important step so often missed by vendors when implementing (IP) multicast PTP (Default profile and Enterprise profile).
I have worked with multiple complete hardware PTP implementations (appliances) that are multicast-capable: GMs, slaves, and most importantly in this context - probes, analysers and other test kit. All but a very few neither sent IGMP joins nor replied to IGMP queries. In my book, a multicast-capable IP stack with no IGMP support is incomplete. You all do support ICMP, do you not - or did I lose you already?
Even the fact that I have to explain this should already be embarrassing to those vendors. In multiple conversations, vendor representatives otherwise well-versed in all things timing were not quite sure what the question was. When asked if they supported sending IGMP joins or replying to IGMP queries, they would in turn ask me to spell it (as if it were some rare animal), write it down and say they would get back to me. OK, I can understand that you're a frequency guy who transitioned into the world of IP, but it's an IP world we live in, so please do your homework!
Slave or master, boundary clock or not, a PTP node operating in an IP multicast profile (Default or Enterprise) must be able to receive PTP messages. This is pretty clear I think. If it's not, then I suggest a career change: growing turnips or weaving baskets are both healthy alternatives, and well suited for the Apocalypse at that.
Why do we need IGMP?
With all clock nodes connected to a (dumb) flat Layer 2 topology, there is no issue. Multicast is effectively flooded to every member of the VLAN. Everything works and every node is happy.
Now try routing multicast between VLANs, or even better, between locations, as in with Protocol Independent Multicast. So your GMs live on different devices, on different subnets, and so do your slaves. Something has to inform PIM that somebody wants to receive data on 224.0.1.129 (the PTP primary multicast group), and it all starts from the local PIM Designated Router that will initiate upstream PIM joins. That something is an IGMP querier, periodically asking if any station on the local subnet wants to receive any multicast, and recording the replies, provoked or not. The station (this is your device!) needs to reply to those queries and send IGMP joins when it wishes to receive multicast. Without that, multicast no worky.
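To make that concrete, here is roughly what the network side looks like in Cisco-IOS-style syntax (illustrative only - the interface name and RP address are made up, and syntax varies by vendor). Enabling PIM on the subnet interface is what makes the router act as the IGMP querier and PIM DR for that subnet:

```
ip multicast-routing
!
interface Vlan10
 description subnet with PTP slaves
 ip pim sparse-mode              ! router now sends periodic IGMP queries here
!
ip pim rp-address 10.0.0.1       ! hypothetical rendezvous point
```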
As you can see, for multicast routing, IGMP turns out to be rather necessary. But what about plain switching? Surprisingly (not), the same holds for most modern equipment. Let me introduce you to IGMP snooping. There are many multicast-heavy environments, and the two main use cases I can think of are audio and video broadcasting, and electronic trading systems. At least in electronic trading, it is sufficient to say that if you're pushing multicast blindly to all VLAN members, you are wasting resources. And so, IGMP snooping was introduced to provide an automated means of controlling who receives what within one VLAN. This is done by inspecting IGMP reports arriving on every VLAN member port, and restricting multicast replication on that port to only the groups that something connected to it has requested. Here's the catch: on a majority of today's managed switches, especially those classed as enterprise switches, you will find IGMP snooping enabled by default. Meaning: your device will not receive PTP messages at all, unless you run IGMP.
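You can usually see this for yourself on the switch - in Cisco-IOS-style syntax (again illustrative; commands vary by vendor), something like:

```
show ip igmp snooping            ! typically reports: enabled (the default)
show ip igmp snooping groups     ! which ports joined which groups - no IGMP, no entry
```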
...and how is that a problem exactly?
I'm not quite sure what the root of the problem is - perhaps the fact that IGMP snooping is considered an "enterprise" feature, combined with the fact that most time sync products came from the Telecom world. If you ask me, IGMP snooping is nothing less than standard network engineering practice. As to multicast routing and PIM and the underlying requirement for IGMP: are you trying to sell me a car that only drives on local roads?
What should you do?
Just make sure your product at least supports IGMPv1 or IGMPv2. It's not impossible. Just do it.
I'm aware that many vendors use OEM boards for PTP - but that only narrows down the number of responsible parties. Some vendors will say: "But it's the hardware that does it! My hardware only does PTP!" I say: bovine excrement. Your hardware still runs some embedded OS. Please show me a contemporary embedded IP stack with no IGMP support. Even if you prove that your IP stack can't do it, you can still do it yourself. Yes you can. It's called an unsolicited IGMP report. The least you can do is craft an IGMP packet and just transmit it periodically - and make the interval configurable while you're at it. At least you will get your multicast, and avoid embarrassment.
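To show just how little work this is, here is a sketch in Python that builds such an unsolicited IGMPv2 membership report for the PTP primary group, 224.0.1.129. This is only the 8-byte IGMP payload with the standard Internet checksum; actually putting it on the wire additionally needs a raw socket, TTL 1 and the IP Router Alert option, which I leave out here:

```python
import socket
import struct

def inet_checksum(data: bytes) -> int:
    """Standard Internet checksum: one's complement of the one's
    complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def igmpv2_report(group: str) -> bytes:
    """IGMPv2 Membership Report: type 0x16, max-resp-time 0,
    checksum, then the 4-byte group address."""
    hdr = struct.pack("!BBH4s", 0x16, 0, 0, socket.inet_aton(group))
    csum = inet_checksum(hdr)
    return struct.pack("!BBH4s", 0x16, 0, csum, socket.inet_aton(group))

pkt = igmpv2_report("224.0.1.129")  # PTP Default profile primary group
# A receiver validates by checksumming the whole packet: result must be 0.
assert inet_checksum(pkt) == 0
```

Eight bytes and a checksum. That is the "rare animal" we are talking about.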
Get it up and keep it up
One thing to remember is that IGMP support has to be continuous. If your product is to be robust to network failures, you need to ensure that you will re-send the IGMP joins when in LISTENING state, or at least after recovering from a network failure. But there's more. A grandmaster with clockClass < 128 that was in PASSIVE state will transition to MASTER, not LISTENING, when it stops receiving PTP Announce messages. So as far as the protocol itself goes, it has no means of detecting that it should re-join the multicast group. PTPd, for example, does it periodically when in master state.
When a link goes down, the switch removes that port's group memberships from its IGMP snooping cache - so the receiver should re-join when the link recovers. If your device replies to IGMP queries, it will re-join on the next query, but the default query interval is typically at least 60 seconds - so it's best for the device to re-join on its own rather than wait for a query. On top of that, if you run IGMPv2, you may also wish to leave the multicast group on protocol shutdown.
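On the receiver side you don't even need to craft packets yourself. A minimal Python sketch (function names and usage are mine): re-issuing the join through the standard socket API makes the host's IP stack emit a fresh IGMP membership report, and dropping membership first sidesteps the error most stacks raise when the socket is already a member:

```python
import socket

PTP_PRIMARY = "224.0.1.129"  # PTP Default profile primary multicast group

def pack_mreq(group: str, iface: str = "0.0.0.0") -> bytes:
    """struct ip_mreq: 4-byte group address + 4-byte local interface
    (0.0.0.0 = let the stack pick the interface)."""
    return socket.inet_aton(group) + socket.inet_aton(iface)

def rejoin(sock: socket.socket, group: str = PTP_PRIMARY) -> None:
    """Force a fresh IGMP membership report. Call this from a timer
    while in MASTER or PASSIVE state, and after link recovery."""
    mreq = pack_mreq(group)
    try:
        # Dropping first avoids EADDRINUSE when already joined; with
        # IGMPv2 the kernel then emits a Leave followed by a Report.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)
    except OSError:
        pass  # not currently a member - nothing to drop
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
```

Calling IP_DROP_MEMBERSHIP on shutdown also gives you the IGMPv2 leave mentioned above for free.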
We can work around this - but we won't
When you approach a network engineering team with your multicast device that does not support IGMP, you will be met with slightly puzzled facial expressions and plenty of eyebrow action. Can we work around this? We can, by using static IGMP joins and disabling IGMP snooping if the situation warrants it. But we will do so very reluctantly. We have existing architectures and designs in place, we have procedures and policies, and we are not going to re-engineer our network for your broken implementation to work. It's as simple as that. Fix it. We've had enough.
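For the record, the workaround typically looks something like this in Cisco-IOS-style syntax (illustrative - the VLAN, group and interface values are made up, and static-join syntax varies a lot between platforms):

```
! Option 1: static snooping entry for the port facing the IGMP-less PTP box
ip igmp snooping vlan 10 static 224.0.1.129 interface GigabitEthernet1/0/10
!
! Option 2: disable snooping for the whole VLAN and go back to flooding
no ip igmp snooping vlan 10
```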