The PTP guy

All things time sync and the documentation that never was. @ThePTPGuy

Watching the leap second... with tshark


Why are we all here?

Do not believe anyone spreading apocalyptic news about things crashing and burning because of the leap second. Just like death from a gunshot wound is not death from lead poisoning, things crash because people do not test their systems enough, not because we insert an extra second every few years. Having tested the leap second event ad nauseam for the past couple of months, using GPS simulators and a range of timing hardware along with pure software setups, I feel both relieved and disappointed now that the day of the leap second has come. Weeks of preparation on the hardware and software side, hard and sometimes frustrating work with vendors, and coding until the small hours all conclude today, and then we will likely forget about it for another year or two, or more - that's if it doesn't get abolished later this year. Everything is ready and leap second pending warnings are flying around. Why is it all always last minute? Hard to say - but mostly because vendors don't do their job right, so everything has to be verified twice by the customer. It's not even that the cake is a lie. Everything is.

So much for a rant, and now for something completely different; no. 1: the leap second.

Introducing tshark

Application health aside, the bulk of my work around the leap second was ensuring that various parts of a timing system issue the leap second warnings on the day of the event, giving all dependent parts enough notice - which is crucial for NTP. Most of this work is pure console work - at least if it is to be efficient. Most of you IT / networking people know and use Wireshark. We all know it's great - it decodes pretty much anything you throw at it and comes bundled with an abundance of tools to assist with network and application troubleshooting. And yet so many people I know default to tcpdump in the console just to record traffic, then transfer the file back to their GUI and look at it with Wireshark. How do I put this - well, sometimes you just cannot afford to waste this much time if you need to be agile and inspect things repeatedly. Raise your hand if you knew that Wireshark comes with a terminal version. Raise a pitchfork if you use it on a daily basis. OK - tcpdump or snoop are often bundled with your operating system's base package set - but then again, disk space is less precious these days, so it doesn't cost much to install tshark.

Tshark gives you the protocol decode superpowers of Wireshark, combined with the familiar BPF / pcap filter syntax (plus protocol-based display filters), and the ability to extract protocol fields and display them in your chosen order and format.

The syntax generally is:

tshark [-np] -i <interface> -R <display.filter> [bpf filter expression] -T <output format> [-E <option=value>] [-E ... ] [-e <field1> -e <field2> -e ...]

where:

-n - do not resolve names - can speed things up
-p - do not go into promiscuous mode (unless you have to - and you do for multicast)
-R - Wireshark's display filter (ip.src == xxx etc.); from Wireshark 1.10 onwards, use -Y for this instead
-T - output format: -Tfields, -Ttext, etc. Here we'll use -Tfields
-E - output options: header=y/n, separator=/t,/s,character
-e - field or fields (when used multiple times) - same selectors as for the display filter
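
Since the display filter option depends on the tshark release (-R on the older versions used in this post, -Y from 1.10 onwards), a small wrapper can pick the right one. This is only a sketch: the function name is made up for illustration, and it assumes you pass in a plain dotted version string that you have already extracted from the output of tshark -v.

```shell
# Hypothetical helper: print the display-filter option for a given tshark version.
# Assumes a dotted version string such as "1.8.2" or "3.6.2".
tshark_filter_flag() {
    major=${1%%.*}          # text before the first dot
    rest=${1#*.}            # text after the first dot
    minor=${rest%%.*}       # second version component
    if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 10 ]; }; then
        printf '%s\n' '-Y'  # single-pass display filter, 1.10 and later
    else
        printf '%s\n' '-R'  # display filter option on older releases
    fi
}

# Example use (interface name is a placeholder):
#   tshark -ni eth0 port 123 "$(tshark_filter_flag 1.8.2)" ntp.flags.mode==4
```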

Watching PTP leap second with tshark

As described in this post, all the information related to the leap second is contained in PTP Announce messages. The interesting fields are as follows:


PTP packet field             PTP dataset property                 Wireshark field
header.messageType           -                                    ptp.v2.messageid == 0x0b (Announce)
header.flagField.leap61      timePropertiesDS.leap61              ptp.v2.flags.li61
announce.currentUtcOffset    timePropertiesDS.currentUtcOffset    ptp.v2.an.origincurrentutcoffset


This is all we need as far as fields go. The interesting traffic is PTP: UDP port 319 and UDP port 320 - Announce messages belong to General message class so they will use port 320.
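
For orientation, the leap flags sit in the low bits of the 16-bit flagField in the PTP header. Here is a minimal sketch of pulling them out of a raw flag word, assuming the value is given in hex with the first flag octet in the high byte (the way Wireshark displays the whole flag word). The function name is invented for this example; the bit positions are the ones IEEE 1588-2008 assigns to leap61, leap59 and currentUtcOffsetValid.

```shell
# Hypothetical helper: decode the leap-related bits of a PTPv2 flagField.
# Argument: the 16-bit flag word in hex, e.g. 0x0005.
ptp_decode_flags() {
    f=$(( $1 ))
    # bit 0 = leap61, bit 1 = leap59, bit 2 = currentUtcOffsetValid
    printf 'leap61=%d leap59=%d utc_offset_valid=%d\n' \
        $(( f & 1 )) $(( (f >> 1) & 1 )) $(( (f >> 2) & 1 ))
}

# Example: a flag word with leap61 and currentUtcOffsetValid set
#   ptp_decode_flags 0x0005
```

On the day of a positive leap second you would expect leap61=1 and leap59=0 in every Announce message from the grandmaster.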

Our tshark expression becomes:

tshark -ni xxx udp port 320 -R ptp.v2.messageid==0x0b -Tfields -Eheader=y \
-e frame.time \
-e ptp.v2.an.origincurrentutcoffset \
-e ptp.v2.flags.li61

The output will be something like this:

frame.time ptp.v2.an.origincurrentutcoffset    ptp.v2.flags.li61
Jun 30, 2015 11:27:54.296684000 35  1
Jun 30, 2015 11:27:55.299970000 35  1
Jun 30, 2015 11:27:56.295142000 35  1
Jun 30, 2015 11:27:57.300755000 35  1
Jun 30, 2015 11:27:58.292606000 35  1
Jun 30, 2015 11:27:59.294318000 35  1
Jun 30, 2015 11:28:00.302363000 35  1
Jun 30, 2015 11:28:01.310549000 35  1
Jun 30, 2015 11:28:02.310241000 35  1
Jun 30, 2015 11:28:03.309462000 35  1
Jun 30, 2015 11:28:04.412108000 35  1
Jun 30, 2015 11:28:04.426905000 35  1

This is one way to record the leap second - not all operating systems will display the :60 value for seconds.

What do you see when the actual leap second happens? The UTC offset changes from 35 to 36 (this time), and the leap flag is immediately reset to zero. This should normally happen right around midnight UTC: the first Announce message after midnight should carry the new offset with the flag reset. If this happens too early and / or too late - blame your vendor.

To ensure that Sync messages cannot cause unstable offset values around the leap second event, PTPd for example completely suspends sending and receiving of PTP event messages (the ones that have to be timestamped), and only resumes once the first Announce after the suspension period has been received. The standard states that the flags and values may change within ± two Announce message intervals around midnight - with PTPd the pause period around midnight is configurable and defaults to five seconds or two Announce message intervals before and after midnight, whichever is longer.

Now, if your PTP grandmaster still isn't sending out the leap61 flag as 1 - you have a problem, and it's too late to fix it on the master side. In that case, with both NTPd and PTPd you can use the leap seconds file.
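
To catch the offset and flag transition without staring at scrolling output, the tshark fields can be piped through a small filter that only prints when something changes. This is a rough sketch, not a polished tool: the function name is hypothetical, and it assumes the default tab separator of -Tfields with the three fields in the order used above (time, UTC offset, leap61 flag).

```shell
# Hypothetical helper: read "time<TAB>utc_offset<TAB>leap61" lines on stdin and
# print a line whenever the offset or the flag differs from the previous packet.
ptp_leap_changes() {
    awk -F'\t' '
        $2 !~ /^[0-9]+$/ { next }       # skip the header line and malformed rows
        seen && ($2 != off || $3 != flag) {
            printf "%s: offset %s -> %s, leap61 %s -> %s\n", $1, off, $2, flag, $3
        }
        { off = $2; flag = $3; seen = 1 }
    '
}

# Example use (-l makes tshark flush its output after every packet):
#   tshark -l -ni xxx udp port 320 -R ptp.v2.messageid==0x0b -Tfields \
#     -e frame.time -e ptp.v2.an.origincurrentutcoffset -e ptp.v2.flags.li61 \
#     | ptp_leap_changes
```

The flag value is compared as an opaque string, so the filter does not care how a given tshark version prints boolean fields.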

Watching NTP leap second with tshark

With NTP you need two fields from a server mode (mode == 0x04) packet (the server's reply): LI (Leap Indicator) and one of the server's timestamps - for example the transmit timestamp (xmt).
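
Both the Leap Indicator and the mode live in the first octet of the NTP header: two bits of LI, three bits of version number, three bits of mode (RFC 5905). A minimal sketch of that split, with a made-up function name and assuming the octet is passed in as a hex number:

```shell
# Hypothetical helper: split the first octet of an NTP packet into LI, VN and mode.
# Argument: the octet in hex, e.g. 0x64.
ntp_decode_flags() {
    b=$(( $1 ))
    # LI = top 2 bits, VN = next 3 bits, mode = low 3 bits
    printf 'li=%d vn=%d mode=%d\n' \
        $(( (b >> 6) & 3 )) $(( (b >> 3) & 7 )) $(( b & 7 ))
}

# Example: an NTPv4 server reply announcing a positive leap second
#   ntp_decode_flags 0x64     # prints: li=1 vn=4 mode=4
```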

The tshark expression will look like this:

tshark -ni eth0 port 123 -R ntp.flags.mode==4 -Eheader=y -Tfields \
-e frame.time \
-e ntp.flags.li \
-e ntp.xmt

The output will look like this:

frame.time ntp.flags.li    ntp.xmt
Jun 30, 2015 14:48:01.772791000 1   d9:3d:1c:91:c6:04:86:7b
Jun 30, 2015 14:48:19.772441000 1   d9:3d:1c:a3:c5:e8:b2:2d
Jun 30, 2015 14:48:34.772810000 1   d9:3d:1c:b2:c5:fa:f6:4f
Jun 30, 2015 14:48:51.772300000 1   d9:3d:1c:c3:c5:d5:7d:c4
Jun 30, 2015 14:49:09.772914000 1   d9:3d:1c:d5:c5:fb:a2:93

The leap indicator LI, as per NTP version 4 (RFC 5905), will be 0x01 for a positive leap second. Unfortunately, with Wireshark up to version 1.5, NTP timestamps will be displayed in hex - 1.5+ will give you a regular timestamp. The upper four octets are the seconds counter - you can see a poll interval of roughly 16 seconds here (the gaps between replies vary by a second or two).

So what happens around midnight? The Leap Indicator will go back to zero, and the seconds counter will show a 17-second difference (or poll + 1), although the message interval will actually still be 16.

As with PTP: if you're not seeing the leap second flag and you think you should be - you don't have much time to react. The leap seconds file (see the ntp.org documentation) can be one solution - it is available here: http://www.ietf.org/timezones/data/leap-seconds.list
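
If your tshark prints ntp.xmt in hex, the seconds counter can be recovered with a few lines of shell. This is a sketch under a couple of assumptions: the function name is invented here, the input is the colon-separated form shown in the output above, and the timestamp belongs to NTP era 0 (true until 2036), so subtracting the 1900-to-1970 offset gives Unix time.

```shell
# Hypothetical helper: convert a colon-separated hex NTP timestamp
# (as printed by older tshark versions) to Unix seconds.
ntp_hex_to_unix() {
    # Keep the upper four octets (the seconds field) and drop the colons.
    hex=$(printf '%s' "$1" | cut -d: -f1-4 | tr -d ':')
    # 2208988800 = seconds between the NTP epoch (1900) and the Unix epoch (1970).
    echo $(( 0x$hex - 2208988800 ))
}

# Example with the first two xmt values from the output above:
#   a=$(ntp_hex_to_unix d9:3d:1c:91:c6:04:86:7b)    # 1435672081
#   b=$(ntp_hex_to_unix d9:3d:1c:a3:c5:e8:b2:2d)
#   echo $(( b - a ))                                # 18
```

1435672081 is 13:48:01 UTC on 30 June 2015, which lines up with the frame.time column if the capturing host was one hour ahead of UTC. Subtracting consecutive values this way is how you would see the extra second in the counter around midnight.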

For more information on what the leap second is and how we find out about it, check out last week's post on leap seconds.

As I publish more on PTP and PTPd, I will often be using tshark as the standard troubleshooting tool. It can be used for everything from debugging interoperability issues, through simply learning protocols, to investigating timing performance problems. It's a CLI version of Wireshark - it has to be good.
