PJSIP: Automatically Switching the Transport Type from UDP to TCP

Recently we ran into lost SIP signaling messages on different terminals, in different environments, and in different scenarios.
We were using UDP as our default transport type.
The potential causes could be:
1. Some SIP messages could be larger than the MTU.
2. The send/receive buffer size of the socket was insufficient.
3. The volume of SIP messages (conference control) was really large.

Here is some information about this issue, which could also be a way out of it.

According to RFC 3261 section 18.1.1:
“If a request is within 200 bytes of the path MTU, or if it is larger than 1300 bytes and the path MTU is unknown, the request MUST be sent using an RFC 2914 congestion controlled transport protocol, such as TCP.”

If the Request Is Larger than 1300 Bytes

By this rule, PJSIP will automatically send the request with TCP if the request is larger than 1300 bytes. This feature was first implemented in ticket #831. The switching is done on a request-by-request basis, i.e. if an initial INVITE is originally meant to use UDP but ends up being sent with TCP because of this rule, then only that initial INVITE is sent with TCP; subsequent requests will use UDP, unless of course they are larger than 1300 bytes. In particular, the Contact header stays the same. Only the Via header is changed to TCP.
It could be the case that the initial INVITE is sent with UDP, and once the request is challenged with 401 or 407, the size grows larger than 1300 bytes due to the addition of Authorization or Proxy-Authorization header. In this case, the request retry will be sent with TCP.
If the TCP transport is not instantiated, you will see an error similar to this:
“Temporary failure in sending Request msg INVITE/cseq=15228 (tdta02EB0530), will try next server. Err=171060 (Unsupported transport (PJSIP_EUNSUPTRANSPORT))”
As the message says, the failure is not permanent: PJSIP will send the request with UDP anyway.
This TCP switching feature can be disabled as follows:
● at run-time by setting pjsip_cfg()->endpt.disable_tcp_switch to PJ_TRUE.
● at compile time by setting PJSIP_DONT_SWITCH_TO_TCP to non-zero.
You can also tweak the 1300 threshold by setting PJSIP_UDP_SIZE_THRESHOLD to the appropriate value.
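To make the decision rule concrete, here is a minimal C model of the RFC 3261 section 18.1.1 rule combined with the two PJSIP knobs mentioned above. It is a sketch, not PJSIP's source: the names `transport_t`, `choose_transport` and `UDP_SIZE_THRESHOLD` are hypothetical. In a real PJSIP build you would only set `pjsip_cfg()->endpt.disable_tcp_switch`, `PJSIP_DONT_SWITCH_TO_TCP` or `PJSIP_UDP_SIZE_THRESHOLD` as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; this is not the PJSIP API. */
typedef enum { TRANSPORT_UDP, TRANSPORT_TCP } transport_t;

/* Default threshold, mirroring the default of PJSIP_UDP_SIZE_THRESHOLD. */
#define UDP_SIZE_THRESHOLD 1300u

/*
 * Pick the transport for a single request (decided per request).
 * RFC 3261 section 18.1.1:
 *   - path MTU known: use TCP if the request is within 200 bytes of it;
 *   - path MTU unknown (pass 0): use TCP if it exceeds the threshold.
 * disable_switch models pjsip_cfg()->endpt.disable_tcp_switch.
 */
static transport_t choose_transport(size_t msg_len, size_t path_mtu,
                                    bool disable_switch)
{
    assert(path_mtu == 0 || path_mtu > 200); /* sanity check on caller input */
    if (disable_switch)
        return TRANSPORT_UDP;
    if (path_mtu != 0) /* written as an addition to avoid unsigned underflow */
        return (msg_len + 200 >= path_mtu) ? TRANSPORT_TCP : TRANSPORT_UDP;
    return (msg_len > UDP_SIZE_THRESHOLD) ? TRANSPORT_TCP : TRANSPORT_UDP;
}
```

This also models the 401/407 case. A 1250-byte INVITE goes out over UDP (`choose_transport(1250, 0, false)`). If the retry grows to, say, 1450 bytes after the Authorization header is added, it goes over TCP. With the switch disabled, it stays on UDP.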

Vendor ID, Product ID information in SIP

As you may know, to be a robust meeting entity, we must take good care of compatibility with equipment from different manufacturers.

In the H.323 protocol, we can use fields such as Vendor ID, Product ID and Version ID in the signaling messages.

But how do we do this when using the SIP protocol?

  1. Definitions in RFC 3261

20.35 Server

   The Server header field contains information about the software used
   by the UAS to handle the request.

   Revealing the specific software version of the server might allow the
   server to become more vulnerable to attacks against software that is
   known to contain security holes. Implementers SHOULD make the Server
   header field a configurable option.

      Server: HomeServer v2

20.41 User-Agent

   The User-Agent header field contains information about the UAC
   originating the request.  The semantics of this header field are
   defined in [H14.43].

   Revealing the specific software version of the user agent might allow
   the user agent to become more vulnerable to attacks against software
   that is known to contain security holes.  Implementers SHOULD make
   the User-Agent header field a configurable option.

      User-Agent: Softphone Beta1.5

  2. [H14.43] User-Agent definition in RFC 2616

14.43 User-Agent

The User-Agent request-header field contains information about the user agent originating the request. This is for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations. User agents SHOULD include this field with requests.

The field can contain multiple product tokens (section 3.8) and comments identifying the agent and any subproducts which form a significant part of the user agent. By convention, the product tokens are listed in order of their significance for identifying the application.

User-Agent     = "User-Agent" ":" 1*( product | comment )


User-Agent: CERN-LineMode/2.15 libwww/2.17b3



  3. How did TANDBERG and Polycom implement it?

[Figure: User-Agent format of TANDBERG 775]
[Figure: Server format of TANDBERG 775]

[Figure: User-Agent format of Polycom]

So, to jump to the conclusion:

  1. As UAC, identify yourself in User-Agent field.
  2. As UAS, identify yourself in Server field.
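The rule above can be captured in one helper. It uses the same identity string in either header, depending on the role, and formats it as RFC 2616 product tokens (`product/version` plus an optional parenthesized comment). This is a minimal sketch: `build_identity_header` and the product strings in the tests are hypothetical placeholders, not the TANDBERG or Polycom format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "User-Agent: ..." (acting as UAC) or "Server: ..." (acting as UAS).
 * product/version form one product token ("Vendor-Model/1.2.3"); comment is
 * optional and emitted in parentheses as an RFC 2616 comment.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int build_identity_header(char *buf, size_t len, bool is_uas,
                                 const char *product, const char *version,
                                 const char *comment)
{
    const char *name = is_uas ? "Server" : "User-Agent";
    int n;

    assert(buf != NULL && product != NULL && version != NULL);
    if (comment != NULL && strlen(comment) > 0)
        n = snprintf(buf, len, "%s: %s/%s (%s)", name, product, version, comment);
    else
        n = snprintf(buf, len, "%s: %s/%s", name, product, version);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

RFC 3261 advises making these headers configurable, so the product and version strings would normally come from configuration rather than being hard-coded.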

Comparing the TANDBERG and Polycom implementations, the TANDBERG format is more appropriate.

Media Control (Video Picture Fast Update) mechanism for SIP

This question arose from our experience conferencing with different meeting terminals, including Polycom, Cisco, TANDBERG, Huawei, etc.

In our current SIP conference implementation, we use a stream_id tag in the Video Fast Update command to tell the peer which stream we are requesting an intra frame for. The stream_id value is taken from the label attribute exchanged in SDP.
However, the situation in practice was:
1. Some vendors, such as Cisco and TANDBERG, do not put a stream_id in their VideoFastUpdate command. If we send them a VideoFastUpdate with a stream_id, it does no harm and they respond with 200 OK. However, the stream_id value must not be zero; otherwise they reply with a 500 error.
2. Polycom does include a stream_id, but the value is always 1, whatever the circumstances.
3. Huawei seems to have the same implementation as Kedacom: it includes a stream_id, and the value is consistent with the label attribute in the SDP.

Then I turned to the RFC: RFC 5168, XML Schema for Media Control (category: Informational), developed by Microsoft, Polycom and Radvision.
The schema definition is in Section 5 of the document:

   <?xml version="1.0" encoding="utf-8" ?>

   <xs:schema id="TightMediaControl"
              xmlns:xs="http://www.w3.org/2001/XMLSchema">

     <xs:element name="media_control">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="vc_primitive" type="vc_primitive"
                       maxOccurs="unbounded" />
           <xs:element name="general_error"
                       maxOccurs="unbounded" />
         </xs:sequence>
       </xs:complexType>
     </xs:element>

     <!-- Video control primitive. -->

     <xs:complexType name="vc_primitive">
       <xs:sequence>
         <xs:element name="to_encoder" type="to_encoder" />
         <xs:element name="stream_id"
                     maxOccurs="unbounded" />
       </xs:sequence>
     </xs:complexType>

     <!-- Encoder Command:
          Picture Fast Update
     -->

     <xs:complexType name="to_encoder">
       <xs:choice>
         <xs:element name="picture_fast_update" />
       </xs:choice>
     </xs:complexType>

   </xs:schema>


So, as you can see, there is indeed a stream_id element in the schema. But when I tried to find out more about it, I found nothing, which is weird for an RFC.
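Given the vendor behavior listed earlier, a defensive policy is to include stream_id only when there is a meaningful label from SDP, and never send 0. Below is a minimal sketch that builds the body following the element order in the schema above (to_encoder, then optional stream_id). The function name and the refuse-zero policy are our own assumptions. The body is carried in a SIP INFO with Content-Type application/media_control+xml.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build an RFC 5168 picture_fast_update body.
 * stream_id == NULL omits the element entirely. "0" is refused because some
 * endpoints answer it with a 500 error (see the interop notes above).
 * Returns the body length, or -1 on error or truncation.
 */
static int build_fast_update(char *buf, size_t len, const char *stream_id)
{
    int n;

    assert(buf != NULL);
    if (stream_id != NULL && strcmp(stream_id, "0") == 0)
        return -1;
    n = snprintf(buf, len,
                 "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\r\n"
                 "<media_control>\r\n"
                 "  <vc_primitive>\r\n"
                 "    <to_encoder>\r\n"
                 "      <picture_fast_update/>\r\n"
                 "    </to_encoder>\r\n"
                 "%s%s%s"
                 "  </vc_primitive>\r\n"
                 "</media_control>\r\n",
                 stream_id ? "    <stream_id>" : "",
                 stream_id ? stream_id : "",
                 stream_id ? "</stream_id>\r\n" : "");
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Omitting stream_id toward Cisco/TANDBERG and sending the SDP label toward Huawei/Kedacom both fit this one function.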

After re-reading the full document, I found a passage that explains the situation:
New implementations are discouraged from using the method described except for backward compatibility purposes. New implementations are required to use the new Full Intra Request command in the RTP Control Protocol (RTCP) channel.
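For comparison, the RTCP Full Intra Request that RFC 5168 points to is defined in RFC 5104 as a payload-specific feedback message (PT=206, FMT=4). Below is a minimal sketch that serializes one FIR with a single FCI entry; `build_rtcp_fir` and `put_u32` are hypothetical helpers. Depending on the RTCP profile in use, the FIR is normally sent inside a compound RTCP packet, which this sketch does not build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit value in network byte order. */
static void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Build one FIR (RFC 5104, section 4.3.1) with a single FCI entry, 20 bytes:
 *   V=2, P=0, FMT=4 | PT=206 (PSFB) | length=4 (32-bit words minus one)
 *   SSRC of packet sender
 *   SSRC of media source (unused for FIR, set to 0)
 *   FCI: SSRC of the encoder asked for an intra frame
 *   FCI: command sequence number (8 bits) + 24 reserved bits
 * Returns the packet length, or 0 if the buffer is too small.
 */
static size_t build_rtcp_fir(uint8_t *buf, size_t len, uint32_t sender_ssrc,
                             uint32_t media_ssrc, uint8_t seq_nr)
{
    assert(buf != NULL);
    if (len < 20)
        return 0;
    buf[0] = (uint8_t)((2u << 6) | 4u); /* version 2, no padding, FMT 4 */
    buf[1] = 206;                       /* payload-specific feedback */
    buf[2] = 0;
    buf[3] = 4;                         /* 20 / 4 - 1 */
    put_u32(buf + 4, sender_ssrc);
    put_u32(buf + 8, 0);
    put_u32(buf + 12, media_ssrc);
    buf[16] = seq_nr;
    buf[17] = buf[18] = buf[19] = 0;
    return 20;
}
```

The sequence number is incremented for each new request and reused for retransmissions of the same request, so the receiver can tell them apart.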

Some key notes on RFC 3264

I am facing a new task: standardizing the SIP implementation of the Kedacom conference endpoints.

So I have dug into some RFC documents recently. Here are some key notes on RFC 3264: An Offer/Answer Model with the Session Description Protocol (SDP).

1. Capability comparison – Direction


If no direction attribute is present (“a=sendrecv” is omitted and none of “a=sendonly”, “a=recvonly” or “a=inactive” is given), the direction is sendrecv, since sendrecv is the default.
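Below is a minimal sketch of applying that default when reading one media description. It assumes the attribute lines of the m= section have already been split and stripped of CRLF. The enum and function names are hypothetical. It also ignores a session-level direction attribute, which SDP allows as a default for all media.

```c
#include <assert.h>
#include <string.h>

typedef enum { DIR_SENDRECV, DIR_SENDONLY, DIR_RECVONLY, DIR_INACTIVE } sdp_dir_t;

/*
 * Return the direction of one media description, given its attribute lines
 * (e.g. "a=rtpmap:0 PCMU/8000"). If no direction attribute appears,
 * sendrecv applies because it is the default.
 */
static sdp_dir_t media_direction(const char *const *attrs, size_t count)
{
    assert(attrs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(attrs[i], "a=sendonly") == 0) return DIR_SENDONLY;
        if (strcmp(attrs[i], "a=recvonly") == 0) return DIR_RECVONLY;
        if (strcmp(attrs[i], "a=inactive") == 0) return DIR_INACTIVE;
        if (strcmp(attrs[i], "a=sendrecv") == 0) return DIR_SENDRECV;
    }
    return DIR_SENDRECV; /* attribute omitted */
}
```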

2. [Offering] RTP port in the m= line


The port in the m= line is the RTP receive port for recvonly and sendrecv streams, while for sendonly streams it indirectly indicates the RTCP port.

For recvonly and sendrecv streams, the port number and address in the
offer indicate where the offerer would like to receive the media
stream.  For sendonly RTP streams, the address and port number
indirectly indicate where the offerer wants to receive RTCP reports.
Unless there is an explicit indication otherwise, reports are sent to
the port number one higher than the number indicated.  The IP address
and port present in the offer indicate nothing about the source IP
address and source port of RTP and RTCP packets that will be sent by
the offerer.
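The "port number one higher" rule can be expressed as a small helper. This is a minimal sketch. As its "explicit indication otherwise", it assumes the `a=rtcp:<port>` attribute from RFC 3605, and the function name is hypothetical. Other mechanisms, such as rtcp-mux, are not modeled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Port where RTCP reports for this stream should go: m= port + 1 unless an
 * "a=rtcp:<port>" attribute (RFC 3605) says otherwise.
 * Returns -1 for a disabled stream (port 0).
 */
static long rtcp_port(long m_port, const char *const *attrs, size_t count)
{
    assert(m_port >= 0 && m_port <= 65535);
    if (m_port == 0)
        return -1;
    for (size_t i = 0; i < count; i++) {
        if (strncmp(attrs[i], "a=rtcp:", 7) == 0)
            return strtol(attrs[i] + 7, NULL, 10); /* stops at the first space */
    }
    return m_port + 1;
}
```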

3. [Offering] Switching media formats if multiple formats are supported


If multiple formats are listed, it
means that the offerer is capable of making use of any of those
formats during the session.  In other words, the answerer MAY change
formats in the middle of the session, making use of any of the
formats listed, without sending a new offer.

[Offering] Capability comparison – Preference


In all cases, the formats in the “m=” line MUST be listed in order of
preference, with the first format listed being preferred.  In this
case, preferred means that the recipient of the offer SHOULD use the
format with the highest preference that is acceptable to it.

[Offering] Capability comparison – Preference 2


For sendrecv RTP
streams, the payload type numbers indicate the value of the payload
type field the offerer expects to receive, and would prefer to send.
However, for sendonly and sendrecv streams, the answer might indicate
different payload type numbers for the same codecs, in which case,
the offerer MUST send with the payload type numbers from the answer.

* This is where SIP differs from H.323. In H.245, the payload type value in an OLC (OpenLogicalChannel) means the requester will send the payload with that payload type value. In SDP, it means “you should send your payload with the value I specified in my SDP”.

[Offering] Bandwidth description


If the bandwidth attribute is present for a stream, it indicates the
desired bandwidth that the offerer would like to receive.  A value of
zero is allowed, but discouraged.  It indicates that no media should
be sent.  In the case of RTP, it would also disable all RTCP.

[Offering] A typical usage example for multiple media streams


A typical usage example for multiple media streams of the same type
is a pre-paid calling card application, where the user can press and
hold the pound (“#”) key at any time during a call to hangup and make
a new call on the same card.  This requires media from the user to
two destinations – the remote gateway, and the DTMF processing
application which looks for the pound.  This could be accomplished
with two media streams, one sendrecv to the gateway, and the other
sendonly (from the perspective of the user) to the DTMF application.

[Answering] The answer SDP must still list at least one media format when rejecting a stream


To reject an offered
stream, the port number in the corresponding stream in the answer
MUST be set to zero.  Any media formats listed are ignored.  At least
one MUST be present, as specified by SDP.
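Below is a minimal sketch of building such a rejected m= line. It keeps the offered media type and transport, sets the port to zero, and echoes one offered format purely to satisfy SDP syntax. The function name is hypothetical, and echoing the first offered format is one arbitrary choice, since the formats are ignored anyway.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the answer's m= line for a rejected stream: port 0, same media and
 * transport as the offer, plus at least one format (SDP requires one).
 * Returns the line length, or -1 on truncation.
 */
static int reject_m_line(char *buf, size_t len, const char *media,
                         const char *proto, const char *first_fmt)
{
    int n;

    assert(media != NULL && proto != NULL && first_fmt != NULL);
    assert(strlen(first_fmt) > 0);
    n = snprintf(buf, len, "m=%s 0 %s %s", media, proto, first_fmt);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```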

[Answering] Stream marked as recvonly can suggest a new format for the offerer


For streams marked as recvonly in the answer, the “m=” line MUST
contain at least one media format the answerer is willing to receive
with from amongst those listed in the offer.  The stream MAY indicate
additional media formats, not listed in the corresponding stream in
the offer, that the answerer is willing to receive.

Just like recvonly streams, sendrecv streams can suggest new formats to the offerer. Of course, the answerer will not be able to send them at this time, since they were not listed in the offer.

[Answering] Capability comparison – RECOMMENDED order for the answerer


If a stream in the offer lists
audio codecs 8, 22 and 48, in that order, and the answerer only
supports codecs 8 and 48, it is RECOMMENDED that, if the answerer has
no reason to change it, the ordering of codecs in the answer be 8,
48, and not 48, 8.
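Below is a minimal sketch of the RECOMMENDED ordering. It walks the offer in order and keeps the formats the answerer supports, so the result is also always a subset of the offer, as required further below. It compares payload type numbers only, which is a simplifying assumption that holds for static payload types. Dynamic types would have to be matched by their rtpmap encoding names. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if fmt appears in list[0..n). */
static bool contains(const int *list, size_t n, int fmt)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == fmt)
            return true;
    return false;
}

/*
 * Answer format list: offered formats the answerer supports, in offer order.
 * Returns how many were written to out (capacity out_cap); 0 means the
 * stream has to be rejected.
 */
static size_t answer_formats(const int *offer, size_t n_offer,
                             const int *local, size_t n_local,
                             int *out, size_t out_cap)
{
    size_t n = 0;

    assert(out != NULL || out_cap == 0);
    for (size_t i = 0; i < n_offer && n < out_cap; i++)
        if (contains(local, n_local, offer[i]))
            out[n++] = offer[i];
    return n;
}
```

Using the RFC example, an offer of 8, 22, 48 and local support for 48 and 8 yields 8, 48.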

[Answering] Send with the RTP payload type numbers from the offer rather than those in the answer


In the case of RTP, it MUST use the payload type numbers
from the offer, even if they differ from those in the answer.

[Multicast] ??? How to handle participants in the same conference but with different format support ???


The set of media formats in the answer MUST be equal to or be a
subset of those in the offer.  Removing a format is a way for the
answerer to indicate that the format is not supported.

[Processing Answer]


It (the offerer) MUST send using a media format listed in the answer,
and it SHOULD use the first media format listed in the answer when it
does send.

The reason this is a SHOULD, and not a MUST (its also a SHOULD,
and not a MUST, for the answerer), is because there will
oftentimes be a need to change codecs on the fly.  For example,
during silence periods, an agent might like to switch to a comfort
noise codec.  Or, if the user presses a number on the keypad, the
agent might like to send that using RFC 2833 [9].  Congestion
control might necessitate changing to a lower rate codec based on

[Modifying the Session] Session version of REINVITE


When issuing an offer that modifies the session,
the “o=” line of the new SDP MUST be identical to that in the
previous SDP, except that the version in the origin field MUST
increment by one from the previous SDP.  If the version in the origin
line does not increment, the SDP MUST be identical to the SDP with
that version number.
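Below is a minimal sketch of checking this rule between two consecutive o= lines. The struct, the fixed field sizes and the three-valued result are hypothetical simplifications. In the "same version" case, comparing the rest of the SDP body is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* o=<username> <sess-id> <sess-version> <nettype> <addrtype> <address> */
typedef struct {
    char user[64], sess_id[32], net[8], addrtype[8], addr[64];
    unsigned long long version;
} sdp_origin_t;

static bool parse_origin(const char *line, sdp_origin_t *o)
{
    return sscanf(line, "o=%63s %31s %llu %7s %7s %63s", o->user, o->sess_id,
                  &o->version, o->net, o->addrtype, o->addr) == 6;
}

/*
 *  1: all fields unchanged and the version incremented by one (a new offer);
 *  0: same version, so the whole SDP must be identical to the previous one;
 * -1: anything else breaks the RFC 3264 rule.
 */
static int check_origin(const char *prev_line, const char *next_line)
{
    sdp_origin_t a, b;

    assert(prev_line != NULL && next_line != NULL);
    if (!parse_origin(prev_line, &a) || !parse_origin(next_line, &b))
        return -1;
    if (strcmp(a.user, b.user) != 0 || strcmp(a.sess_id, b.sess_id) != 0 ||
        strcmp(a.net, b.net) != 0 || strcmp(a.addrtype, b.addrtype) != 0 ||
        strcmp(a.addr, b.addr) != 0)
        return -1;
    if (b.version == a.version + 1)
        return 1;
    return (b.version == a.version) ? 0 : -1;
}
```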

[Modifying the Session] Rules for m= (media stream) lines in a REINVITE


If an SDP is offered, which is different from the previous SDP, the
new SDP MUST have a matching media stream for each media stream in
the previous SDP.

[Modifying the Session] Adding a Media Stream in REINVITE


New media streams are created by new additional media descriptions
below the existing ones, or by reusing the “slot” used by an old
media stream which had been disabled by setting its port to zero.

[Modifying the Session] Changing the Set of Media Formats in REINVITE


For example, if A generates an offer
with G.711 assigned to dynamic payload type number 46, payload type
number 46 MUST refer to G.711 from that point forward in any offers
or answers for that media stream within the session.  However, it is
acceptable for multiple payload type numbers to be mapped to the same
codec, so that an updated offer could also use payload type number 72
for G.711.

     The mappings need to remain fixed for the duration of the session
      because of the loose synchronization between signaling exchanges
      of SDP and the media stream.

[Modifying the Session] Sending Media Stream after REINVITE is done


Similarly, as described in Section 6, as soon as it sends
its answer, the answerer MUST begin sending media using any formats
in the offer that were also present in the answer
Similarly, when the offerer receives the
answer, it MUST begin sending media using any formats in the answer

[Modifying the Session] Rules for agents that cease using an old media format


When an agent ceases using a media format (by not listing that format
in an offer or answer, even though it was in a previous SDP) the
agent will still need to be prepared to receive media with that
format for a brief time.

[Modifying the Session] Hold on a call


Putting a call on hold means requesting that the other participant(s) stop sending streams to you.

   If the stream to be placed on hold was previously a sendrecv media
stream, it is placed on hold by marking it as sendonly.  If the
stream to be placed on hold was previously a recvonly media stream,
it is placed on hold by marking it inactive.

Certain third party call control scenarios do not work when an
      answerer responds to held SDP with held SDP.

[Modifying the Session] Hold on a call 2

Does this mean the party requesting the hold keeps sending media? The answer is no.

   Typically, when a user “presses” hold, the agent will generate an
offer with all streams in the SDP indicating a direction of sendonly,
and it will also locally mute, so that no media is sent to the far
end, and no media is played out.
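The hold rules quoted above map onto two small functions. The first gives the direction to offer when placing a stream on hold. The second gives the plain mirrored direction an answerer returns, which avoids answering held SDP with held SDP. This is a minimal sketch with hypothetical names. Leaving sendonly and inactive streams unchanged on hold is our own reading, since those streams are already not receiving. The local mute described in the last quote is outside SDP and not modeled.

```c
#include <assert.h>

typedef enum { DIR_SENDRECV, DIR_SENDONLY, DIR_RECVONLY, DIR_INACTIVE } sdp_dir_t;

/* Direction to offer when putting a stream on hold: stop receiving. */
static sdp_dir_t hold_direction(sdp_dir_t prev)
{
    switch (prev) {
    case DIR_SENDRECV: return DIR_SENDONLY;
    case DIR_RECVONLY: return DIR_INACTIVE;
    default:           return prev; /* sendonly/inactive: already not receiving */
    }
}

/* Mirror an offered direction in the answer, without applying a local hold. */
static sdp_dir_t answer_direction(sdp_dir_t offered)
{
    switch (offered) {
    case DIR_SENDONLY: return DIR_RECVONLY;
    case DIR_RECVONLY: return DIR_SENDONLY;
    default:           return offered; /* sendrecv and inactive mirror themselves */
    }
}
```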

BFCP sucks

I’m participating in a project targeting dual-stream control with Polycom and Huawei MTs using the BFCP protocol, with a SIP proxy involved.

Working environment:
A. Several MTs working behind different NATs.
B. A Polycom SIP proxy server on the Internet.

Polycom MT model: POLYCOM RealPresence Group 550, firmware version: unknown.
Huawei MT model: TE60, firmware version: TEX0 V100R001C01B024SP05 Release May 7 2014 04:06:40
Kedacom MT model: HD3

Both the Polycom and Huawei MTs use BFCP over UDP in proxy mode, regardless of whether their SIP stack uses TLS, and even if we force SIP over TCP only.
The root cause seems to be that the Polycom SIP proxy supports only BFCP over UDP for relaying BFCP messages.

However, our SIP MT supports only BFCP over TCP.

So the major job is to implement BFCP over UDP for our MT.

Here we go.

1. Get the BFCP-related RFC documents

The first thing to do is to go to the IETF site and check out the RFC documents related to BFCP.


There you’ll find three Standards Track RFCs:
a. RFC 4582: The Binary Floor Control Protocol (BFCP)
b. RFC 4583: Session Description Protocol (SDP) Format for Binary Floor Control Protocol (BFCP) Streams
c. RFC 5018: Connection Establishment in the Binary Floor Control Protocol (BFCP)

and some drafts:
a. draft-ietf-bfcpbis-rfc4582bis-11, The Binary Floor Control Protocol (BFCP)
b. draft-sandbakken-dispatch-bfcp-udp-03, Revision of the Binary Floor Control Protocol (BFCP) for use over an unreliable transport
c. draft-ietf-bfcpbis-rfc4583bis-09, Session Description Protocol (SDP) Format for Binary Floor Control Protocol (BFCP) Streams

2. Dig into the documents for BFCP over UDP

After some digging into the documents, it turns out that none of the three RFCs describes BFCP over UDP or anything relevant to it.
That is to say, we have no standard to follow for BFCP over UDP for now; we can only turn to the drafts:

A. draft-ietf-bfcpbis-rfc4582bis-11
B. draft-sandbakken-dispatch-bfcp-udp-03
C. draft-ietf-bfcpbis-rfc4583bis-09

Relationship of A, B, C:
A is the newest one; it replaced B, which in turn once replaced C.

3. About BFCP over UDP and TCP

The major differences between BFCP over TCP and BFCP over UDP are:
A. A floor control message over UDP needs an ACK to confirm that it was received (and processed) properly, while BFCP over TCP can rely on TCP’s reliability mechanism.
B. UDP packets are easier to relay than TCP packets when proxies are involved, as in this scenario.

4. Conflicting primitive value definitions

Bad things happen, even to protocols designed by great companies such as Cisco and Ericsson.

In draft-sandbakken-dispatch-bfcp-udp-03
| Value | Primitive             | Direction          |
|   14  | FloorRequestStatusAck | P -> S ; Ch -> S   |
|   15  | ErrorAck              | P -> S ; Ch -> S   |
|   16  | FloorStatusAck        | P -> S ; Ch -> S   |
|   17  | Goodbye               | P -> S ; Ch -> S ; |
|       |                       | P <- S ; Ch <- S   |
|   18  | GoodbyeAck            | P -> S ; Ch -> S ; |
|       |                       | P <- S ; Ch <- S   |

In draft-ietf-bfcpbis-rfc4582bis-11
| Value | Primitive             | Direction          |
|   1   | FloorRequest          | P -> S             |
|   2   | FloorRelease          | P -> S             |
|   3   | FloorRequestQuery     | P -> S ; Ch -> S   |
|   4   | FloorRequestStatus    | P <- S ; Ch <- S   |
|   5   | UserQuery             | P -> S ; Ch -> S   |
|   6   | UserStatus            | P <- S ; Ch <- S   |
|   7   | FloorQuery            | P -> S ; Ch -> S   |
|   8   | FloorStatus           | P <- S ; Ch <- S   |
|   9   | ChairAction           | Ch -> S            |
|   10  | ChairActionAck        | Ch <- S            |
|   11  | Hello                 | P -> S ; Ch -> S   |
|   12  | HelloAck              | P <- S ; Ch <- S   |
|   13  | Error                 | P <- S ; Ch <- S   |
|   14  | FloorRequestStatusAck | P -> S ; Ch -> S   |
|   15  | FloorStatusAck        | P -> S ; Ch -> S   |
|   16  | Goodbye               | P -> S ; Ch -> S ; |
|       |                       | P <- S ; Ch <- S   |
|   17  | GoodbyeAck            | P -> S ; Ch -> S ; |
|       |                       | P <- S ; Ch <- S   |

Still haven’t spotted the problem? Check primitive values 15 to 17: the definitions in the two drafts are completely different (and value 18 exists only in draft-sandbakken-dispatch-bfcp-udp-03).

BTW: Wireshark version 1.10.6 (v1.10.6 from master-1.10) interprets the primitive values according to draft-sandbakken-dispatch-bfcp-udp-03.
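Because of this conflict, a decoder has to be told which draft a peer follows before it can name primitives 14 and up. Below is a minimal lookup built from the two tables above. It assumes values 1 to 13 are inherited unchanged from RFC 4582 in both drafts, as the bis-11 table shows. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { DRAFT_SANDBAKKEN_UDP_03, DRAFT_RFC4582BIS_11 } bfcp_draft_t;

/* Values 1..13: identical in both drafts (RFC 4582 primitives). */
static const char *const common_primitives[14] = {
    NULL, "FloorRequest", "FloorRelease", "FloorRequestQuery",
    "FloorRequestStatus", "UserQuery", "UserStatus", "FloorQuery",
    "FloorStatus", "ChairAction", "ChairActionAck", "Hello", "HelloAck", "Error"
};
/* Values 14..18 in draft-sandbakken-dispatch-bfcp-udp-03. */
static const char *const sandbakken_ext[] = {
    "FloorRequestStatusAck", "ErrorAck", "FloorStatusAck", "Goodbye", "GoodbyeAck"
};
/* Values 14..17 in draft-ietf-bfcpbis-rfc4582bis-11. */
static const char *const bis11_ext[] = {
    "FloorRequestStatusAck", "FloorStatusAck", "Goodbye", "GoodbyeAck"
};

/* Name of a primitive value under the given draft, or NULL if undefined. */
static const char *bfcp_primitive_name(unsigned value, bfcp_draft_t draft)
{
    if (value >= 1 && value <= 13)
        return common_primitives[value];
    if (draft == DRAFT_SANDBAKKEN_UDP_03 && value >= 14 && value <= 18)
        return sandbakken_ext[value - 14];
    if (draft == DRAFT_RFC4582BIS_11 && value >= 14 && value <= 17)
        return bis11_ext[value - 14];
    return NULL;
}
```

In practice, the draft choice would be configured per peer or per vendor, since nothing on the wire says which table the sender used.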

5. Bug or something else?

Because we are using Polycom’s SIP proxy for testing, we took Polycom’s BFCP messages as the reference.
But there were surprises after digging into the Polycom 550:

a. It does not recognize the FloorRelease command.
The Polycom 550 acted as the server (BFCP floor control server role) and the Huawei TE60 as a participant (BFCP client role). When the Huawei TE60 sent a FloorRelease to the Polycom 550, the Polycom 550 did not respond.
However, in its HelloAck reply, Polycom declared that it does support FloorRelease.
I don’t know why.

[Figure: BFCP – HelloAck – SupportedPrimitives]

b. Floor Control Header info
The common header has a Transaction Responder flag (R): the 5th bit of the first byte counting from the least significant bit, i.e. mask 0x10. A value of 1 means the message comes from the transaction responder, and 0 means it does not. But Polycom’s BFCP messages seem to follow different rules.
Here are two example packets sent by the Polycom 550: the first has the R flag set to 1, the second has it set to 0.

[Figure: BFCP – FloorControlHeader – Responder – True]
[Figure: BFCP – FloorControlHeader – Responder – False]
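When comparing captures like these, it helps to decode the 12-byte common header yourself rather than trusting a dissector built for another draft. Below is a minimal sketch that assumes the draft-ietf-bfcpbis-rfc4582bis first-byte layout (Ver: 3 bits, R, F, Res: 3 bits). It does not parse the fragment offset/length that follow when F is set, or any attributes. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned version;       /* top 3 bits of byte 0 */
    bool responder;         /* R flag, mask 0x10 */
    bool fragmented;        /* F flag, mask 0x08 */
    uint8_t primitive;
    uint16_t payload_len;   /* in 4-octet units, common header excluded */
    uint32_t conference_id;
    uint16_t transaction_id;
    uint16_t user_id;
} bfcp_header_t;

static uint16_t get_u16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

static uint32_t get_u32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Parse the 12-byte common header; false if the buffer is too short. */
static bool parse_bfcp_header(const uint8_t *buf, size_t len, bfcp_header_t *h)
{
    if (len < 12)
        return false;
    h->version = buf[0] >> 5;
    h->responder = (buf[0] & 0x10) != 0;
    h->fragmented = (buf[0] & 0x08) != 0;
    h->primitive = buf[1];
    h->payload_len = get_u16(buf + 2);
    h->conference_id = get_u32(buf + 4);
    h->transaction_id = get_u16(buf + 8);
    h->user_id = get_u16(buf + 10);
    return true;
}
```

Logging `responder` together with `primitive` and `transaction_id` for each captured packet makes it easier to see which rule a peer actually applies to the R flag.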

c. Constructing OverallRequestStatus

Are there any rules about the nesting level of OverallRequestStatus within FloorRequestInformation? I ask because I found some slight differences between the Huawei TE60 and the Polycom 550.

[Figure: BFCP – FloorRequestInformation – Polycom550]
[Figure: BFCP – FloorRequestInformation – HuaweiTE60]

As you can see, Huawei puts OverallRequestStatus at the same level as FloorRequestStatus, while Polycom nests FloorRequestStatus inside OverallRequestStatus.
(I’m not sure whether I misinterpreted the IETF drafts.)

6. A temporary conclusion
BFCP sucks:
1. We don’t have a standard to follow (for BFCP over UDP).
2. The conflicting primitive definitions in different drafts are really painful.