Guidelines and Protocol Extensions for Combining SIP Based
Real-time Media Sessions With XMPP Based Instant Messaging and
Presence Service.
Nokia, P.O. Box 407, NOKIA GROUP, FI 00045, Finland, +358 50 486 4463, simo.veikkolainen@nokia.com
Nokia, P.O. Box 100, NOKIA GROUP, FI 00045, Finland, +358 50 522 5984, markus.isomaki@nokia.com
RAI
I-D, Internet-Draft, XMPP
This memo defines guidelines and protocol extensions for
combining Session Initiation Protocol (SIP) based real-time
media sessions with Extensible Messaging and Presence Protocol
(XMPP) based instant messaging and presence services in a
seamless manner. This is accomplished by integration and
protocol extension support in the endpoints, without
requiring any changes in the SIP or XMPP server
infrastructure. It is even possible that SIP and XMPP services
are offered by different service providers.
Currently, most standards-based Voice over IP (VoIP)
deployments use the Session Initiation Protocol (SIP). In addition
to providing basic voice service, SIP has extensive support
for more advanced telephony features, including interworking
with the traditional Public Switched Telephone Network
(PSTN). SIP is also gaining popularity in the field of video
communication.
At the same time, the Extensible Messaging and Presence
Protocol (XMPP) is enjoying widespread usage in instant
messaging and presence services. An interesting scenario
arises when a SIP based voice and video service is combined
with an XMPP based instant messaging and presence
service.
This memo describes how SIP based real-time sessions and XMPP
based IM and presence can be offered using existing server
implementations. This memo also presents a set of requirements
and protocol extensions for SIP User Agent and XMPP client
implementations in order to offer a seamless usage experience
when using SIP based VoIP with XMPP based instant messaging
and presence.
Combining SIP based real-time services with XMPP based presence
and IM service can be accomplished for the most part in the
endpoints; little if anything needs to be done in the service
infrastructure. It is also possible to achieve seamless
integration even when SIP and XMPP services are offered by
different service providers.
The main issues that need to be addressed when offering such
combined services are identities and addressing. SIP and XMPP
both use a similar addressing scheme (based on a "user@domain"
structure) to identify users and endpoints, but there are some
subtle differences as well. It is not possible to assume any
algorithmic correlation between SIP and XMPP Uniform
Resource Identifiers (URIs), even when they identify the same
user or endpoint. New protocol mechanisms are needed to tie
together communication contexts that are based on the two
protocols.
We do not discuss how protocol translation through a gateway
could be performed between the protocols; this is the subject
of other work, see for example .
We focus on one-to-one communication only. Multiparty use
cases such as combining SIP voice conference with XMPP IM
group chat are beyond the scope.
The document is structured as follows. It first presents the
document conventions and definitions, then the deployment
scenarios and use cases, followed by the requirements, an
overview of the protocol operation, and the definitions of the
protocol extensions. Examples follow the definitions.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14, RFC 2119 and indicate
requirement levels for compliant implementations.
The following definitions are used in this memo:
Integrated endpoint is an implementation that combines
the functionality of a SIP User Agent and an XMPP client and can
offer its user a seamless experience, in the
sense that a single UI and contact information can be used
for voice and video communication using SIP as
well as instant messaging and presence sharing using
XMPP. We assume an integrated endpoint is able to support
the SIP and XMPP protocol extensions defined in this memo.
Separated endpoint refers to independent SIP User Agent and
XMPP client implementations that are not aware of each other
even if they are used by the same user. The user uses the SIP UA for
voice and video, and the XMPP client for IM and
presence. It is assumed that a separated endpoint does not
support any SIP or XMPP protocol extensions defined in this
memo.
This section presents the assumptions we make about SIP and
XMPP deployments with respect to endpoints and server
infrastructure. We also enumerate the actual use cases that
the combined service must accommodate.
The assumption is that the server infrastructures for SIP and
XMPP are totally separate, so no exchange of information
is expected between them. There is no assumption that SIP
and XMPP services are even offered by the same service
provider. This means that the user identities can even be
from two different domains. However, if the same service
provider offers both SIP and XMPP service, it is recommended
that the URIs sip:user@domain and xmpp:user@domain
correspond to the same user.
We assume that the user initially only knows the contact
address of the other user in one protocol space. The contact
address for the other protocol must be learned through this.
We consider only cases where two integrated endpoints
interact. When an integrated endpoint communicates with a
separated endpoint, normal SIP and XMPP procedures apply and
no extensions defined in this memo are used.
Two users who both use an integrated endpoint start an
(XMPP) IM conversation. After the exchange of initial
messages, their UIs show that the other party is capable
of (SIP) voice and/or video in addition to IM. Either
user can at any point add voice and/or video component
to the conversation in such a way that they end up in
the same endpoint and conversation context where the IM
exchange is already taking place. (Note that for this
use case the conversation initiator initially only
needs to know the other user’s XMPP user id.)
Two users who both use an integrated endpoint start a
(SIP) voice and/or video call. As the call is
established, their UIs show that the other party is
capable of (XMPP) IM. Either user can at any point add
an IM component to the conversation in such a way that
they end up in the same endpoint and conversation
context where the voice and/or video call is already taking
place. (Note that for this use case the caller
initially only needs to know the other user’s SIP user
id.)
It is possible to vary the two cases above so that
one of the users initiates a "multimedia call" to
the other one, i.e. SIP voice and/or video and XMPP
IM are all active from the start. Technically this
happens according to the two-phased approach above,
and it is invisible to the end user.
A user of an integrated endpoint is able to publish her
SIP voice and video related presence status as part of
her XMPP presence. The status includes information such
as user’s SIP contact address (for the integrated
endpoint), media capability and availability and whether
the user is currently “on the phone”. Another user of an
integrated endpoint can see the presence status
(assuming she is authorized for that) and based on that
initiate calls. For instance, the watcher’s UI can now show
with certainty that the user on her roster is capable
of receiving “multimedia calls”. (Note that the watcher
initially only needs to know the other user’s XMPP user id.)
OPEN ISSUE: Is there a use case for discovering the
other user’s XMPP identity based on her SIP identity
without needing to establish a media session? SIP
OPTIONS would be one possibility for that (as we do
not assume SIP presence support).
This section presents the protocol requirements.
It must be possible for the sender of an XMPP message to
include its SIP contact information within the
message. The contact information must allow the recipient
to establish the SIP session such that the session is
routed to the same endpoint which is hosting the XMPP
conversation. As including the same information
in every message would create some overhead, it is
desirable to be able to convey the contact only once per
IM conversation/thread.
It must be possible for the caller to convey in the SIP
session initiation information which allows the callee to
correlate the session with an ongoing XMPP conversation.
It must be possible for the SIP User Agent Client and User
Agent Server that establish a real-time media session to
exchange their XMPP contact information so that either
endpoint can at any time send XMPP messages to the other
endpoint.
It must be possible for the sender to convey in the XMPP
message information which allows the recipient to
correlate the message with an ongoing SIP session.
It must be possible to include SIP real-time media related
contact and status in XMPP presence information. The
information must contain at least SIP contact address to
identify a user or a user agent instance, media
capabilities and general availability status.
OPEN ISSUE: Should we define requirements related to
“error” or “corner” cases here? For instance, what happens to
communication attempts after the other party has closed the
conversation context, e.g. a correlated XMPP message that is
sent after the related SIP session is terminated, or a SIP
INVITE that appears two days after the XMPP IM conversation
was ended?
NOTE: There is also an implicit requirement that we must
take Session Border Controllers into account when defining
how SIP User Agents are able to identify an existing session
between them in a common manner.
Both SIP and XMPP allow registration of multiple endpoints
using the same identifier, either a SIP AOR or XMPP
Jabber ID (JID). When two endpoints are engaged in an IM
conversation, for example, and wish to add a voice component
to the communication, it has to be ensured that the resulting SIP
dialog is targeted to the same endpoint that is
running the IM conversation. Fortunately, both XMPP and SIP
provide a mechanism for this.
The SIP GRUU extension defines mechanisms for
a SIP UA to obtain and use a Globally Routable User Agent
URI (GRUU). A GRUU will route a call to a specific UA
instance. Unfortunately, not all SIP registrars support the
optional GRUU mechanism. In that case the SIP UA has no other
option but to use its AOR in place of a GRUU.
In XMPP, a "full JID" consists of a name,
domain and a resource identifier in the form of
<name@domain/resource>. The resource identifier can be
used to identify a specific endpoint.
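As a non-normative illustration, the following Python sketch splits a JID into the three parts named above and checks whether it is a full JID. The names Jid, parse_jid and is_full_jid are hypothetical, and the sketch performs no character-level validation of the parts.

```python
from typing import NamedTuple, Optional


class Jid(NamedTuple):
    """Parts of a JID; Jid is an illustrative name, not part of any XMPP library."""
    name: Optional[str]
    domain: str
    resource: Optional[str]


def parse_jid(jid: str) -> Jid:
    """Split 'name@domain/resource' into its parts (no character validation)."""
    # The resource starts after the first '/', and may itself contain '/' or '@'.
    bare, slash, resource = jid.partition("/")
    name, at, domain = bare.rpartition("@")
    if not domain:
        raise ValueError(f"JID has no domain part: {jid!r}")
    return Jid(name if at else None, domain, resource if slash else None)


def is_full_jid(jid: str) -> bool:
    """A full JID carries a non-empty resource and so names one endpoint."""
    return bool(parse_jid(jid).resource)
```

An integrated endpoint could use such a check before placing a JID in a header that is required to carry a full JID.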
This case starts with one endpoint (Bob) sending a message
stanza to another (Alice). Bob includes a <thread> element in
the message and chooses a unique value for it. In his first
message Bob also includes his SIP URI in a <sip-contact>
element, defined in this memo. If Bob has been able to
obtain a GRUU from his registrar, he populates the
<sip-contact> with that. Otherwise a mere AOR is used. When
Alice receives Bob’s SIP URI, she stores it in association with
the current <thread>. When responding to Bob’s messages,
Alice also includes <thread> and her SIP URI (GRUU or
AOR) in <sip-contact>. Upon receiving Alice’s first message,
Bob stores Alice’s SIP URI in association with the current
<thread>. In addition to containing a SIP URI,
<sip-contact> also conveys whether an
endpoint supports audio, video, or both. So, based on the
exchanged <sip-contact> elements, both endpoints now know
each other’s SIP URIs and media capabilities.
The same <thread> value is used in all further messages by
both endpoints to keep track of the conversation. As long as
the <thread> value is unchanged, the <sip-contact>
element need not be repeated, unless either endpoint’s SIP
GRUU changes for some reason.
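The bookkeeping described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions: ThreadState and ConversationTable are hypothetical names, and the state is keyed by the <thread> value only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ThreadState:
    """Per-<thread> state kept by one endpoint (hypothetical structure)."""
    peer_sip_uri: Optional[str] = None        # learned from the peer's <sip-contact>
    advertised_sip_uri: Optional[str] = None  # what we last sent in <sip-contact>


class ConversationTable:
    """Tracks SIP contacts exchanged within XMPP conversations."""

    def __init__(self) -> None:
        self._threads: Dict[str, ThreadState] = {}

    def on_received(self, thread: str, sip_contact: Optional[str]) -> None:
        """Record the peer's SIP URI when a message carries <sip-contact>."""
        state = self._threads.setdefault(thread, ThreadState())
        if sip_contact is not None:
            state.peer_sip_uri = sip_contact

    def contact_to_send(self, thread: str, my_sip_uri: str) -> Optional[str]:
        """Return the URI to put in <sip-contact>, or None if it can be omitted."""
        state = self._threads.setdefault(thread, ThreadState())
        if state.advertised_sip_uri == my_sip_uri:
            return None  # already conveyed for this thread and unchanged
        state.advertised_sip_uri = my_sip_uri
        return my_sip_uri

    def peer_uri(self, thread: str) -> Optional[str]:
        """SIP URI to target with an INVITE for this conversation, if known."""
        state = self._threads.get(thread)
        return state.peer_sip_uri if state else None
```

With this structure the <sip-contact> element is sent once per thread and again only when the endpoint's own SIP URI changes.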
When either party wants to extend the IM conversation by
adding a SIP voice or video session, they address a SIP INVITE
to the SIP URI learned in <sip-contact>. If <sip-contact>
contained a GRUU, it ensures that the INVITE will be routed to
the correct endpoint. The caller populates the XMPP-Thread header,
defined in this memo,
in the INVITE with the value of <thread>. The callee is
thus able to correlate the SIP session to the IM
conversation. The callee echoes XMPP-Thread in responses to the
INVITE to indicate that the correlation was successful.
In this case two endpoints first have a SIP voice or video
session. They exchange their full JIDs within the session
establishment. The caller (Bob) adds an XMPP-Contact header,
defined in this memo, to the
INVITE, populating it with his full JID. XMPP-Contact also
includes an opaque end-to-end identifier for the session
common to both endpoints. The callee (Alice) stores this
information as part of the session state. In the 200 OK response
to the INVITE Alice includes a similar XMPP-Contact header with
her full JID, and echoes the end-to-end session
identifier. Bob stores this information as part of his
session state. Both endpoints now know each other’s full JIDs
and have a common reference to the session.
OPEN ISSUE: Instead of defining an XMPP-specific session
identifier we could use the SIP Call-ID as is. However, there
is a concern that the Call-ID may be changed by SBCs en route
and thus might not be useful as a common reference for
both User Agents.
defines a Session-ID header that potentially could also be
used.
When either endpoint wants to send XMPP messages to the
other, it addresses them to the full JID learned from the
XMPP-Contact header. This ensures that the messages reach
the correct endpoint. In the very first message the sender
also includes a <sip-correlation> element, defined in
this memo, with the session
identifier value learned from XMPP-Contact. The recipient
uses the value to correlate the message with the SIP session
and echoes it back in the first message it sends to indicate
that the correlation was successful.
SIP related presence information is encoded in the XMPP presence
schema as an extension. It includes the endpoint’s SIP URI
(preferably a GRUU, but it can also be an AOR), media capabilities
(audio, video), and availability (open, closed, busy). Based
on this information an XMPP presence watcher is able to
initiate SIP voice and video sessions.
In this section we define protocol extensions to meet the
requirements stated in the previous section.
The child elements of the message stanza can be extended
with elements from other namespaces. For the purposes of
carrying a SIP identifier in the message stanza, we define
two new elements, the <sip-contact> element and
<sip-correlation> element.
The <sip-contact> element, qualified by
"http://jabber.org/protocol/sip-contact" namespace, has one
mandatory attribute, "target", which defines the target's
SIP URI. The format of the "target" attribute is an
absoluteURI defined in .
When an endpoint initiates an XMPP IM conversation, and
wants to offer a possibility to later add a SIP real-time
media session, it MUST include a <sip-contact> element
as a child element in the first <message> stanza
it sends, and MUST add a <thread> element and populate
its value according to . The
endpoint MUST include in the “target” attribute of the
<sip-contact> element the SIP URI it wishes to be
contacted at. If the endpoint is aware of its GRUU, it
SHOULD use that as the value in the “target” attribute;
otherwise it MAY use its AOR.
The endpoint receiving an XMPP <message> stanza that
includes <thread> and <sip-contact> elements
MUST copy the <thread> element value to the first
<message> stanza it sends back, as defined in , and MUST include a <sip-contact>
element and set the “target” attribute value to the SIP URI
it wishes to be contacted at.
An endpoint MUST add its audio and video capabilities
defined in to the contact address
in the “target” attribute, and MUST understand those
capabilities if received from the other endpoint. An
endpoint MAY add other media capabilities.
When an endpoint receives a <sip-contact> element in a
<message> stanza, it MUST store the value of the target
attribute, and use it as the SIP URI in an INVITE request if
the user of the endpoint would like to add a SIP session to
the IM conversation context.
For example, a <sip-contact> element can carry a SIP
Globally Routable User Agent URI (GRUU) in its "target" attribute.
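As a non-normative sketch, the first message of a conversation could be assembled as shown below using Python's standard xml.etree.ElementTree module. The function name build_first_message, the "chat" message type, and the JID, thread and GRUU values in the usage note are made up for illustration; the namespace and the "target" attribute are the ones defined in this section. Media capability parameters are left out of the sketch.

```python
import xml.etree.ElementTree as ET

SIP_CONTACT_NS = "http://jabber.org/protocol/sip-contact"


def build_first_message(to_jid: str, thread: str, body: str, sip_uri: str) -> str:
    """Serialize a <message> stanza carrying <thread> and <sip-contact>."""
    message = ET.Element("message", {"to": to_jid, "type": "chat"})
    ET.SubElement(message, "body").text = body
    ET.SubElement(message, "thread").text = thread
    # The namespace is declared directly on the child element so that the
    # enclosing <message> keeps the namespace of the XMPP stream.
    ET.SubElement(message, "sip-contact",
                  {"xmlns": SIP_CONTACT_NS, "target": sip_uri})
    return ET.tostring(message, encoding="unicode")
```

For instance, build_first_message("alice@example.org", "e0ffe42b", "Hi", "sip:bob@example.com;gr=urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6") yields a stanza whose <sip-contact> child carries the example GRUU in its "target" attribute.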
In order to indicate that an XMPP IM conversation is
related to an existing SIP session, we define a new
element in the message stanza called <sip-correlation>. The
<sip-correlation>, qualified by the
"http://jabber.org/protocol/sip-correlation" namespace,
has one mandatory attribute, "value".
The endpoint sending the <message> stanza MUST set
the "value" attribute to the value of the
correlation-value parameter of the SIP XMPP-Contact
header. The XMPP-Contact header is exchanged during the
setup of the SIP session. The endpoint MUST also include
a <thread> element in the message.
An endpoint receiving a <message> stanza which
includes a <sip-correlation> element MUST first
compare the "from" attribute value of the <message>
stanza to the XMPP JID in the contact-value of the
XMPP-Contact header of its active SIP sessions. If a
matching SIP session is found, the endpoint MUST compare
the "value" attribute to the correlation-value of the
XMPP-Contact header of that SIP session. If the "value"
attribute matches the correlation-value of the XMPP-Contact
header, the <message> stanza is correlated to that
SIP session. If the user replies to the message, the
values of the <thread> element and the "value"
attribute of the <sip-correlation> element in the
first reply MUST be the same as in the original
message. This indicates that the correlation was
successful. The correlation is valid as long as the
messages are exchanged with the same <thread>
value.
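The two comparison steps above can be sketched in Python as follows. This is an illustration only: SipSession and correlate_message are hypothetical names, and JIDs and correlation values are compared as plain strings.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SipSession:
    """State of one active SIP session (hypothetical structure)."""
    call_id: str
    peer_jid: str           # contact-value of the peer's XMPP-Contact header
    correlation_value: str  # correlation-value of the XMPP-Contact header


def correlate_message(from_jid: str, value: str,
                      sessions: Iterable[SipSession]) -> Optional[SipSession]:
    """Find the SIP session an incoming <sip-correlation> message belongs to."""
    for session in sessions:
        # Step 1: the stanza's "from" must match the JID learned over SIP.
        if session.peer_jid != from_jid:
            continue
        # Step 2: the "value" attribute must match the correlation-value.
        if session.correlation_value == value:
            return session
    return None  # no correlation; treat as an ordinary message
```

A message whose "from" JID matches a session but whose "value" does not is left uncorrelated.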
As an example, a <sip-correlation> element carrying
the XMPP-Contact header correlation-value parameter of an
existing SIP session would be
<sip-correlation value="xyz123"/>
OPEN ISSUE: XML Schemas to be provided
The XMPP presence stanza defined in can
be extended with any properly-namespaced child element. We
define a new optional element called <contact> which,
qualified by the http://jabber.org/protocol/contact namespace,
MAY appear as a child element in the presence stanza.
The contact element SHOULD be set to the SIP address (GRUU or
AOR) the endpoint wishes to be contacted at for further
communication.
Exact syntax and XML Schema of the contact element are TBD.
In order to indicate that the SIP dialog is related to
an existing XMPP messaging session, we define a new SIP
header, called XMPP-Thread. The XMPP-Thread contains information
that can be used by the terminating endpoint to correlate
the SIP session establishment to an existing XMPP
conversation.
The endpoint sending a SIP INVITE request MUST include an
XMPP-Thread header, and set its value to the value of the
<thread> element used in the XMPP IM conversation.
The endpoint receiving a SIP INVITE which includes an
XMPP-Thread header acts as follows. It first compares the Contact header value with all
SIP GRUUs from <sip-contact>
elements in active XMPP IM conversations and, unless a
match is found, compares the P-Asserted-Identity header value with
all other SIP URIs from
<sip-contact> elements in active XMPP IM
conversations. If a single match is found, the receiving endpoint
MUST compare the value of the XMPP-Thread header to the
<thread> element values of the existing XMPP IM
conversations the endpoint has active and,
if the value matches, the SIP INVITE is correlated to
the IM conversation. The endpoint MUST copy the
XMPP-Thread header to any of the 2xx series responses.
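One possible reading of the procedure above is sketched below in Python. The sketch makes several assumptions that are ours: a GRUU is recognised by its "gr" URI parameter, URIs are compared as plain strings rather than by full SIP URI comparison rules, the header values are passed in as bare URIs, and "a single match" is taken literally. ImConversation, looks_like_gruu and correlate_invite are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ImConversation:
    """One active XMPP IM conversation (hypothetical structure)."""
    thread: str
    peer_sip_uri: str  # "target" of the peer's <sip-contact>


def looks_like_gruu(uri: str) -> bool:
    # Assumption: a GRUU is recognised by the presence of a "gr" URI parameter.
    params = uri.split(";")[1:]
    return any(p == "gr" or p.startswith("gr=") for p in params)


def correlate_invite(contact: str, asserted_identity: Optional[str],
                     xmpp_thread: str,
                     conversations: List[ImConversation]) -> Optional[ImConversation]:
    """Return the IM conversation an incoming INVITE correlates to, if any."""
    # First pass: Contact header against GRUUs learned from <sip-contact>.
    matches = [c for c in conversations
               if looks_like_gruu(c.peer_sip_uri) and c.peer_sip_uri == contact]
    # Second pass: P-Asserted-Identity against the remaining (non-GRUU) URIs.
    if not matches and asserted_identity is not None:
        matches = [c for c in conversations
                   if not looks_like_gruu(c.peer_sip_uri)
                   and c.peer_sip_uri == asserted_identity]
    if len(matches) != 1:
        return None
    candidate = matches[0]
    # Final check: XMPP-Thread must equal the conversation's <thread> value.
    return candidate if candidate.thread == xmpp_thread else None
```

When a conversation is returned, the endpoint copies the XMPP-Thread header into its 2xx response; when None is returned, the INVITE is handled as an ordinary call.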
 defines support of the XMPP-Thread header in SIP requests and
responses, and extends Table 2 of . MESSAGE, SUBSCRIBE and NOTIFY, REFER,
INFO, UPDATE, PRACK, and PUBLISH are defined in , , , , , , and , respectively.
The syntax of the XMPP-Thread header using augmented Backus-Naur
Form (ABNF) is defined as follows:
The XMPP-Contact header is used to carry the XMPP JID and an
opaque token that can be used for correlation purposes.
When an endpoint initiates a SIP session, and wants to offer
the possibility to later add an XMPP IM conversation, it
MUST include an XMPP-Contact header in the initial SIP
request. The contact-value of the XMPP-Contact header MUST
be set to the full XMPP JID the endpoint wishes to be contacted
at, and the correlation-value SHOULD be set to the value of
the Call-ID of the SIP session. If the Call-ID cannot be
used, the endpoint MUST select the correlation-value such
that it fulfills the same uniqueness requirements defined
for Call-ID in Section 8.1.1.4 of .
An endpoint sending a 2xx series response to an INVITE that
contains an XMPP-Contact header MUST include an XMPP-Contact
header in the response, MUST set the contact-value of the
header to the full XMPP JID the endpoint wishes to be
contacted at, and MUST copy the correlation-value from the
INVITE to the 2xx response.
The endpoint receiving a SIP request or response with an
XMPP-Contact header MUST store the value of the
correlation-value in order to be able to later correlate an
XMPP IM conversation with the SIP session.
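The offer and answer rules above can be illustrated with the following Python sketch. It models only the two logical fields of the header and deliberately does not emit any wire syntax. XmppContact, offer_xmpp_contact and answer_xmpp_contact are hypothetical names, and the full-JID check is a simple test for a non-empty resource part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XmppContact:
    """Logical content of an XMPP-Contact header (not its wire syntax)."""
    contact_value: str      # full JID the sender wishes to be contacted at
    correlation_value: str  # opaque token, preferably the SIP Call-ID


def _require_full_jid(jid: str) -> None:
    # A full JID has a non-empty resource part after the first '/'.
    _, slash, resource = jid.partition("/")
    if not (slash and resource):
        raise ValueError(f"XMPP-Contact needs a full JID, got {jid!r}")


def offer_xmpp_contact(my_full_jid: str, call_id: str) -> XmppContact:
    """Header content for the initial request: own JID plus the Call-ID."""
    _require_full_jid(my_full_jid)
    return XmppContact(my_full_jid, call_id)


def answer_xmpp_contact(received: XmppContact, my_full_jid: str) -> XmppContact:
    """Header content for a 2xx response: own JID, correlation-value copied."""
    _require_full_jid(my_full_jid)
    return XmppContact(my_full_jid, received.correlation_value)
```

Both endpoints would then store the peer's contact-value together with the shared correlation-value as part of their session state.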
An endpoint initiating a correlated XMPP IM conversation
MUST use the correlation-value in the
<sip-correlation> element as specified in this memo.
defines support of XMPP-Contact header in SIP requests and
responses, and extends Table 2 of . MESSAGE, SUBSCRIBE and NOTIFY, REFER,
INFO, UPDATE, PRACK, and PUBLISH are defined in , , , , , , and , respectively.
The syntax of the XMPP-Contact header using augmented Backus-Naur
Form (ABNF) is defined as follows:
Bob and Alice are engaged in an XMPP IM session when Bob
would like to add a voice/video component to the discussion.
When Bob and Alice exchange message stanzas, they also include
the SIP address they would like to be contacted at. In this
example, Bob is aware of his GRUU, while Alice is merely aware
of her SIP AOR. Both include the SIP identifier in a <sip-contact>
element in the message stanza.
In the above message, Bob includes his GRUU and the
media types he is capable of handling (audio and
video).
Alice sends back a message stanza containing her SIP contact
information.
Bob then decides to add a SIP voice call to the existing XMPP
conversation. He picks up the contact information that
Alice sent to him in a message stanza, and issues a SIP INVITE
request to that URI. The XMPP-Thread header carries the value of the
<thread> element.
Alice responds with 200 OK, accepting the session invitation
request. Alice also includes the XMPP-Thread header to
indicate that she has received the thread and successfully
correlated the session invitation to the XMPP conversation.
Bob then sends an ACK as per normal SIP procedures.
Bob invites Alice to a SIP session. In the INVITE request, Bob
includes the XMPP-Contact header carrying his XMPP JID.
Alice sends back a message stanza copying the sip-correlation
value, indicating that the correlation was successful.
TBD
The contact and correlation information is
sensitive and we need to prevent connection hijacking and
impersonation. If the contact information that is sent over
one protocol is forged, the identity verification mechanism
in the other no longer helps, as an attacker is able to assert
the false identity.
TBD