Microsoft RESPONSE POINT for PBX

SIP DEFINITION

The Session Initiation Protocol (SIP) is a signalling protocol used for establishing sessions in an IP network. A session could be a simple two-way telephone call or it could be a collaborative multi-media conference session. The ability to establish these sessions means that a host of innovative services become possible, such as voice-enriched e-commerce, web page click-to-dial, Instant Messaging with buddy lists, and IP Centrex services.

Over the last couple of years, the Voice over IP community has adopted SIP as its protocol of choice for signalling. SIP is specified in RFC 3261 by the Internet Engineering Task Force (IETF), the body responsible for administering and developing the mechanisms that comprise the Internet. SIP is still evolving and being extended as the technology matures and SIP products gain acceptance in the marketplace.

The IETF's philosophy is one of simplicity: specify only what you need to specify. SIP is very much of this mould. Because it was developed purely as a mechanism to establish sessions, it does not know about the details of a session; it just initiates, terminates and modifies sessions. This simplicity means that SIP scales, is extensible, and sits comfortably in different architectures and deployment scenarios.

SIP is a request-response protocol that closely resembles two other Internet protocols, HTTP and SMTP (the protocols that power the World Wide Web and email); consequently, SIP sits comfortably alongside Internet applications. Using SIP, telephony becomes another web application and integrates easily into other Internet services. SIP is a simple toolkit that service providers can use to build converged voice and multimedia services.

In order to provide telephony services, a number of different standards and protocols must come together: specifically, to transport media (RTP), to authenticate users (RADIUS, DIAMETER), to provide directories (LDAP), to guarantee voice quality (RSVP, YESSIR) and to interwork with today's telephone network. Here we will only cover SIP.

SIP's relationship with the web has two main aspects: one is media integration in client software; the other is media integration in content, thus transforming the Internet into a communications platform that overshadows the PSTN.

Client software

SIP allows browsers to become augmented with multimedia capability. Using SIP, simple, but very powerful, services like click-to-dial become possible. User profiles can be managed through a web interface and voice plug-ins are incorporated into browser technology. Netscape, IE and other browsers become universal desktop tools that we use as windows into a computing and communications environment.

A single environment for computing and communications

SIP is used to establish real-time communication sessions in IP networks. A session can contain any combination of media (voice, data, video, audio files, anything). These sessions can be modified at any time to add new parties or to change the nature of the session. The IP network will become ubiquitous, connecting all manner of devices: phones, PCs, PDAs, mobiles etc. The similarity of SIP to HTTP and SMTP and its reuse of many Internet elements means that SIP can bring voice applications under the Internet umbrella.

Three features render SIP ideal for converged services:

It uses MIME, the de facto standard for describing content on the Internet, to convey information about the protocol used to describe the session
SIP has a URL-style addressing system
It borrows from the email model, using the Domain Name System to deliver requests to the server that can appropriately handle them. This simplifies the integration of voice and email
This common Internet heritage makes SIP an ideal bedfellow for e-commerce applications.

Service examples:

Voice-enhanced e-commerce

A website contains click-to-dial links that establish a session between the end-user and the website organisation. A travel agent website, for example, could offer a toll-free service whereby a prospective customer can talk to an agent who can then guide the user through a series of pages, perhaps showing video clips of potential holiday destinations. The customer can complete booking forms online while talking to the agent, and may be offered the opportunity to buy currency or hire a car from a partner organisation. Human interaction can be used to enrich the e-commerce experience.

This kind of service could be a part of a value-added web-service offered by a service provider, or it could be developed in-house by an enterprise's IT department.

Web call centers

A similar idea can be used in provisioning web call centers. A web page can be pushed to the caller when a particular number is called (with SIP, it is just as easy to direct a user to a web page as it is to a telephone). Agents can intervene if the user requests help. SIP can support IVR-type (Interactive Voice Response) functionality, navigating users through options and providing auto-responses to common requests. In addition, SIP's forking facility is well suited to the ACD (Automatic Call Distribution) function.

When a user is connected directly to a customer service agent, the user's details can be presented to the agent, who can then give the appropriate response. Both parties can also view the same data on their respective screens. Several forms of communication between the two parties can be supported: voice, email, IM or videoconference. In this manner, the call center becomes a multimedia contact center.

SIP - Working Well with Other Protocols

Session Initiation Protocol (SIP) has become a strong, catalytic force shaping today's telecom industry. This IETF-driven protocol represents a key ingredient in the converging world of telecommunications-based applications. But SIP does not do everything, and it does not solve every problem. SIP has limits, and SIP works with other protocols to get the job done.

So what are the limits to SIP? And are we losing perspective as an industry when we say that SIP is a one-stop-shop for convergence?

SIP is not a panacea. It was never designed to be one, and that's a good thing! All-inclusive approaches (like H.323) have typically been fraught with difficulty and represent the wrong kind of thinking for today's modular networks. SIP is flexible, but it sticks to doing what it does best.

So let's have a closer look. We will see that SIP does certain things well, and leaves other functions alone. We will see that SIP works with a number of other protocols to get the job done while still playing nicely with some neighboring technologies.

SIP - Playing an Important Role

SIP is an IETF application-layer protocol for establishing, manipulating, and tearing down sessions. SIP's main purpose is to help session originators deliver invitations to potential session participants wherever they may be. In a nutshell, that is SIP's role.

So SIP is not a panacea, because it was never built to be one. Let's review two of the fundamental assumptions behind SIP's design:

Reusing Existing Protocols - SIP was specifically designed to reuse as many existing protocols and protocol design concepts as possible. For example, SIP was modeled after HTTP, uses URLs for addressing and uses SDP to convey session information.

Maximizing Interoperability - SIP was also designed so that it would be easy to bind SIP functions to existing protocols and applications, such as e-mail and Web browsers. SIP does this by limiting itself to a modular philosophy - just like many other Internet protocols - and focusing on a specific set of functions.

It's actually good news that SIP does not try to solve everything single-handedly. We can examine this statement more closely with a quick look at the H.323 approach to IP telephony. H.323 is not a single protocol but rather an entire suite of protocols that cover everything from soup to nuts - codecs, call control, conferencing, and many other functions in one vertically integrated stack.

The advantage of this approach is that, by strictly controlling so many aspects of the implementation, it is easier to ensure that H.323-based systems work well together. On the downside, H.323 becomes heavy and cumbersome. Flexibility is sacrificed as one is tied to a single family of technologies.

For a mature technology this may not be a problem, since the best solutions are likely to have been discovered and incorporated into standards. However, for a field as young and fast-changing as IP telephony, where many problems and solutions are still under debate, flexibility is more important. SIP is part of this flexible approach, as it uses a wide variety of protocols, each addressing a different aspect of the problem space. The advantage is the ability to choose from among many competing technologies and move to newer and better ones as they emerge. This has always been the philosophy behind SIP, and it is the approach of the IETF to IP telephony in general.

SIP is an important piece of this modular approach to IP telephony protocols. SIP addresses the need for a protocol to deal with generalized sessions. This involves finding potential call participants and contacting them as they move from place to place, changing their location and even the equipment they are using. Calls may require the use of multiple streams of various media, and very large numbers of participants might be involved in a call, even joining and leaving in a constantly changing topology! This is what SIP does.

SIP - Working with Other Protocols

The meteoric ascent of the Internet as a rival to the circuit-switched telephone network has given rise to strong economic and technological reasons for converged services and architectures. In order to assimilate telephony services with the ubiquitous technology of IP, a signalling protocol is required to set up and tear down connections.

A number of different communities put forward solutions, each coloured by their own priorities and interests. The Internet community wanted to introduce innovative services based on enhanced web-authoring tools like XML and more open, peer-to-peer protocols and call models. The IETF offered SIP.

SIP was originally intended to create a mechanism for inviting people to large-scale multipoint conferences on the Internet Multicast Backbone (Mbone). At this stage, IP telephony didn't really exist. It was soon realised that SIP could be used to set up point-to-point conferences - phone calls.

The SIP approach exemplifies classic Internet-style innovation: build only what you need, to address only what is lacking in existing mechanisms. Because the SIP approach is modular and free from underlying protocol or architectural constraints, and because the protocols themselves are simple, SIP has caught on as an alternative to H.323 and to vendor-proprietary mechanisms for transporting SS7 protocols over IP.

SIP was designed to solve only a few problems and to work with a broad spectrum of existing and future IP telephony protocols. To this end SIP provides four basic functions. SIP allows for the establishment of user location (i.e. translating from a user's name to their current network address). SIP provides for feature negotiation so that all of the participants in a session can agree on the features to be supported among them. SIP is a mechanism for call management - for example adding, dropping, or transferring participants. And finally SIP allows for changing features of a session while it is in progress. All of the other key functions are done with other protocols.

Yes, this does mean that SIP is not a session description protocol, and that SIP does not do conference control. SIP is not a resource reservation protocol, and it does not itself provide quality of service (QoS). SIP can work in a framework with other protocols to make sure these roles are played out, but SIP does not do them. SIP can function with SOAP, HTTP, XML, VXML, WSDL, UDDI, SDP and an alphabet soup of others. Everyone has a role to play!

There is no question that SIP was designed to be a modular component of a larger IP telephony solution and thus functions well with a large number of these IP related protocols. But SIP is even friendlier as it "plays nicely" with protocols that are often viewed as overlapping in function. For the near term we can expect that SIP will have to coexist with overlapping protocols such as H.323, MGCP, and MEGACO.

H.323 networks are already deployed in many parts of the world. Network operators are interested in growing network capability with coexisting SIP networks. SIP-to-H.323 translation products are already available. MGCP and MEGACO can also benefit from SIP, since by themselves they aren't enough to build a complete IP telephony system. These protocols sit architecturally below SIP and gain functionality by, in effect, being controlled through SIP.

So, SIP is an important protocol that is becoming widely deployed. SIP is a catalytic protocol that delivers key signaling elements, which can turn a voice over IP network into a true IP communications network - a network capable of delivering next-generation converged services. SIP is powerful, and yet simple. But that power comes from doing what it does best, and working nicely with the other protocols in the converged protocol arena.

SIP is described as a control protocol for creating, modifying and terminating sessions with one or more participants. These sessions include Internet multimedia conferences, Internet (or any IP Network) telephone calls and multimedia distribution. Members in a session can communicate via multicast or via a mesh of unicast relations, or via a combination of these. SIP supports session descriptions that allow participants to agree on a set of compatible media types. It also supports user mobility by proxying and redirecting requests to the user's current location. SIP is not tied to any particular conference control protocol. In essence, SIP has to provide or enable the following functions:

Name translation and user location

Ensuring that the call reaches the called party wherever they are located. Carrying out any mapping of descriptive information to location information. Ensuring that details of the nature of the call (Session) are supported.

Feature negotiation

This allows the group involved in a call (this may be a multi-party call) to agree on the features supported - recognizing that not all the parties can support the same level of features. For example, video may or may not be supported.
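To make feature negotiation concrete, here is a minimal Python sketch, assuming each participant simply advertises a list of media formats in preference order. The participant names, the dictionary format and the format strings are illustrative assumptions, not SDP syntax; real SIP endpoints negotiate through the SDP offer/answer exchange.

```python
from typing import Dict, List


def negotiate(offers: Dict[str, List[str]]) -> List[str]:
    """Return the features every participant supports.

    `offers` maps each participant (hypothetical names) to the features it
    advertises, in preference order; the first participant's order is kept.
    """
    if not offers:
        return []
    lists = list(offers.values())
    first, others = lists[0], lists[1:]
    # A feature survives only if every other participant also lists it.
    return [f for f in first if all(f in other for other in others)]


offers = {
    "alice": ["video/H264", "audio/PCMU", "audio/G729"],
    "bob": ["audio/G729", "audio/PCMU"],          # Bob's phone has no camera
    "carol": ["audio/PCMU", "audio/G729", "video/H264"],
}
print(negotiate(offers))  # ['audio/PCMU', 'audio/G729']
```

Video drops out because one participant cannot support it, which is the situation described above.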

Call participant management

During a call a participant can bring other users onto the call or cancel connections to other users. In addition, users could be transferred or placed on hold.

Call feature changes

A user should be able to change the call characteristics during the course of the call. For example, a call may have been set up as 'voice-only', but in the course of the call, the users may need to enable a video function. A third party joining a call may require different features to be enabled in order to participate in the call.


SIP fulfils these functions and re-uses other web elements to make it flexible and scalable.

Rather than defining a new type of addressing system, SIP addresses users by an email-like address. Each user is identified through a hierarchical URL that is built around elements such as a user's phone number or host name (for example, sip:user@company.com). This means that it is just as easy to redirect someone to another phone as it is to redirect someone to a webpage.
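Because the address is URL-like, it is easy to take apart. The sketch below is a deliberately simplified parser, assuming only the common form scheme:user@host:port;params; it ignores escaping, IPv6 literals and URI headers, so it is not a full RFC 3261 parser. The second URI, with a telephone number as the user part, is a made-up example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SipUri:
    scheme: str
    user: Optional[str]
    host: str
    port: Optional[int]
    params: Dict[str, Optional[str]] = field(default_factory=dict)


def parse_sip_uri(text: str) -> SipUri:
    """Split a simple sip:/sips: URI into its parts (no escaping, IPv6 or headers)."""
    scheme, sep, rest = text.partition(":")
    if not sep or scheme.lower() not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {text!r}")
    # URI parameters follow the host part, separated by ';'
    hostport, *raw_params = rest.split(";")
    user: Optional[str] = None
    if "@" in hostport:
        user, hostport = hostport.rsplit("@", 1)
    host, colon, port_text = hostport.partition(":")
    port = int(port_text) if colon else None
    params: Dict[str, Optional[str]] = {}
    for raw in raw_params:
        name, eq, value = raw.partition("=")
        params[name.lower()] = value if eq else None
    return SipUri(scheme.lower(), user, host.lower(), port, params)


print(parse_sip_uri("sip:user@company.com"))
print(parse_sip_uri("sip:+12025550123@gw.example.com:5060;user=phone;transport=udp"))
```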

SIP uses MIME, the de facto standard for describing content on the Internet, to convey information about the protocol used to describe the session. As a result, SIP messages can contain Java applets, images, audio files, authorization tokens or billing data.

SIP borrows from the email model, using the Domain Name System to deliver requests to the server that can appropriately handle them. This simplifies the integration of voice and email. Servers along the call path can easily create and forward email messages, and vice versa, enabling various combined services.
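In practice, a SIP client follows RFC 3263: it looks up DNS NAPTR and SRV records for the target domain and tries the servers they list in order. The sketch below covers only the SRV part, assuming the records have already been fetched (a real client would use a DNS resolver library, not shown). The ordering is a simplified version of the RFC 2782 rules: lowest priority first, then a weighted random choice among records with equal priority, with zero-weight records effectively tried last. The host names are placeholders.

```python
import random
from typing import List, NamedTuple, Optional


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def srv_query_name(domain: str, transport: str = "udp", secure: bool = False) -> str:
    """Build the SRV name a SIP client would query, e.g. _sip._udp.example.com."""
    service = "_sips" if secure else "_sip"
    return f"{service}._{transport}.{domain}"


def order_srv(records: List[SrvRecord], rng: Optional[random.Random] = None) -> List[SrvRecord]:
    """Lowest priority first; weighted random order within each priority."""
    rng = rng or random.Random()
    ordered: List[SrvRecord] = []
    for prio in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == prio]
        while group:
            total = sum(r.weight for r in group)
            if total == 0:
                ordered.extend(group)        # only zero weights left: any order
                break
            pick = rng.uniform(0, total)
            running = 0
            for r in group:
                running += r.weight
                if running >= pick:          # first record whose running sum reaches pick
                    ordered.append(r)
                    group.remove(r)
                    break
    return ordered


records = [
    SrvRecord(10, 60, 5060, "sip1.example.com."),
    SrvRecord(10, 40, 5060, "sip2.example.com."),
    SrvRecord(20, 0, 5060, "backup.example.com."),
]
print(srv_query_name("example.com"))  # _sip._udp.example.com
for rec in order_srv(records):
    print(rec.priority, rec.target)
```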

SIP provides its own reliability mechanism and is therefore independent of the underlying transport: it requires only an unreliable datagram service. SIP is typically used over UDP or TCP.
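Over UDP, that reliability comes from retransmission timers. For an INVITE, RFC 3261 starts Timer A at T1 (500 ms by default) and doubles it after each retransmission, while Timer B (64 * T1, i.e. 32 s) ends the attempt. This sketch computes the resulting send times, assuming no response ever arrives; in a real stack a provisional response stops the retransmissions, and non-INVITE requests cap the interval at T2 (4 s).

```python
from typing import List

T1 = 0.5  # seconds; RFC 3261 default round-trip estimate


def invite_send_times(t1: float = T1) -> List[float]:
    """Times (in seconds) at which an INVITE is sent over UDP if nothing comes back."""
    timer_b = 64 * t1        # give up on the transaction after 64*T1
    times = [0.0]            # the original transmission
    interval = t1            # Timer A starts at T1 ...
    next_send = interval
    while next_send < timer_b:
        times.append(next_send)
        interval *= 2        # ... and doubles after each retransmission
        next_send += interval
    return times


print(invite_send_times())  # [0.0, 0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
```

So an unanswered INVITE is sent seven times in total before the client gives up at the 32-second mark.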

SIP provides the necessary protocol mechanisms so that end systems and proxy servers can provide services:

User location
User capabilities
User availability
Call set-up
Call handling
Call forwarding, including:
  The equivalent of 700-, 800- and 900-type calls
  Call-forwarding no answer
  Call-forwarding busy
  Call-forwarding unconditional
  Other address-translation services
Callee and calling "number" delivery, where numbers can be any (preferably unique) naming scheme
Personal mobility, i.e., the ability to reach a called party under a single, location-independent address even when the user changes terminals
Terminal-type negotiation and selection: a caller can be given a choice of how to reach the party, e.g. via Internet telephony, mobile phone, an answering service, etc.
Terminal capability negotiation
Caller and callee authentication
Blind and supervised call transfer
Invitations to multicast conferences
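Call-forwarding services like those listed above boil down to a small decision made by a proxy or user agent. The sketch below is a hypothetical policy (the class, field and outcome names are made up for illustration): forward unconditionally if configured, otherwise forward only after the callee returns 486 Busy Here or the ring timer expires. Whether the server then proxies the call to the new target or answers with a 302 redirect is a separate design choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForwardingRules:
    """Hypothetical per-user forwarding settings (SIP URIs or None)."""
    unconditional: Optional[str] = None   # call-forwarding unconditional
    on_busy: Optional[str] = None         # call-forwarding busy
    on_no_answer: Optional[str] = None    # call-forwarding no answer


def forward_target(rules: ForwardingRules, outcome: Optional[str]) -> Optional[str]:
    """Pick the next target for a call, or None to leave it with the callee.

    outcome: None before the callee is tried, "busy" after 486 Busy Here,
    or "no-answer" after the ring timer expires.
    """
    if rules.unconditional:
        return rules.unconditional
    if outcome == "busy":
        return rules.on_busy
    if outcome == "no-answer":
        return rules.on_no_answer
    return None


rules = ForwardingRules(on_busy="sip:voicemail@example.com",
                        on_no_answer="sip:assistant@example.com")
print(forward_target(rules, None))      # None: ring the callee first
print(forward_target(rules, "busy"))    # sip:voicemail@example.com
```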

SIP is based on the request-response paradigm. The following sequence is a simple example of a call set-up procedure:

1. To initiate a session, the caller (or User Agent Client) sends a request with the SIP URL of the called party.

2. If the client knows the location of the other party it can send the request directly to their IP address; if not, the client can send it to a locally configured SIP network server.

3. The server will attempt to resolve the called user's location and send the request to them. There are many ways it can do this, such as searching the DNS or accessing databases. Alternatively, the server may be a redirect server that may return the called user location to the calling client for it to try directly. During the course of locating a user, one SIP network server can proxy or redirect the call to additional servers until it arrives at one that definitely knows the IP address where the called user can be found.

4. Once found, the request is sent to the user and then several options arise. In the simplest case, the user's telephony client receives the request, that is, the user's phone rings. If the user takes the call, the client responds to the invitation with the designated capabilities* of the client software and a connection is established. If the user declines the call, the session can be redirected to a voice mail server or to another user.

* "Designated capabilities" refers to the functions that the user wants to invoke. The client software might support videoconferencing, for example, but the user may only want to use audio conferencing. Regardless, the user can always add functions, such as videoconferencing, white-boarding or a third user, by issuing another INVITE request to the other users in the session.
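The request in step 1 is plain text, much like an HTTP request. The Python sketch below assembles a minimal INVITE carrying the headers RFC 3261 requires, with an SDP session description as a MIME body. The addresses, port and SDP values are illustrative assumptions (example.com and 192.0.2.x are reserved for documentation), and a real user agent would also handle responses, retransmissions and authentication.

```python
import uuid


def build_invite(caller: str, callee: str, contact: str, local_host: str, sdp: str) -> str:
    """Assemble a minimal INVITE request as text (RFC 3261 mandatory headers)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 branch IDs start with this magic cookie
    from_tag = uuid.uuid4().hex[:8]
    call_id = f"{uuid.uuid4().hex}@{local_host}"
    body_length = len(sdp.encode("utf-8"))       # Content-Length counts bytes
    headers = [
        f"INVITE {callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:5060;branch={branch}",
        "Max-Forwards: 70",
        f"To: <{callee}>",
        f"From: <{caller}>;tag={from_tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <{contact}>",
        "Content-Type: application/sdp",         # MIME type of the session description
        f"Content-Length: {body_length}",
    ]
    # Header lines end in CRLF; an empty line separates headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


sdp_offer = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",   # payload type 0 = PCMU (G.711 mu-law)
    "a=rtpmap:0 PCMU/8000",
]) + "\r\n"

print(build_invite("sip:alice@example.com", "sip:bob@example.com",
                   "sip:alice@192.0.2.10", "192.0.2.10", sdp_offer))
```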

SIP has two additional significant features. The first is a stateful SIP server's ability to split or "fork" an incoming call so that several extensions can be rung at once. The first extension to answer takes the call. This feature is handy if a user is working between two locations (a lab and an office, for example), or where someone is ringing both a boss and their secretary.
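Parallel forking can be modelled with Python's asyncio, assuming each branch is simulated by a delay before that phone answers (None means it never answers). The first branch to complete wins and the others are cancelled, standing in for the CANCEL requests that RFC 3261 has a forking proxy send to the remaining branches once a 2xx response arrives. This is a toy model of the behaviour, not a SIP implementation, and the URIs are placeholders.

```python
import asyncio
from typing import Dict, Optional


async def ring(contact: str, answer_after: Optional[float]) -> str:
    """Simulate one forked branch; returns the contact once it 'answers'."""
    if answer_after is None:
        await asyncio.Event().wait()   # never answers; will be cancelled
    else:
        await asyncio.sleep(answer_after)
    return contact


async def fork(branches: Dict[str, Optional[float]], ring_timeout: float) -> Optional[str]:
    """Ring all branches at once; return the first to answer, or None on timeout."""
    tasks = [asyncio.create_task(ring(c, d)) for c, d in branches.items()]
    done, pending = await asyncio.wait(
        tasks, timeout=ring_timeout, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()                  # stands in for SIP CANCEL on losing branches
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result() if done else None


winner = asyncio.run(fork({
    "sip:boss@office.example.com": 0.05,
    "sip:secretary@office.example.com": 0.01,
    "sip:boss@lab.example.com": None,
}, ring_timeout=1.0))
print(winner)  # sip:secretary@office.example.com
```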

The second significant feature is SIP's ability to combine different media types within a single session. For example, a customer could call a travel agent, view video clips of possible holiday destinations, complete an on-line booking form and order currency - all within the same communication session.

SIP Methods

The commands that SIP uses are called methods. SIP defines the following methods:


SIP Method   Description
INVITE       Invites a user to a call
ACK          Confirms the final response to an INVITE, completing reliable message exchange
BYE          Terminates a session between users
CANCEL       Terminates a pending request, or search, for a user
OPTIONS      Solicits information about a server's capabilities
REGISTER     Registers a user's current location
INFO         Used for mid-session signalling (defined in an extension, RFC 2976)


SIP responses
The following are SIP responses:
1xx Informational (e.g. 100 Trying, 180 Ringing)
2xx Successful (e.g. 200 OK, 202 Accepted)
3xx Redirection (e.g. 302 Moved Temporarily)
4xx Request Failure (e.g. 404 Not Found, 482 Loop Detected)
5xx Server Failure (e.g. 501 Not Implemented)
6xx Global Failure (e.g. 603 Decline)

These codes closely resemble HTTP status codes; the 6xx class has no HTTP counterpart.
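Because the class is carried in the first digit, a client can act on any response code, even one it does not recognise: RFC 3261 says an unrecognised final response should be treated like the x00 code of its class. A minimal helper, assuming only the six classes listed above:

```python
RESPONSE_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Request Failure",
    5: "Server Failure",
    6: "Global Failure",
}


def response_class(code: int) -> str:
    """Map a SIP status code to its class name."""
    if not 100 <= code <= 699:
        raise ValueError(f"invalid SIP status code: {code}")
    return RESPONSE_CLASSES[code // 100]


def is_final(code: int) -> bool:
    """1xx responses are provisional; 2xx-6xx responses are final."""
    return code >= 200


def fallback_code(code: int) -> int:
    """Code to act on if `code` itself is not understood (e.g. an unknown 499 -> 400)."""
    return (code // 100) * 100


print(response_class(180), is_final(180))       # Informational False
print(response_class(486), fallback_code(486))  # Request Failure 400
```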


SIP GLOSSARY
SIP: Session Initiation Protocol (RFC 3261) - Application-layer control (signaling) protocol for creating, modifying, and terminating sessions with one or more participants.

SIP Methods: SIP protocol commands or messages (e.g. INVITE, BYE)

SIP Response Codes: Responses to SIP methods indicating success, failure or other information (e.g. 200 OK)

SIP User Agent (UA): An endpoint device that can issue or respond to SIP protocol methods.

SIP User Agent Client (UAC): A SIP endpoint device issuing the request (e.g. phone, PC, PDA).

SIP Gateway: A network element that can convert SIP methods and response codes to another protocol.

SIP Proxy Server: An intermediary entity that acts as both a server and a client for the purpose of making requests on behalf of other clients.

SDP: Session Description Protocol (RFC 2327) - Text-based protocol describing multimedia sessions.

Softswitch: Software application that coordinates VoIP call switching between endpoints, commonly duplicating

SIP METHODS
REGISTER: Registers a user with a Proxy/Registrar

INVITE: Session setup request or media negotiation. Also used to hold and retrieve calls

CANCEL: Used to cancel an INVITE transaction

ACK: Acknowledgement for an INVITE transaction completion

BYE: Terminates a session

OPTIONS: Used to query the remote party's status and capabilities

INFO: Mid-call signaling information exchange

SUBSCRIBE: Request notification of call events

NOTIFY: Event notification after an explicit/implicit subscription

REFER: Call Transfer request

SIP RESPONSE CODES
100: Trying - Request has been received by a proxy/gateway

180: Ringing - the called party received the INVITE request, the phone is ringing.

181: Call is being forwarded

182: Queued - INVITE has been received and will be processed in a queue

183: Session Progress - Conveys call-progress information, for example to indicate incoming early media

200: OK - successful transaction completion

302: Moved Temporarily - Forwarded call to given contact

305: Use Proxy - Repeat same call setup using a given proxy

400: Bad Request - General error

401: Unauthorized - The server requires client authentication

404: Not Found - The user does not exist at the specified domain

408: Request Timeout

486: Busy here

5xx: Server Failure

6xx: Global Failure

SIP Trunking Attributes

Switched Access - A SIP trunk offers the CPE device switched access to a diverse set of PSTN and/or on-net IP termination points via the ITSP on the network side of the SIP trunk.

Proxy at CPE - The SIP trunk attaches to a SIP device at the customer premises (CP) that is a proxy device, with one or more user agents (UAs) logically attached to it.

SIP Trunk is Switched - Incoming calls from the SIP trunk to the CPE device are switched to other devices sitting behind the interfacing CPE proxy device.

Multiple ENUM Termination - The ITSP should be able to route calls to any DID or 800 number to a specific client on the SIP trunk; i.e. the SIP trunk should not be limited to a single 800 or DID number, since it is a switched service that the CPE proxy ultimately routes within its own domain.
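ENUM (RFC 6116) is the DNS-based mapping from E.164 telephone numbers to URIs such as SIP addresses: the digits are reversed, separated by dots and placed under e164.arpa, and a NAPTR lookup on that name returns the URI. A minimal sketch of the name conversion only, assuming the number is given in +E.164 form; the numbers below are fictitious examples, and the NAPTR lookup itself is not shown.

```python
def enum_domain(e164: str, suffix: str = "e164.arpa") -> str:
    """Turn an E.164 number like +1-202-555-0123 into its ENUM domain name."""
    number = e164.strip()
    if not number.startswith("+"):
        raise ValueError("expected an E.164 number starting with '+'")
    digits = [ch for ch in number if ch.isdigit()]   # drop '+', spaces and dashes
    if not digits:
        raise ValueError("no digits found")
    return ".".join(reversed(digits)) + "." + suffix


print(enum_domain("+1-202-555-0123"))  # 3.2.1.0.5.5.5.2.0.2.1.e164.arpa
```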

Bandwidth QoS Management - In conjunction with the ITSP proxy/softswitch and the CPE edge controller, bandwidth is managed so that any shared bulk traffic from the enterprise is throttled back, giving VoIP media traffic priority and thereby achieving QoS for VoIP traffic over the SIP trunk.

Cause Code Management - In the event that usable bandwidth on the SIP trunk is exhausted, either the ITSP or the CPE-side device must be able to detect this condition. Either side must then be able to issue the correct SIP response code so that notification can be sent upstream. The call must then be rerouted, or call-progress tones played to the originating end user.
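One way either side of the trunk can detect the "bandwidth exhausted" condition is simple call admission control: count the bandwidth committed to active calls and refuse any new call that would exceed it. The sketch below is hypothetical: the class name, the 80 kbit/s per-call figure (G.711 at 20 ms packetisation, counting IP/UDP/RTP headers but not layer 2) and the choice of 503 Service Unavailable as the rejection code are assumptions, not requirements of any SIP trunking specification.

```python
from dataclasses import dataclass
from typing import Optional

# G.711 with 20 ms packets: 160 B payload + 12 B RTP + 8 B UDP + 20 B IPv4
# = 200 B * 50 packets/s = 80 kbit/s per direction (layer-2 overhead ignored).
G711_KBPS = 80.0


@dataclass
class TrunkAdmission:
    """Hypothetical call-admission counter for one SIP trunk."""
    capacity_kbps: float
    per_call_kbps: float = G711_KBPS
    active_calls: int = 0

    def admit(self) -> Optional[int]:
        """Return None to accept a new call, or a SIP status code to reject it."""
        if (self.active_calls + 1) * self.per_call_kbps > self.capacity_kbps:
            return 503   # Service Unavailable; upstream can reroute or play a tone
        self.active_calls += 1
        return None

    def release(self) -> None:
        """Call this when a call ends (e.g. on BYE)."""
        self.active_calls = max(0, self.active_calls - 1)


trunk = TrunkAdmission(capacity_kbps=256)
print([trunk.admit() for _ in range(4)])  # [None, None, None, 503]
```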

Firewall Traversal - At the CPE side, a border controller/firewall must be provisioned that scales by manipulating traffic at the SIP (application) layer. This allows calls to pass through the enterprise firewall seamlessly, without the security issues associated with STUN or manual pinhole techniques.

Security - Both the ITSP on the carrier side, and the enterprise SUA at the CPE side, must provide the necessary security to ensure that unauthorized users cannot gain access to SIP call facilities interworking between the enterprise and ITSP.
This could include:
IP authentication between the CPESU and ITSP proxy
Registration and authentication via MD5 digest
IPSEC tunnels (optional)
TLS (someday)
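SIP reuses HTTP Digest authentication (RFC 2617): the registrar challenges with a 401 (or a proxy with a 407) carrying a nonce, and the client answers with an MD5 hash that proves it knows the password without sending it. A minimal sketch of that calculation; the user, realm, password and nonce are made-up values. MD5 is no longer considered a strong hash, which is one reason to layer IPsec or TLS underneath.

```python
import hashlib
from typing import Optional


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str,
                    qop: Optional[str] = None, nc: str = "", cnonce: str = "") -> str:
    """Compute the 'response' value of a SIP Digest Authorization header (MD5)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # the password itself is never sent
    ha2 = md5_hex(f"{method}:{uri}")
    if qop == "auth":
        return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")            # RFC 2069-compatible form, no qop


print(digest_response("alice", "example.com", "secret",
                      "REGISTER", "sip:example.com", "84a4cc6f3082121f32b42a2187831a9e"))
```

The registrar performs the same calculation with its stored credentials and accepts the REGISTER only if the two values match.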

If you want to get your hands dirty with SIP...

SIP's intimate association with all things Internet establishes telephony as part of a continuum of Internet media options. Its similarities with HTTP and SMTP and its text-based format mean that SIP is familiar to web developers.

In order to develop services, programmers need APIs. There have been many advances in this area of SIP, resulting in numerous new interfaces.

CPL (Call Processing Language)

This was the first API developed for SIP. Strictly speaking, it is not really an API at all, but rather an XML-based scripting language for describing and controlling call services. It is designed to be implemented on either network servers or user agent servers and is meant to be simple, extensible, easily edited by graphical clients, and independent of operating system or signalling protocol.

CPL is engineered for end-user service creation: a CPL interpreter is very lightweight, and a server can easily parse and validate a CPL script, guarding against malicious behaviour. It is suitable for running on a server where users may not be allowed to execute arbitrary programs, as it has no variables, loops, or ability to run external programs. It has primitives for making decisions and taking actions based on call properties, such as time of day, caller, called party etc.
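To see why a CPL interpreter can be so lightweight, consider a toy one. The sketch below handles only a tiny subset: an address-switch on the caller's host using subdomain-of, a location with proxy, and reject. The element names follow the CPL specification, but the XML namespace is omitted and the semantics are heavily simplified, so treat it as an illustration rather than a conforming interpreter. Because the language has no loops or variables, evaluation is a single walk down the tree.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

SCRIPT = """
<cpl>
  <incoming>
    <address-switch field="origin" subfield="host">
      <address subdomain-of="example.com">
        <location url="sip:jones@office.example.com">
          <proxy/>
        </location>
      </address>
      <otherwise>
        <reject status="reject" reason="Only colleagues, please"/>
      </otherwise>
    </address-switch>
  </incoming>
</cpl>
"""


def host_matches(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it."""
    return bool(domain) and (host == domain or host.endswith("." + domain))


def run(node: ET.Element, caller_host: str, target: str = "") -> Tuple[str, str]:
    """Evaluate one node of the simplified CPL tree; returns (action, detail)."""
    if node.tag == "address-switch":
        for branch in node:
            if branch.tag == "otherwise" or (
                branch.tag == "address"
                and host_matches(caller_host, branch.get("subdomain-of", ""))
            ):
                return run(branch[0], caller_host, target)
        return ("default", "")
    if node.tag == "location":
        # Remember the location, then continue with the action inside it.
        return run(node[0], caller_host, node.get("url", ""))
    if node.tag == "proxy":
        return ("proxy", target)
    if node.tag == "reject":
        return ("reject", node.get("status", ""))
    raise ValueError(f"unsupported CPL element: {node.tag}")


def handle_incoming(script: str, caller_host: str) -> Tuple[str, str]:
    incoming = ET.fromstring(script.strip()).find("incoming")
    if incoming is None or len(incoming) == 0:
        return ("default", "")
    return run(incoming[0], caller_host)


print(handle_incoming(SCRIPT, "lab.example.com"))  # ('proxy', 'sip:jones@office.example.com')
print(handle_incoming(SCRIPT, "example.org"))      # ('reject', 'reject')
```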

The SIP-CPL draft (since published as RFC 3880) can be found on the IETF website.

SIP-CGI

In the World Wide Web, the Common Gateway Interface (CGI) has served as a popular means of programming web services. CGI scripts were the initial mechanism for making websites interact with databases and other applications. Due to the similarities between SIP and HTTP, CGI is a good candidate for service creation in a SIP environment.

Like HTTP CGI, SIP CGI runs the script as a separate process: the server passes message parameters to it through environment variables, and the script sends instructions back to the server through its standard output. SIP CGI is almost identical to HTTP CGI and is particularly suitable for services that contain substantial web components.

A CGI script can be written in Perl, Tcl, C, C++ or Java, making it accessible to a large community of developers.

The SIP CGI specification (published as RFC 3050) is on the IETF website.

SIP Servlets

An HTTP servlet is a Java application that runs in a Web server or application server and provides server-side processing, typically to access a database or perform e-commerce processing. It is a Java-based replacement for CGI scripts, Active Server Pages (ASPs) and proprietary plug-ins written in C and C++. Servlets are similar to the CGI concept but, instead of using a separate process, messages are passed to a class that runs within a JVM (Java Virtual Machine) inside the server.

SIP Servlets are very similar to HTTP Servlets; they simply enhance the interface to support SIP functions.

Because they are written in Java, servlets are portable between servers and operating systems.

The specification being developed under the Java Community Process can be found on the JCP.org site.

JAIN(TM) APIs

The JAIN APIs are being specified as a community extension to the Java(TM) platform. By providing a new level of abstraction and associated Java interfaces for service creation across circuit switched and packet networks, JAIN is bridging IP and IN protocols to create an open market.

The objective of the JAIN initiative is to create an open value chain in the provisioning of telecom services by addressing service portability, network convergence and service provider access.

Service Portability - Write Once, Run Anywhere. JAIN APIs reshape proprietary interfaces to enable truly portable applications.
Network Convergence (Integrated Networks) - Any Network. JAIN technology allows services to run over any underlying network architecture, whether IP, ATM, TDM or wireless.
Service Provider Access - By Anyone! JAIN APIs specify mechanisms to allow abstracted services direct access to network resources and devices to carry out specific actions or functions.

JAIN SIP, SIP Lite, SIP Servlets

There are currently three SIP APIs that have either been developed or that are under development within the JAIN initiative:

JAIN SIP - JAIN SIP is a low-level API that maps directly to RFC 2543 (the original SIP specification, since obsoleted by RFC 3261), published by the IETF. JAIN SIP is at Final Release, and the API specification, Reference Implementation and Technology Compatibility Kit (test suite) can be freely downloaded from the Java Community Process website.
JAIN SIP Lite - The JAIN SIP Lite API is a high-level API. Its goal is to allow application developers to create applications that use SIP as their underlying protocol without needing extensive knowledge of SIP. This will allow developers to rapidly create applications, such as user agent type applications. JAIN SIP Lite is a thin Java API that wraps the SIP protocol in an easy-to-use, high-level interface.
SIP Servlets - See the SIP Servlets section for further information.
For further information see:
- the JAIN website at http://java.sun.com/products/jain
- the Java Community Process site at http://jcp.org/
- the JAIN SIP 1.0 API specification, RI and TCK
- the JAIN SIP Lite specification (under development)

Parlay

The Parlay Group was formed in 1998 to specify and promote open APIs that "intimately link IT applications with the capabilities of the communications world". Initial efforts have focused on call control and messaging, though the prime focus of Parlay is to allow applications to access the functionality of the telecoms network in a secure way.

Parlay APIs consist of two categories of interface:

Service interfaces offering applications access to a range of network capabilities and information
Framework interfaces providing the supporting capabilities necessary for the service interfaces to be secure and manageable.

The Parlay APIs are defined in the Unified Modeling Language (UML). The JAIN Service Provider APIs (SPA) define a full industry-standard Java technology realization of the Parlay APIs. In addition to Java API specifications, JAIN SPA provide:
- Java Reference Implementations
- Technology Compatibility Kit (i.e., Java test suites)
- A complete API Certification program