A Survey of Lower-than-Best Effort Transport Protocols
Michael Welzl
University of Innsbruck
Technikerstr. 21 A, 6020 Innsbruck, Austria
+43 512 507 6110
michael.welzl@uibk.ac.at
This document provides a survey of transport protocols which are designed to have a smaller bandwidth and/or delay impact on standard TCP than standard TCP itself when they share a bottleneck with it. Such protocols could be used for low-priority "background" traffic, as they provide what is sometimes called a "less than" (or "lower than") best effort service.
As a starting point for the work in the LEDBAT group, this document presents a brief survey of efforts to attain a Less than Best Effort (LBE) service without help from routers. We loosely define a LBE service as a service which has a smaller bandwidth and/or delay impact on standard TCP than standard TCP itself when sharing a bottleneck with it, and we refer to systems that provide this service as LBE systems. Generally, LBE behavior can be achieved by reacting to queue growth earlier than standard TCP would, or by changing the congestion avoidance behavior of TCP without utilizing any additional implicit feedback. Some mechanisms achieve LBE behavior at the application layer, e.g. by tuning the receiver window of standard TCP, and there is also a substantial amount of work that is related to the LBE concept but does not present a solution that can be installed in end hosts or be expected to work over the Internet. Following this classification, solutions are categorized in this document as delay-based transport protocols, non-delay-based transport protocols, application layer approaches and orthogonal work.
The author wishes to emphasize that, in its present form, this document is only a starting point and not based on a thorough literature study. Many relevant references will be missing, and an apology goes to all authors of related work that has not been mentioned here.
It is wrong to generally equate "little impact on standard TCP" with "small sending rate". Unless the sender's maximum window is limited for some reason, and in the absence of ECN support, standard TCP will normally increase its rate until a queue overflows, causing one or more packets to be dropped and the rate to be reduced. A protocol which stops increasing the rate before this event happens can, in principle, achieve a better performance than standard TCP. In the absence of any other traffic, this is even true for TCP itself when its maximum send window is limited to the bandwidth*round-trip time (RTT) product.
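The bandwidth*RTT product mentioned above is straightforward to compute; a minimal illustration (the path parameters are arbitrary example values):

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: the window needed to fill the path."""
    return bandwidth_bps / 8 * rtt_s

# A 10 Mbit/s path with a 100 ms RTT is filled by a 125 KB window;
# a sender capped at this window never overflows an otherwise empty queue.
window = bdp_bytes(10e6, 0.1)
```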
TCP Vegas is one of the first protocols known to have a smaller sending rate than standard TCP when the two share a bottleneck -- yet it was designed to achieve more, not less, throughput than standard TCP. Indeed, when it is the only protocol on the bottleneck, the throughput of TCP Vegas is greater than that of standard TCP. Depending on the bottleneck queue length, TCP Vegas itself can be starved by standard TCP flows; this can be remedied to some degree by the RED Active Queue Management mechanism.
The congestion avoidance behavior is the protocol's most important feature in terms of historical relevance as well as relevance in the context of this document (it has been shown that other elements of the protocol can sometimes play a greater role for its overall behavior). In congestion avoidance, once per RTT, TCP Vegas calculates the expected throughput as WindowSize / BaseRTT, where WindowSize is the current congestion window and BaseRTT is the minimum of all measured RTTs. The expected throughput is then compared with the actual (measured) throughput. If the actual throughput falls below the expected throughput by more than a threshold, this is taken as a sign of incipient congestion (packets are queuing at the bottleneck), causing the protocol to linearly decrease its rate. If the actual throughput stays within a (smaller) threshold of the expected throughput, this is taken as a sign that the network is underutilized, causing the protocol to linearly increase its rate.
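The per-RTT rule can be sketched as follows; the threshold values alpha and beta and the example numbers are illustrative placeholders, not constants from the TCP Vegas paper:

```python
def vegas_update(cwnd, base_rtt, measured_rtt, alpha=2.0, beta=4.0):
    """One TCP Vegas congestion-avoidance step, executed once per RTT.

    expected and actual are throughputs (packets per second); their
    difference, scaled by base_rtt, estimates how many packets this
    flow has queued at the bottleneck.
    """
    expected = cwnd / base_rtt             # throughput if nothing queues
    actual = cwnd / measured_rtt           # measured throughput
    diff = (expected - actual) * base_rtt  # estimated backlog in packets
    if diff < alpha:       # queue nearly empty: network underutilized
        return cwnd + 1    # linear increase
    if diff > beta:        # queue building up: incipient congestion
        return cwnd - 1    # linear decrease
    return cwnd

# With base_rtt = 0.1 s and cwnd = 30, a measured RTT of 0.12 s implies
# an estimated backlog of (300 - 250) * 0.1 = 5 packets > beta,
# so the window shrinks by one.
```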
TCP Vegas has been analyzed extensively. One of its most prominent properties is its fairness between multiple flows of the same kind: unlike standard TCP, it does not penalize flows with large propagation delays. While TCP Vegas was not the first protocol to use delay as a congestion indication, its predecessors are not discussed here because of the historical "landmark" role that TCP Vegas has taken in the literature.
Transport protocols which were designed to be non-intrusive include TCP-LP, TCP Nice and 4CP. Using a simple analytical model, the feasibility of this endeavor has been illustrated: due to the non-linear relationship between throughput and RTT, it is possible to remain transparent to standard TCP even when the flows under consideration have a larger RTT than the standard TCP flows.
TCP Nice follows the same basic approach as TCP Vegas but improves upon it in some aspects. Because of its moderate linear-decrease congestion response, TCP Vegas can affect standard TCP despite its ability to detect congestion early. TCP Nice removes this issue by halving the congestion window (at most once per RTT, like standard TCP) instead of decreasing it linearly. To avoid being too conservative, this is only done if a fixed predefined fraction of delay-based incipient congestion signals appears within one RTT; otherwise, TCP Nice falls back to the congestion avoidance rules of TCP Vegas if no packet was lost, or to those of standard TCP if a packet was lost. One more feature of TCP Nice is its ability to support a congestion window of less than one packet, by clocking out single packets over more than one RTT. With ns-2 simulations and real-life experiments using a Linux implementation, its authors show that TCP Nice achieves its goal of efficiently utilizing spare capacity while being non-intrusive to standard TCP.
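The per-RTT decision of TCP Nice can be sketched roughly as follows; the fraction threshold and the counting of delay signals are simplified placeholders (the actual constants are defined in the TCP Nice paper):

```python
def nice_update(cwnd, delay_signals, total_acks, loss, frac=0.5):
    """Rough sketch of TCP Nice's reaction, applied once per RTT.

    delay_signals counts packets whose measured delay indicated
    incipient congestion during the last RTT.
    """
    if loss:                                  # fall back to standard TCP
        return max(cwnd / 2, 1)
    if delay_signals > frac * total_acks:     # enough delay signals:
        return max(cwnd / 2, 1)               # halve, at most once per RTT
    return cwnd + 1   # placeholder for TCP Vegas-style increase

# In the real protocol cwnd may also drop below one packet; TCP Nice
# then clocks out single packets over more than one RTT.
```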
Unlike TCP Vegas and TCP Nice, TCP-LP uses the one-way delay (OWD) instead of the RTT as an indicator of incipient congestion. This is done to avoid reacting to delay fluctuations caused by reverse cross-traffic. Using the TCP Timestamps option, the OWD is determined as the difference between the receiver's Timestamp value in the ACK and the original Timestamp value that the receiver copied into the ACK. While the result of this subtraction can only precisely represent the OWD if clocks are synchronized, its absolute value is of no concern to TCP-LP, and hence clock synchronization is unnecessary.
Using a constant smoothing parameter, TCP-LP calculates an Exponentially Weighted Moving Average (EWMA) of the measured OWD and checks whether the result exceeds a threshold within the range of the minimum and maximum OWD seen during the connection's lifetime; if it does, this condition is interpreted as an "early congestion indication". The minimum and maximum OWD values are initialized during the slow-start phase.
Regarding its reaction to an early congestion indication, TCP-LP tries to strike a middle ground between the overly conservative choice of immediately setting the congestion window to one packet and the presumably too aggressive choice of halving the congestion window as standard TCP does. It does so by halving the window at first in response to an early congestion indication, then initializing an "interference time-out timer", and maintaining the window size until this timer fires. If another early congestion indication appears during this "interference phase", the window is set to one packet; otherwise, the window is maintained and TCP-LP continues to increase it in the standard Additive-Increase fashion. This method ensures that it takes at least two RTTs for a TCP-LP flow to decrease its window to one packet, and, like standard TCP, TCP-LP reacts to congestion at most once per RTT.
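The detection and reaction logic of the two preceding paragraphs can be sketched as follows; the smoothing weight, the threshold position and the timer handling are simplified placeholders, not the constants from the TCP-LP paper:

```python
class TCPLPSketch:
    """Simplified sketch of TCP-LP's early congestion logic."""

    def __init__(self, gamma=0.125, delta=0.15):
        self.gamma = gamma   # EWMA smoothing weight (placeholder value)
        self.delta = delta   # threshold position within [owd_min, owd_max]
        self.owd_min = self.owd_max = None
        self.ewma = None
        self.in_interference = False
        self.cwnd = 1.0

    def on_owd_sample(self, owd):
        # Track extremes (initialized during slow start in real TCP-LP).
        self.owd_min = owd if self.owd_min is None else min(self.owd_min, owd)
        self.owd_max = owd if self.owd_max is None else max(self.owd_max, owd)
        self.ewma = owd if self.ewma is None else \
            (1 - self.gamma) * self.ewma + self.gamma * owd
        threshold = self.owd_min + self.delta * (self.owd_max - self.owd_min)
        if self.ewma > threshold:
            self.on_early_congestion()

    def on_early_congestion(self):
        if self.in_interference:
            self.cwnd = 1.0      # second indication within the timer: drop to 1
        else:
            self.cwnd = max(self.cwnd / 2, 1.0)   # halve first
            self.in_interference = True           # start "interference phase"

    def on_interference_timeout(self):
        self.in_interference = False   # survived: resume additive increase
```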
With ns-2 simulations and real-life experiments using a Linux implementation, its authors show that TCP-LP is largely non-intrusive to TCP traffic while at the same time enabling it to utilize a large portion of the excess network bandwidth, which is fairly shared among competing TCP-LP flows. They also show that using the protocol for bulk data transfers greatly reduces the file transfer times of competing best-effort web traffic.
4CP, which stands for "Competitive and Considerate Congestion Control", is a protocol that provides a LBE service by changing the window control rules of standard TCP. A "virtual window" is maintained which, during a so-called "bad congestion phase", is reduced to less than a predefined minimum value of the actual congestion window. The congestion window is only increased again once the virtual window exceeds this minimum; in this way, the virtual window controls the duration during which the sender transmits at a fixed minimum rate. The 4CP congestion avoidance algorithm allows setting a target average window and avoids starvation of "background" flows while bounding the impact on "foreground" flows. Its performance was evaluated with ns-2 simulations and real-life experiments using a kernel-level implementation in Microsoft Windows Vista.
Some work has been done on applying weights to congestion control mechanisms, allowing a single flow to be as aggressive as a number of parallel TCP flows. This is usually motivated by the fact that users may want to assign different priorities to different flows. The first, and best known, such protocol is MulTCP, which emulates N TCPs in a rather simple fashion. An improved version of MulTCP has also been proposed, and there is a variant, Probe-Aided (PA-)MulTCP, where only one feedback loop is applied to control a larger traffic aggregate. Another protocol, CP, applies the same concept to the TFRC protocol in order to provide such fairness differentiation for multimedia flows.
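The basic MulTCP idea of emulating N flows can be sketched as follows; this is the commonly cited simple form, and the details differ between the variants mentioned above:

```python
def multcp_on_ack(cwnd, n):
    """Additive increase for an aggregate emulating n TCP flows:
    the window grows by n packets per RTT instead of one."""
    return cwnd + n / cwnd

def multcp_on_loss(cwnd, n):
    """On loss, one of the n emulated flows halves its share of the
    window, so the aggregate shrinks by cwnd / (2n)."""
    return cwnd * (1 - 1 / (2 * n))

# With n = 1 both rules reduce to standard TCP congestion avoidance:
# +1/cwnd per ACK and halving on loss.
```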
The general assumption underlying all of the above work is that these protocols are "N-TCP-friendly", i.e. as TCP-friendly as N TCPs, where N is a positive (and possibly natural) number greater than or equal to 1. The MulTFRC protocol, another extension of TFRC for multiple flows, is however able to support values of N between 0 and 1, making it applicable as a mechanism for a LBE service. Since it does not react to delay like the mechanisms above but adjusts its rate like TFRC, it can probably be expected to be more aggressive than mechanisms such as TCP Nice or TCP-LP. This also means that MulTFRC is less likely to be prone to starvation, as its aggressiveness is tunable at a fine granularity even when N is between 0 and 1.
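For reference, TFRC computes its sending rate from the TCP throughput equation of RFC 5348; MulTFRC generalizes this computation to N flows (the generalized equation is not reproduced here). A sketch of the standard single-flow calculation:

```python
from math import sqrt

def tfrc_rate(s, R, p, b=1, t_rto=None):
    """TCP throughput equation as used by TFRC (RFC 5348).

    s: segment size in bytes, R: round-trip time in seconds,
    p: loss event rate, b: packets acknowledged per ACK,
    t_rto: retransmission timeout (RFC 5348 recommends 4*R).
    Returns the allowed sending rate in bytes per second.
    """
    if t_rto is None:
        t_rto = 4 * R
    denom = R * sqrt(2 * b * p / 3) + \
        t_rto * (3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
    return s / denom

# Example: 1460-byte segments, 100 ms RTT, 1% loss event rate.
rate = tfrc_rate(1460, 0.1, 0.01)
```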
Another mechanism controls the bandwidth by letting the receiver intelligently manipulate the receiver window of standard TCP. This is done because the authors assume a client-server setting where the receiver's access link is typically the bottleneck. The scheme incorporates a delay-based calculation of the expected queue length at the bottleneck, which is quite similar to the calculation in the delay-based protocols above, e.g. TCP Vegas. Using a Linux implementation, where TCP flows are classified according to their application's needs, it is shown that a significant improvement in packet latency can be attained over an unmodified system while maintaining good link utilization.
Receiver window tuning has also been phrased as an optimization problem. On this basis, two algorithms have been presented: binary search, which is faster at reaching a good operating point but fluctuates, and stochastic optimization, which does not fluctuate but converges more slowly than binary search. These algorithms merely use the previous receiver window and the amount of data received during the previous control interval as input. According to the authors, the encouraging simulation results suggest that such an application level mechanism can work almost as well as a transport layer scheme like TCP-LP.
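The general shape of such a receiver-side tuner can be sketched as follows; the probing rule below is a hypothetical illustration of the idea, not either of the published algorithms:

```python
def tune_rwnd(rwnd, bytes_received, interval, target_rate, step):
    """Hypothetical receiver-window tuner.

    Uses only the previous window and the data received in the last
    control interval: it opens the advertised window while the
    measured rate stays below target_rate and backs off otherwise.
    """
    measured_rate = bytes_received / interval
    if measured_rate < target_rate:
        return rwnd + step        # room left: open the window
    return max(rwnd // 2, 1)      # overshoot: back off

# e.g. 40 KB received in 0.5 s (80 KB/s measured) against a
# 100 KB/s target grows a 16-segment window by one step.
```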
TODO: mention other rwnd tuning and different application layer work, e.g. from related work sections of and and intro of .
Various suggestions have been published for realizing a LBE service by influencing the way packets are treated in routers. One example is the Persistent Class Based Queuing (P-CBQ) scheme, a variant of Class Based Queuing (CBQ) with per-flow accounting. RFC 3662 defines a DiffServ per-domain behavior called "Lower Effort".
Harp realizes a LBE service by dissipating background traffic onto less-utilized paths of the network. This is achieved without changing routers by using edge nodes as relays. According to the authors, these edge nodes should be gateways of organizations in order to align the scheme with usage incentives, but the technical solution would also work if Harp was deployed only in end hosts. It detects impending congestion by monitoring delay, similar to TCP Nice, and manages to improve utilization and fairness over pure single-path solutions.
An entirely different approach reduces the priority of a flow via a generic idletime scheduling strategy in a host's operating system. While the presented results show that the new scheduler can effectively shield regular tasks from low-priority ones (e.g. TCP from greedy UDP) with only a minor performance impact, it is an underlying assumption that all involved end hosts use the idletime scheduler. In other words, it is not the focus of this work to protect a standard TCP flow originating from a host where the presented scheduling scheme is not implemented.
TODO: studies dealing with the precision of congestion prediction in end hosts (i.e. using delay to determine the onset of congestion) may be relevant in this document, and could be discussed here, e.g. and the references therein.
The author would like to thank Dragana Damjanovic for reference pointers. Surely lots of other folks will help in one way or another later and I'll thank them all here.
This memo includes no request to IANA.
This document introduces no new security considerations.
TCP Vegas: New techniques for congestion detection and avoidance
Fairness Comparisons Between TCP Reno and TCP Vegas for Future Deployment of TCP Vegas
TCP Vegas revisited
TCP-LP: low-priority service via end-point congestion control
TCP Nice: a mechanism for background transfers
Competitive and Considerate Congestion Control for Bulk Data Transfers
Improving Throughput and Maintaining Fairness using Parallel TCP
A Multipath Background Network Architecture
Lower than best effort: a design and implementation
A Lower Effort Per-Domain Behavior (PDB) for Differentiated Services (RFC 3662)
TCP Extensions for High Performance (RFC 1323)
Emulating Low-Priority Transport at the Application Layer: a Background Transfer Service
Receiver based management of low bandwidth access links
Aggregate congestion control for distributed multimedia applications
MulTFRC: Providing Weighted Fairness for Multimedia Applications (and others too!)
Differentiated end-to-end Internet services using a weighted proportional fair sharing TCP
Probe-Aided MulTCP: an aggregate congestion control mechanism
TCP Friendly Rate Control (TFRC): Protocol Specification (RFC 5348)
Emulating AQM from end hosts
Recommendations on Queue Management and Congestion Avoidance in the Internet (RFC 2309)