http://summit.ubuntu.com/lpc-2012/ Networking

Wednesday 10:45 - 11:30 PDT
CoDel and Queue Limits
Networking Topics:
 1. CoDel and FQ CoDel
 2. Byte Queue Limits revisited

=== CoDel and FQ CoDel ===

Kathleen Nichols and Van Jacobson have done major work to address the bufferbloat problem, which has drawn attention these past years following the communications of Jim Gettys and Dave Taht. The result of this work is presented in the following article [1].

[1] http://queue.acm.org/detail.cfm?id=2209336

The idea behind CoDel is that buffers or large queues are not bad per se (as the term bufferbloat could imply); it is the way queues are handled that is critical. AQM schemes have been hard to deploy because RED (the best-known AQM) is difficult to tune and has some flaws. CoDel's intent is to use the delay packets experience in the queue as its main input, rather than the queue length (in bytes or packets). It is designed to be a "no knob" AQM for routers, and its implementation uses a fairly simple algorithm intended for silicon integration (a small C sketch of the control law follows this session description).

Since Kathleen's and Van's ideas were very close to the various ideas I had last year to improve Linux AQM, I co-implemented CoDel for Linux (with Dave Taht). I also implemented fq_codel, an SFQRED replacement that combines fair queueing with a CoDel-managed queue per flow. I'll present various experimental results.

Topic Lead: Eric Dumazet <email address hidden>

Eric currently works for Google. He is a Linux networking developer who has lately worked on packet schedulers: SFQ improvements and the CoDel / fq_codel implementations.

=== Byte Queue Limits revisited ===

Byte Queue Limits (BQL) is an algorithm that manages the size of queues in network cards in bytes rather than packets. The algorithm tries to estimate how many bytes the card is able to transmit, sizes the queue accordingly, and adapts to changing load. Properly sized (shorter) queues push queuing to the upper layers of the stack, the queuing disciplines (qdiscs), which reduces the time between when a packet is scheduled for transmission and when it hits the wire. This reduces latency and allows better scheduling decisions in software.

BQL has been in the kernel for a year and seems to work well, given the lack of major complaints. In this talk we are going to present experimental data on how BQL actually behaves and what the effects of BQL buffer management are on the rest of the stack. We are going to show that BQL does not need any knobs to select a good queue size. We are also going to discuss and explain some limitations of the algorithm and some corner cases of its deployment due to its dependency on external events that pace its execution.

Topic Lead: Tomas Hruby

Tomas is a PhD candidate at the Free University in Amsterdam, in the MINIX group of Prof. Andy Tanenbaum, where he explores how to take advantage of multicore processors for designing reliable systems. He has worked on intrusion detection, filesystems, and the L4 microkernel. He is currently an intern on the Linux networking team at Google.
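
To make the delay-based control law concrete, below is a minimal C sketch of the logic described in [1]. It is a deliberately simplified illustration, not the Linux sch_codel code: the 5 ms target and 100 ms interval are the published defaults, while the state layout, helper names, and the way the caller supplies the sojourn time are assumptions made for this sketch (the real algorithm also tracks the minimum sojourn time seen over the interval and reuses drop-count state across dropping episodes).

#include <stdbool.h>
#include <stdint.h>
#include <math.h>

#define CODEL_TARGET_US   5000     /* 5 ms acceptable standing-queue delay */
#define CODEL_INTERVAL_US 100000   /* 100 ms observation window            */

struct codel_state {
    bool     dropping;      /* currently in the dropping state?            */
    uint32_t count;         /* drops since entering the dropping state     */
    uint64_t first_above;   /* deadline set when delay first exceeds target */
    uint64_t drop_next;     /* time of the next scheduled drop             */
};

/* The control law: the next drop comes interval / sqrt(count) later,
 * so the drop rate rises gently while the delay stays too high. */
static uint64_t codel_control_law(uint64_t t_us, uint32_t count)
{
    return t_us + (uint64_t)(CODEL_INTERVAL_US / sqrt((double)count));
}

/* Decide whether the packet dequeued at time now_us, which spent
 * sojourn_us microseconds in the queue, should be dropped. */
bool codel_should_drop(struct codel_state *s, uint64_t now_us, uint64_t sojourn_us)
{
    if (sojourn_us < CODEL_TARGET_US) {
        /* Delay is acceptable again: leave the dropping state. */
        s->first_above = 0;
        s->dropping = false;
        return false;
    }
    if (s->first_above == 0) {
        /* Delay just went above target: give it one interval to drain. */
        s->first_above = now_us + CODEL_INTERVAL_US;
        return false;
    }
    if (!s->dropping && now_us >= s->first_above) {
        /* Delay stayed above target for a whole interval: start dropping. */
        s->dropping = true;
        s->count = 1;
        s->drop_next = codel_control_law(now_us, s->count);
        return true;
    }
    if (s->dropping && now_us >= s->drop_next) {
        /* Still above target: drop again, slightly sooner each time. */
        s->count++;
        s->drop_next = codel_control_law(s->drop_next, s->count);
        return true;
    }
    return false;
}

A dequeue path would call codel_should_drop() for each packet, passing the time the packet spent queued; the resulting drops feed back to senders through the normal TCP congestion response.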

Participants:
attending therbert (Tom Herbert)

Tracks:
  • Networking
Nautilus 5
Wednesday 11:40 - 12:25 PDT
Data Direct I/O and Ethernet AVB
Networking Topics:
 1) Data Direct I/O
 2) Ethernet Audio/Video Bridging

=== Data Direct I/O Significantly Boosts Networking Performance and Reduces Power ===

This presentation calls out the new Data Direct I/O (DDIO) platform technology, which enables I/O data transfers that require far fewer trips to memory (nearly zero in the most optimal scenarios). In doing so, DDIO significantly boosts performance (higher throughput, lower CPU usage, and lower latency) and lowers power consumption. The updated architecture of the Intel Xeon processor removes the inefficiencies of the classic model by enabling direct communication between Ethernet controllers and adapters and the host processor cache. Eliminating the frequent trips to main memory present in the classic model reduces power consumption, provides greater I/O bandwidth scalability, and lowers latency. Intel DDIO is enabled by default on all Intel Xeon processor E5 based server and workstation platforms.

This presentation will explain the technology in detail as well as how it is currently used. Performance numbers from our Ethernet controllers will be included to clearly show the benefits of the technology. All performance gains will be examined and explained, including the power reduction achieved while increasing bandwidth and reducing latency.

=== Ethernet Audio/Video Bridging (AVB) - a Proof-of-Concept ===

Using our latest gigabit Ethernet controller, we designed and implemented a proof-of-concept Audio Video Bridging device using the IEEE 802.1Qav standard (a rough sketch of the 802.1Qav credit-based shaper follows this session description). The project was implemented using a modified Linux igb driver with a user space component to pass the AVB frames to the controller while also maintaining a normal network connection. This presentation will go through the details of the project, explain the challenges, and end with a demo of the working implementation.

AVB is now being used to carry audio and video to many different types of A/V devices over Ethernet cables instead of running large, heavy analog A/V cables to each device. Not only is all the analog cabling gone, the performance is also far superior, with the added ease of controlling all the audio and video from a single workstation.

Topic Lead: John Ronciak

John is a software architect working for Intel in the LAN Access Division (LAD). He has 30 years of experience writing device drivers for various operating systems and is currently one of the leads in the open source driver group responsible for six Linux kernel drivers.
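
Since the proof-of-concept centers on IEEE 802.1Qav, a rough model of the credit-based shaper may help frame the discussion. The C sketch below is an illustrative userspace model of the shaper as commonly described (credit accrues at idleSlope while AVB frames wait, drains at sendSlope during transmission, and a frame may start only at non-negative credit); it is not taken from the modified igb driver or its user space component, and the structure, field names, and use of seconds as the time base are assumptions. The full standard also resets positive credit to zero when the queue empties; the sketch leaves that out to stay short.

#include <stdbool.h>

struct cbs {
    double credit;        /* current credit, in bits                       */
    double idle_slope;    /* bandwidth reserved for the AVB class, bit/s   */
    double send_slope;    /* idle_slope - port rate, bit/s (negative)      */
    double hi_credit;     /* upper bound on accumulated credit             */
    double lo_credit;     /* lower bound on credit                         */
};

/* Credit accumulates at idle_slope while a frame is waiting to be sent. */
static void cbs_wait(struct cbs *c, double seconds)
{
    c->credit += c->idle_slope * seconds;
    if (c->credit > c->hi_credit)
        c->credit = c->hi_credit;
}

/* Credit drains at send_slope while a frame is being transmitted. */
static void cbs_transmit(struct cbs *c, double seconds)
{
    c->credit += c->send_slope * seconds;   /* send_slope < 0 */
    if (c->credit < c->lo_credit)
        c->credit = c->lo_credit;
}

/* A queued AVB frame may start transmission only when credit >= 0. */
static bool cbs_may_send(const struct cbs *c)
{
    return c->credit >= 0.0;
}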

Participants:
attending therbert (Tom Herbert)

Tracks:
  • Networking
Nautilus 5
Thursday 10:25 - 11:10 PDT
Multipath TCP && TCP Loss Probe && Client Congestion Manager
Networking Topics:
 1. Linux Kernel Implementation of Multipath TCP
 2. TCP Loss Probe (TLP): fast recovery for tail losses
 3. Client-based Congestion Manager for TCP

=== Linux Kernel Implementation of Multipath TCP ===

MultiPath TCP (MPTCP for short) is an extension to TCP that allows a single TCP connection to be split among multiple interfaces, while presenting a standard TCP socket API to the applications (a short illustration of this follows the session description). Splitting a data stream among different interfaces has multiple benefits: data-center hosts may increase their bandwidth, smartphones with WiFi/3G may seamlessly hand over traffic from 3G to WiFi, and so on. MultiPath TCP works with unmodified applications over today's Internet with all its middleboxes and firewalls. A recent Google Tech Talk about MultiPath TCP is available at [1].

In this talk I will first present the basics of MultiPath TCP, explain how it works, and show some of the performance results we obtained with our Linux kernel implementation (freely available at [2]). Second, I will go into the details of our implementation in the Linux kernel and our plans for submitting the MPTCP patches to the upstream Linux kernel.

[1] http://www.youtube.com/watch?v=02nBaaIoFWU
[2] http://mptcp.info.ucl.ac.be

Topic Lead: Christoph Paasch <email address hidden>

=== TCP Loss Probe (TLP): fast recovery for tail losses ===

Fast recovery (FR) and retransmission timeouts (RTOs) are two mechanisms in TCP for detecting and recovering from packet losses. Fast recovery detects and repairs losses more quickly than RTOs; however, it is only triggered when connections have a sufficiently large number of packets in transit. Short flows, such as the vast majority of Web transfers, are more likely to detect losses via RTOs, which are expensive in terms of latency. While a single packet loss in a 1000-packet flow can be repaired within a round-trip time (RTT) by FR, the same loss in a one-packet flow takes many RTTs to even detect. The problem is not limited to short flows: more generally, losses near the end of transfers, aka tail losses, can only be recovered via RTOs.

In this talk I will describe TCP Loss Probe (TLP), a mechanism that allows flows to detect and recover from tail losses much faster than an RTO, thereby speeding up short transfers. TLP also unifies loss recovery regardless of the "position" of a loss; e.g., a packet loss in the middle of a packet train as well as at the tail end will now trigger the same fast recovery mechanisms. I will also describe experimental results with TLP and its impact on Web transfer latency on live traffic.

Topic Lead: Nandita Dukkipati <email address hidden>

Nandita is a software engineer at Google working on making networking faster for Web traffic and datacenter applications. She is an active participant at the IETF and in networking research. Prior to Google, she obtained a PhD in Electrical Engineering from Stanford University.

=== Client-based Congestion Manager for TCP ===

Today, one of the most effective ways to improve the performance of chatty applications is to keep TCP connections open as long as possible to save the overhead of the SYN exchange and slow start on later requests. However, due to Web domain sharding, NAT boxes often run out of ports or other resources and resort to dropping connections in ways that make later connections even slower to start. A better solution would be to enable TCP to start a new connection as quickly as it restarts an idle connection.

The approach is to have a congestion manager (CM) on the client that constantly learns about the network and adds some signaling information to requests from the client, indicating how the server can reply most quickly, for example by providing TCP metrics similar to today's destination cache. Such a CM could even indicate to the server what type of congestion control to use, such as the Relentless congestion control algorithm, so that opening more connections does not gain an advantage in aggregate throughput. It also allows receiver-based congestion control, which opens new possibilities for controlling congestion. The Linux TCP metrics are a similar concept, but there is a lot of room for improvement.

Topic Lead: Yuchung Cheng <email address hidden>

Yuchung Cheng is a software engineer at Google working on the Make-The-Web-Faster project. He works on the TCP protocol and the Linux TCP stack, focusing on latency. He has contributed the Fast Open, Proportional Rate Reduction, and Early Retransmit implementations in the Linux kernel and has written several papers and IETF drafts about them. He has also contributed to rate-limiting YouTube streaming and to the cwnd-persist feature of the SPDY protocol.
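
The claim that MPTCP works with unmodified applications is worth making concrete. The snippet below is ordinary Berkeley-sockets client code with nothing MPTCP-specific in it; the point of the abstract is that, on a kernel carrying the MPTCP patches [2], this same program transparently gets its data split across the available interfaces. The host name and port are placeholders chosen for illustration.

#include <stdio.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>

int main(void)
{
    /* Plain TCP client: resolve, connect, send a request, read a reply. */
    struct addrinfo hints = { .ai_socktype = SOCK_STREAM };
    struct addrinfo *res;

    if (getaddrinfo("example.org", "80", &hints, &res) != 0)  /* placeholder host/port */
        return 1;

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0)
        return 1;

    const char req[] = "HEAD / HTTP/1.0\r\nHost: example.org\r\n\r\n";
    write(fd, req, sizeof(req) - 1);

    char buf[512];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    close(fd);
    freeaddrinfo(res);
    return 0;
}

Whether the connection actually uses several interfaces is decided by the kernel and the path manager, not by the application.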

Participants:
attending therbert (Tom Herbert)

Tracks:
  • Networking
Nautilus 2
Thursday 11:20 - 12:05 PDT
Multipath TCP && TCP Loss Probe && Client Congestion Manager
Networking (Same session as the 10:25 entry above; see that entry for the full description.)

Participants:
attending therbert (Tom Herbert)

Tracks:
  • Networking
Nautilus 2
Friday 09:10 - 09:55 PDT
Classification/Shaping && HW Rate Limiting && Open-vswitch
Networking, Bufferbloat Topics:
 1. Linux Traffic Classification and Shaping
 2. TC Interface to Hardware Rate Limiting
 3. Harmonizing Multiqueue, Vmdq, virtio-net, macvtap with open-vswitch

=== Linux Traffic Classification and Shaping ===

Linux provides advanced mechanisms for traffic classification and shaping. Central to this role is the queuing discipline. Recently we have done work allowing hardware to offload some of these traditionally CPU-intensive tasks and have experimented with mechanisms to improve performance on many-core systems. Here we would like to highlight work such as the 'mqprio' queuing scheduler, which has recently been accepted upstream, and share results from experimental work running lockless queuing disciplines and classifiers on many-core systems and fat pipes (10 Gbps and greater).

Topic Lead: Tom Herbert

=== tc(?) interface to hardware transmit rate limiting ===

Intel 10 Gigabit hardware (and others) can provide transmit rate limiting. This presentation will discuss development of a new simple qdisc that can either provide all-software transmit rate limiting or, when installed over hardware that supports the capability, directly configure the hardware's rate limiting (a small sketch of such a software fallback follows this session description). One problem that will need discussion is that the Intel hardware's rate limiting is per-queue. Another option besides a qdisc that could be discussed is direct ethtool control over the rate limiting.

Topic Lead: Jesse Brandeburg

Jesse is a senior Linux developer in the Intel LAN Access Division (Intel Ethernet). He has been with Intel since 1994 and has worked on the Linux e100, e1000, e1000e, igb, ixgb, and ixgbe drivers since 2002. His time is split between solving customer issues, performance-tuning Intel's drivers, and working on bleeding-edge development for the Linux networking stack.

=== Harmonizing Multiqueue, Vmdq, virtio-net, macvtap with open-vswitch ===

Multiqueue support for virtio-net, macvtap, and qemu is being worked on by Jason Wang and Krishna Kumar. Inspired by their work, I would like to take it a step further and discuss introducing open-vswitch based flows for multiqueue-aware virtio-net queuing. This requires plumbing in open-vswitch to use Linux tc to instantiate QoS flows per queue, in addition to the virtio-net multiqueue work. Open-vswitch also needs support for opening tap fds multiple times so it can create the corresponding number of queues. To this end, open-vswitch might want to become macvtap aware. There is a need to understand and discuss gaps in realizing open-vswitch use cases in synchronization with features already implemented in macvtap and Linux tc. For instance, features like VEPA and VEB are implemented only in the macvtap/macvlan driver but are useful for open-vswitch based flows too. I would like to discuss the features and gaps that require plumbing in these subsystems, and related work.

*Required attendees (if present):* Jason Wang, Krishna Kumar, Michael Tsirkin, Arnd Bergmann, Stephen Hemminger, Dave Miller, open-vswitch developers, netdev developers, libvirt developers, qemu developers.

Topic Lead: Shyam Iyer <email address hidden>

Shyam Iyer is a senior software engineer in Dell's Operating Systems Advanced Engineering Group focused on Linux, with over 8 years of experience developing Linux-based solutions. Apart from enabling Dell PowerEdge servers and storage for enterprise Linux operating systems, he focuses on bridging new hardware technology use cases with emerging Linux technologies. His interests encompass server hardware architecture, Linux kernel debugging, server platform bring-up, efficient storage, networking, virtualization architectures, and performance tuning.
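
As a reference point for the all-software fallback mentioned in the rate-limiting topic, here is a small C sketch of the classic token-bucket policy such a qdisc could enforce when the hardware cannot do the limiting. It only illustrates the arithmetic; it is not a design for the proposed qdisc, and the structure, field names, and parameters are assumptions made for this sketch.

#include <stdbool.h>
#include <stdint.h>

struct tbf {
    uint64_t rate_bps;     /* allowed rate, bytes per second           */
    uint64_t burst;        /* bucket depth, bytes                      */
    double   tokens;       /* currently available bytes                */
    uint64_t last_ns;      /* timestamp of the last refill, nanoseconds */
};

/* Refill tokens for the time elapsed since the last check. */
static void tbf_refill(struct tbf *t, uint64_t now_ns)
{
    double elapsed = (double)(now_ns - t->last_ns) / 1e9;
    t->tokens += elapsed * (double)t->rate_bps;
    if (t->tokens > (double)t->burst)
        t->tokens = (double)t->burst;
    t->last_ns = now_ns;
}

/* May a packet of `len` bytes be sent now? If yes, consume its tokens;
 * if not, the caller keeps the packet queued, as a qdisc would. */
bool tbf_try_send(struct tbf *t, uint64_t now_ns, uint32_t len)
{
    tbf_refill(t, now_ns);
    if (t->tokens < (double)len)
        return false;
    t->tokens -= (double)len;
    return true;
}

A caller would invoke tbf_try_send() before handing each packet to the driver and hold the packet back when it returns false, which is essentially what the existing software tbf qdisc does; the point of the proposed qdisc is to push the same policy down to the NIC when the hardware supports it.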

Participants:
attending therbert (Tom Herbert)

Tracks:
  • Networking
Nautilus 5
Friday 10:05 - 10:50 PDT
Classification/Shaping && HW Rate Limiting && Open-vswitch
Networking, Bufferbloat (Same session as the 09:10 entry above; see that entry for the full description.)

Participants:
attending therbert (Tom Herbert)

Tracks:
  • Networking
Nautilus 5

PLEASE NOTE: The Linux Plumbers Conference 2012 schedule is still in draft form and is subject to change at any time.