Optimizing Wireless Performance in Windows CE
Wi-Fi (802.11 b/g/n) is the preferred wireless networking technology the world over, and numerous Windows Embedded products use 802.11 networking. Developing or porting a device driver for these 802.11 b/g/n devices is a challenging task, and we at e-con Systems have done this for a number of different products based on Windows CE.
In our experience, the most time-consuming and complex part is optimizing the driver's performance. Many aspects play a role in determining the final wireless throughput.
What is the best you can expect?
With 802.11g marketed as 54 Mbps and 802.11n marketed as up to 600 Mbps, people have set equally high expectations for their data throughput. Many expect near-54 Mbps performance from a G module when they integrate their wireless driver.
That expectation is far from practical.
Normally the IP throughput of a wireless network is around 60% of the air-link rate, because 802.11 is a half-duplex implementation. Several other factors reduce it further:
1. Interference (other devices in the same frequency)
2. Distance and Obstructions
3. Driver Implementation
4. Hardware capability
5. Presence of an access point between the two stations
A full explanation of practical wireless expectations is itself a subject for a separate article or blog post.
Based on our experience, the maximum point-to-point performance you can expect from your wireless device is:
802.11b – 3 Mbps
802.11g – 17-20 Mbps
802.11n – 45-50 Mbps (strictly N network)
This article concentrates on what to look for in the Windows CE Wi-Fi driver and how to optimize its performance.
Before starting performance tuning, it is important to quantify the parameters that bound the target performance. From a device driver perspective, the achievable performance depends on the platform:
1. Processor Speed (1GHz or 2 GHz, single core or multi core)
2. Wi-Fi Interface (SD or PCIe or SPI)
3. Interface Clock
4. NIC Performance
Based on the interface and processor, we can estimate the achievable performance. For example, a 4-bit SD interface running at 25 MHz on a 1 GHz processor should be able to provide 90 to 100 Mbps, while the same processor with a 125 MHz PCIe interface should provide around 350 to 400 Mbps of throughput.
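The interface ceiling above is simple arithmetic: bits transferred per clock times the clock rate. A minimal sketch (the helper name is ours, not a CTK or NDIS API):

```c
#include <stdint.h>

/* Raw bus bandwidth in Mbps = bus width (bits per clock) x clock in MHz.
 * A 4-bit SD bus at 25 MHz gives a 100 Mbps ceiling; SDIO protocol and
 * driver overhead bring the practical figure to roughly 90-100 Mbps. */
static uint32_t raw_bus_mbps(uint32_t bus_width_bits, uint32_t clock_mhz)
{
    return bus_width_bits * clock_mhz;
}
```

Anything the driver achieves above roughly 90% of this raw figure is unrealistic; anything far below it suggests a host-side bottleneck rather than a radio limitation.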
The second important step before performance tuning is knowing how to measure the send and receive throughput of the Wi-Fi driver. The Compact Test Kit (CTK) tool supports the following test cases for measuring miniport driver performance:
| Test Case ID | Test Case |
| --- | --- |
| 1001 | TCP Send Throughput |
| 1002 | TCP Receive Throughput |
| 1004 | TCP Send Throughput Nagle Off |
| 1005 | TCP Receive Throughput Nagle Off |
| 1006 | TCP Ping Nagle Off |
| 1007 | UDP Send Throughput |
| 1008 | UDP Send Packet Loss |
| 1009 | UDP Receive Throughput |
| 1010 | UDP Receive Packet Loss |
| 1012 | TCP Send/Receive Throughput |
| 1013 | UDP Send/Receive Throughput |
1. Performance Optimization Techniques
1. NDIS 6.0 vs. NDIS 5.1
Windows Embedded Compact supports the NDIS 6.0 Wi-Fi miniport driver architecture. It is very important to adhere to the NDIS 6.0 architecture to get high-bandwidth performance, as NDIS 6.0 adds the following features to enhance performance and scalability:
1. Net Buffer Data Packaging
2. Improved Send and Receive Paths
3. New Scatter Gather DMA support
4. Full TCP Offload
2. DMA and Scatter-Gather DMA
For high-bandwidth performance, a combination of shared memory and DMA buffers with scatter-gather works well. Buffer copying between protocol layers is one of the main costs in network data processing, so on the send side, for example, queuing several packets and DMAing them out with a scatter-gather approach helps performance. NDIS 6.0 has better support for scatter-gather DMA (SGDMA), so if the NIC supports SGDMA, it is important to enable it.
It is a good idea to estimate buffer requirements for occasional traffic bursts. Measuring the average number of times your NIC does not get ownership of its receive buffers back from the protocol layers in time is vital for a good estimate of how many buffers to reserve for burst traffic.
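To illustrate the idea, here is a simplified, self-contained sketch of building a scatter-gather list over a fragmented packet. The types are stand-ins for the real NDIS `SCATTER_GATHER_LIST`, and all names are ours; in a real driver the physical addresses would come from the NDIS SG DMA mapping, not a pointer cast:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SG_ELEMENTS 8   /* assumed NIC limit on DMA segments */

/* Simplified stand-ins for the NDIS scatter-gather element/list types. */
typedef struct {
    uintptr_t physical_address;   /* would come from the DMA mapping */
    uint32_t  length;
} sg_element_t;

typedef struct {
    uint32_t     element_count;
    uint32_t     total_length;
    sg_element_t elements[MAX_SG_ELEMENTS];
} sg_list_t;

/* Build an SG list covering a packet whose payload sits in non-contiguous
 * fragments, so the NIC can DMA the whole packet out in one operation
 * instead of the driver copying everything into one flat buffer first. */
static int sg_list_build(sg_list_t *list,
                         const uint8_t *const frags[],
                         const uint32_t frag_len[],
                         uint32_t frag_count)
{
    uint32_t i;
    if (frag_count > MAX_SG_ELEMENTS)
        return -1;                 /* too fragmented for this NIC */
    list->element_count = frag_count;
    list->total_length = 0;
    for (i = 0; i < frag_count; i++) {
        /* Illustrative only: a real driver gets bus addresses from NDIS. */
        list->elements[i].physical_address = (uintptr_t)frags[i];
        list->elements[i].length = frag_len[i];
        list->total_length += frag_len[i];
    }
    return 0;
}
```

The saving is the eliminated memcpy: a header fragment and a payload fragment go out as two DMA segments rather than being coalesced by the CPU.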
The driver's interrupt service routine (ISR) processes the interrupt and checks which packet descriptors are available. As it processes the descriptors, it hands the packets to the driver's DPC, which indicates them to the higher-level filter and protocol layers.
Since the miniport has a MiniportInterrupt routine that NDIS calls for interrupt processing, the execution path that follows from this routine, including the DPC routine, must have the necessary synchronization for any resources shared by concurrent paths of driver execution. The deferred procedure call (DPC) should process as many packets as possible within its processing time limit; note that if the driver's DPC processing takes too long, the system becomes unresponsive for other applications.
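The DPC budget idea can be sketched as follows. This is a simplified, self-contained illustration; the ring layout, budget value, and names are assumptions, not the NDIS API:

```c
#include <stdint.h>

#define MAX_RX_PER_DPC 64u  /* per-DPC budget to keep the system responsive */

typedef struct {
    uint32_t head;          /* next descriptor to process */
    uint32_t tail;          /* one past the last descriptor owned by the host */
} rx_ring_t;

/* Process at most MAX_RX_PER_DPC received descriptors, then return how
 * many are still pending so the caller can requeue the DPC (or re-enable
 * interrupts once the ring is drained). */
static uint32_t dpc_process_receives(rx_ring_t *ring)
{
    uint32_t processed = 0;
    while (ring->head != ring->tail && processed < MAX_RX_PER_DPC) {
        /* ...indicate this packet up to the filter/protocol layers... */
        ring->head++;
        processed++;
    }
    return ring->tail - ring->head;   /* left over for the next DPC */
}
```

Returning the pending count instead of looping until empty is what bounds DPC latency: a burst of several hundred packets is spread over multiple short DPCs rather than one long one.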
It is not a good idea to indicate packets to NDIS and ask the protocol to release them immediately; this incurs fairly heavy overhead in the protocol processing layer due to buffer copying. It is better to maintain a pool of buffers, mainly from the non-paged system pool, and to avoid probing and locking buffers while processing receive packets.
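A minimal sketch of such a preallocated pool follows. Names and sizes are illustrative; a real driver would allocate non-paged, DMA-capable memory through NDIS once at initialization, and the put path corresponds to packets being returned by the protocol layer (e.g. via MiniportReturnNetBufferLists):

```c
#include <stddef.h>

#define RX_POOL_SIZE 128   /* sized with headroom for traffic bursts */
#define RX_BUF_BYTES 2048

typedef struct rx_buf {
    struct rx_buf *next;
    unsigned char data[RX_BUF_BYTES];
} rx_buf_t;

/* Preallocated once; in a driver this memory would be non-paged. */
static rx_buf_t  pool[RX_POOL_SIZE];
static rx_buf_t *free_list;

static void rx_pool_init(void)
{
    int i;
    free_list = NULL;
    for (i = 0; i < RX_POOL_SIZE; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Pop a buffer for an incoming packet; NULL means the pool is exhausted,
 * i.e. the burst headroom discussed above was underestimated. */
static rx_buf_t *rx_pool_get(void)
{
    rx_buf_t *b = free_list;
    if (b)
        free_list = b->next;
    return b;
}

/* Called when the protocol layer hands the packet's buffer back. */
static void rx_pool_put(rx_buf_t *b)
{
    b->next = free_list;
    free_list = b;
}
```

In the driver these operations run under the receive-path lock; counting how often `rx_pool_get` returns NULL is exactly the burst measurement recommended above.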
Host Driver Tuning
To achieve higher bandwidth, the host driver must also be tuned for performance.
Utilizing Benefits of NIC
1. Protocol Processing Offload
Modern NICs support the following offload mechanisms:
1. TCP/UDP checksum offload
2. Segmentation of large TCP packets (large send offload)
As these offload mechanisms are supported by the Windows Embedded NDIS architecture, enabling them on the NIC reduces CPU overhead and thus improves overall bandwidth.
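To see what checksum offload saves, this is the Internet checksum (RFC 1071) that the CPU would otherwise compute over every packet; with offload enabled the NIC computes it in hardware and the driver only sets the per-packet offload information:

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of 16-bit words (RFC 1071), as used by IPv4, TCP
 * and UDP. Checksum offload moves this per-packet loop off the CPU. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len)                    /* odd trailing byte, padded with zero */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)           /* fold the carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

At tens of thousands of packets per second this loop is a measurable fraction of CPU time on an embedded processor, which is why enabling the offload shows up directly in throughput.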
2. Jumbo Frames
Jumbo frames are Ethernet frames with more than 1500 bytes of payload; conventionally they carry up to 9000 bytes, though variations exist. Every received Ethernet frame must be processed by the network hardware and software. Increasing the frame size lets the same amount of data be transferred in fewer frames, reducing CPU utilization (mostly through interrupt reduction) and the total overhead bytes across all frames sent. So if the NIC supports them, using jumbo frames certainly increases throughput.
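The arithmetic is easy to sketch (the helper name is ours). Each frame carries a fixed per-frame cost in headers and processing, so frame count is the quantity to minimize:

```c
#include <stdint.h>

/* Frames needed to carry `payload_bytes` at a given MTU (rounded up).
 * Each frame adds a fixed overhead (Ethernet header, FCS, preamble,
 * inter-frame gap) plus one unit of hardware/software processing, so a
 * 6x larger MTU means roughly 6x fewer frames for bulk transfers. */
static uint32_t frames_needed(uint32_t payload_bytes, uint32_t mtu)
{
    return (payload_bytes + mtu - 1) / mtu;
}
```

For a 9 MB transfer, a 1500-byte MTU needs 6000 frames while a 9000-byte MTU needs 1000, cutting per-frame overhead and interrupt load by a factor of six.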
3. Interrupt Moderation
To reduce the number of interrupts, many NICs use interrupt moderation. With interrupt moderation, the NIC hardware will not generate an interrupt immediately after it receives a packet. Instead, the hardware waits for more packets to arrive, or for a time-out to expire, before generating an interrupt. The hardware vendor specifies the maximum number of packets, time-out interval, or other interrupt moderation algorithm.
If the NIC supports it, it is important to enable interrupt moderation, as it reduces the number of interrupts processed and lets the interrupt routine handle more descriptors at a time, reducing CPU overhead.
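The effect is easy to quantify. With a packet-count threshold (the timeout path is not modeled here), the interrupt count drops roughly by that factor. A minimal sketch with assumed names:

```c
#include <stdint.h>

/* With moderation, the NIC raises one interrupt per `pkt_threshold`
 * received packets (or on a timeout, not modeled here) instead of one
 * interrupt per packet. */
static uint32_t interrupts_for(uint32_t packets, uint32_t pkt_threshold)
{
    if (pkt_threshold == 0)
        pkt_threshold = 1;          /* moderation disabled */
    return (packets + pkt_threshold - 1) / pkt_threshold;
}
```

At 10,000 packets per second, a threshold of 32 cuts the interrupt rate from 10,000/s to about 313/s; the trade-off is a small added latency bounded by the hardware timeout, which the vendor's moderation parameters let you tune.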