| Foreword |
|
xix | |
| Preface |
|
xxi | |
|
Introduction And Overview |
|
|
1 | (6) |
|
Network Systems And The Internet |
|
|
1 | (1) |
|
Applications Vs. Infrastructure |
|
|
1 | (1) |
|
Network Systems Engineering |
|
|
2 | (1) |
|
|
|
2 | (1) |
|
|
|
3 | (1) |
|
|
|
3 | (1) |
|
Hardware, Software, And Hybrids |
|
|
4 | (1) |
|
Scope And Organization Of The Text |
|
|
5 | (1) |
|
|
|
5 | (2) |
|
Basic Terminology And Example Systems |
|
|
7 | (8) |
|
|
|
7 | (1) |
|
|
|
7 | (1) |
|
Connection-Oriented And Connectionless Paradigms |
|
|
8 | (1) |
|
|
|
8 | (1) |
|
LAN And WAN Classifications |
|
|
9 | (1) |
|
The Internet And Heterogeneity |
|
|
9 | (1) |
|
|
|
9 | (1) |
|
|
|
10 | (1) |
|
The Two Key Systems Used In The Internet |
|
|
11 | (1) |
|
Other Systems Used In The Internet |
|
|
12 | (1) |
|
Monitoring And Control Systems |
|
|
13 | (1) |
|
|
|
13 | (2) |
|
Review Of Protocols And Packet Formats |
|
|
15 | (14) |
|
|
|
15 | (1) |
|
|
|
15 | (2) |
|
Layers 1 And 2 (Physical And Network Interface) |
|
|
17 | (2) |
|
|
|
19 | (1) |
|
|
|
20 | (3) |
|
Protocol Port Numbers And Demultiplexing |
|
|
23 | (1) |
|
Encapsulation And Transmission |
|
|
23 | (1) |
|
Address Resolution Protocol |
|
|
24 | (1) |
|
|
|
24 | (5) |
|
PART I Traditional Protocol Processing Systems |
|
|
|
Conventional Computer Hardware Architecture |
|
|
29 | (14) |
|
|
|
29 | (1) |
|
A Conventional Computer System |
|
|
29 | (1) |
|
|
|
30 | (1) |
|
|
|
31 | (1) |
|
|
|
32 | (1) |
|
|
|
33 | (1) |
|
Network Interface Card Functionality |
|
|
34 | (1) |
|
NIC Optimizations For High Speed |
|
|
34 | (1) |
|
Onboard Address Recognition |
|
|
35 | (1) |
|
|
|
36 | (1) |
|
|
|
37 | (1) |
|
Operation And Data Chaining |
|
|
38 | (1) |
|
|
|
39 | (1) |
|
|
|
39 | (1) |
|
|
|
40 | (3) |
|
Basic Packet Processing: Algorithms And Data Structures |
|
|
43 | (24) |
|
|
|
43 | (1) |
|
State Information and Resource Exhaustion |
|
|
43 | (1) |
|
|
|
44 | (1) |
|
Packet Buffer Size And Copying |
|
|
45 | (1) |
|
Protocol Layering And Copying |
|
|
45 | (1) |
|
Heterogeneity And Network Byte Order |
|
|
46 | (1) |
|
|
|
47 | (2) |
|
|
|
49 | (1) |
|
IP Datagram Fragmentation And Reassembly |
|
|
50 | (6) |
|
|
|
56 | (1) |
|
|
|
57 | (1) |
|
|
|
57 | (2) |
|
TCP Connection Recognition Algorithm |
|
|
59 | (1) |
|
|
|
60 | (3) |
|
|
|
63 | (4) |
|
Packet Processing Functions |
|
|
67 | (16) |
|
|
|
67 | (1) |
|
|
|
68 | (1) |
|
Address Lookup And Packet Forwarding |
|
|
68 | (1) |
|
Error Detection And Correction |
|
|
69 | (1) |
|
Fragmentation, Segmentation, And Reassembly |
|
|
70 | (1) |
|
Frame And Protocol Demultiplexing |
|
|
70 | (1) |
|
|
|
71 | (2) |
|
Queueing And Packet Discard |
|
|
73 | (2) |
|
|
|
75 | (1) |
|
Security: Authentication And Privacy |
|
|
76 | (1) |
|
Traffic Measurement And Policing |
|
|
76 | (1) |
|
|
|
77 | (2) |
|
|
|
79 | (1) |
|
|
|
80 | (3) |
|
Protocol Software On A Conventional Processor |
|
|
83 | (14) |
|
|
|
83 | (1) |
|
Implementation Of Packet Processing In An Application |
|
|
83 | (1) |
|
Fast Packet Processing In Software |
|
|
84 | (1) |
|
|
|
84 | (1) |
|
Operating System Implementations |
|
|
85 | (1) |
|
Software Interrupts And Priorities |
|
|
85 | (2) |
|
Multiple Priorities And Kernel Threads |
|
|
87 | (1) |
|
|
|
88 | (1) |
|
Software For Layered Protocols |
|
|
88 | (4) |
|
Asynchronous Vs. Synchronous Programming |
|
|
92 | (1) |
|
|
|
93 | (4) |
|
Hardware Architectures For Protocol Processing |
|
|
97 | (18) |
|
|
|
97 | (1) |
|
Network Systems Architecture |
|
|
97 | (1) |
|
The Traditional Software Router |
|
|
98 | (1) |
|
|
|
99 | (1) |
|
|
|
99 | (2) |
|
Packet Rate And Software Router Feasibility |
|
|
101 | (2) |
|
Overcoming The Single CPU Bottleneck |
|
|
103 | (1) |
|
|
|
104 | (1) |
|
Symmetric Coarse-Grain Parallelism |
|
|
104 | (1) |
|
Asymmetric Coarse-Grain Parallelism |
|
|
105 | (1) |
|
Special-Purpose Coprocessors |
|
|
105 | (1) |
|
ASIC Coprocessor Implementation |
|
|
106 | (1) |
|
NICs With Onboard Processing |
|
|
107 | (1) |
|
Smart NICs With Onboard Stacks |
|
|
108 | (1) |
|
Cells And Connection-Oriented Addressing |
|
|
108 | (1) |
|
|
|
109 | (2) |
|
|
|
111 | (4) |
|
Classification And Forwarding |
|
|
115 | (18) |
|
|
|
115 | (1) |
|
Inherent Limits Of Demultiplexing |
|
|
115 | (1) |
|
|
|
116 | (1) |
|
Software Implementation Of Classification |
|
|
117 | (1) |
|
Optimizing Software-Based Classification |
|
|
118 | (1) |
|
Software Classification On Special-Purpose Hardware |
|
|
119 | (1) |
|
Hardware Implementation Of Classification |
|
|
119 | (1) |
|
Optimized Classification Of Multiple Rule Sets |
|
|
120 | (2) |
|
Classification Of Variable-Size Headers |
|
|
122 | (1) |
|
Hybrid Hardware/Software Classification |
|
|
123 | (1) |
|
Dynamic Vs. Static Classification |
|
|
124 | (1) |
|
|
|
125 | (1) |
|
Flow Forwarding In A Connection-Oriented Network |
|
|
126 | (1) |
|
Connectionless Network Classification And Forwarding |
|
|
126 | (1) |
|
Second Generation Network Systems |
|
|
127 | (1) |
|
Embedded Processors In Second Generation Systems |
|
|
128 | (1) |
|
Classification And Forwarding Chips |
|
|
129 | (1) |
|
|
|
130 | (3) |
|
|
|
133 | (20) |
|
|
|
133 | (1) |
|
Bandwidth Of An Internal Fast Path |
|
|
133 | (1) |
|
The Switching Fabric Concept |
|
|
134 | (1) |
|
Synchronous And Asynchronous Fabrics |
|
|
135 | (1) |
|
A Taxonomy Of Switching Fabric Architectures |
|
|
136 | (1) |
|
Dedicated Internal Paths And Port Contention |
|
|
136 | (1) |
|
|
|
137 | (2) |
|
|
|
139 | (2) |
|
Time Division Solutions: Sharing Data Paths |
|
|
141 | (1) |
|
|
|
141 | (1) |
|
Other Shared Medium Architectures |
|
|
142 | (1) |
|
Shared Memory Architecture |
|
|
143 | (1) |
|
|
|
144 | (1) |
|
|
|
145 | (1) |
|
|
|
146 | (2) |
|
|
|
148 | (1) |
|
|
|
148 | (5) |
|
PART II Network Processor Technology |
|
|
|
Network Processors: Motivation And Purpose |
|
|
153 | (12) |
|
|
|
153 | (1) |
|
The CPU In A Second Generation Architecture |
|
|
153 | (1) |
|
Third Generation Network Systems |
|
|
154 | (1) |
|
The Motivation For Embedded Processors |
|
|
155 | (1) |
|
|
|
155 | (1) |
|
The Need For Custom Silicon |
|
|
156 | (1) |
|
Definition Of A Network Processor |
|
|
157 | (1) |
|
A Fundamental Idea: Flexibility Through Programmability |
|
|
158 | (1) |
|
|
|
159 | (1) |
|
Scalability With Parallelism And Pipelining |
|
|
159 | (1) |
|
The Costs And Benefits Of Network Processors |
|
|
160 | (1) |
|
Network Processors And The Economics Of Success |
|
|
161 | (1) |
|
The Status And Future Of Network Processors |
|
|
162 | (1) |
|
|
|
162 | (3) |
|
The Complexity Of Network Processor Design |
|
|
165 | (12) |
|
|
|
165 | (1) |
|
Network Processor Functionality |
|
|
165 | (1) |
|
Packet Processing Functions |
|
|
166 | (1) |
|
Ingress And Egress Processing |
|
|
167 | (3) |
|
Parallel And Distributed Architecture |
|
|
170 | (1) |
|
The Architectural Roles Of Network Processors |
|
|
171 | (1) |
|
Consequences For Each Architectural Role |
|
|
171 | (2) |
|
Macroscopic Data Pipelining And Heterogeneity |
|
|
173 | (1) |
|
Network Processor Design And Software Emulation |
|
|
173 | (1) |
|
|
|
174 | (3) |
|
Network Processor Architectures |
|
|
177 | (18) |
|
|
|
177 | (1) |
|
|
|
177 | (1) |
|
Primary Architectural Characteristics |
|
|
178 | (8) |
|
Architecture, Packet Flow, And Clock Rates |
|
|
186 | (3) |
|
|
|
189 | (1) |
|
Assigning Functionality To The Processor Hierarchy |
|
|
189 | (2) |
|
|
|
191 | (4) |
|
Issues In Scaling A Network Processor |
|
|
195 | (18) |
|
|
|
195 | (1) |
|
The Processing Hierarchy And Scaling |
|
|
195 | (1) |
|
Scaling By Making Processors Faster |
|
|
196 | (1) |
|
Scaling By Increasing The Number of Processors |
|
|
196 | (1) |
|
Scaling By Increasing Processor Types |
|
|
197 | (1) |
|
Scaling A Memory Hierarchy |
|
|
198 | (2) |
|
Scaling By Increasing Memory Size |
|
|
200 | (1) |
|
Scaling By Increasing Memory Bandwidth |
|
|
200 | (1) |
|
Scaling By Increasing Types Of Memory |
|
|
201 | (1) |
|
Scaling By Adding Memory Caches |
|
|
202 | (1) |
|
Scaling With Content Addressable Memory |
|
|
203 | (2) |
|
Using CAM for Packet Classification |
|
|
205 | (2) |
|
Other Limitations On Scale |
|
|
207 | (1) |
|
|
|
208 | (1) |
|
|
|
209 | (1) |
|
|
|
209 | (4) |
|
Examples Of Commercial Network Processors |
|
|
213 | (20) |
|
|
|
213 | (1) |
|
An Explosion Of Commercial Products |
|
|
213 | (1) |
|
|
|
214 | (1) |
|
Two-Stage Pipeline (Agere) |
|
|
214 | (2) |
|
Augmented RISC Processor (Alchemy) |
|
|
216 | (2) |
|
Embedded Processor Plus Coprocessors (AMCC) |
|
|
218 | (1) |
|
Pipeline Of Homogeneous Processors (Cisco) |
|
|
219 | (1) |
|
Pipeline Of Heterogeneous Processors (EZchip) |
|
|
220 | (2) |
|
Extensive And Diverse Processors (Hifn) |
|
|
222 | (2) |
|
Flexible RISC Plus Coprocessors (Motorola) |
|
|
224 | (4) |
|
Extremely Long Homogeneous Pipeline (Xelerated) |
|
|
228 | (1) |
|
|
|
228 | (5) |
|
Design Tradeoffs And Consequences |
|
|
233 | (12) |
|
|
|
233 | (1) |
|
Low Development Cost Vs. Performance |
|
|
233 | (1) |
|
Programmability Vs. Processing Speed |
|
|
234 | (1) |
|
Performance: Packet Rate, Data Rate, And Bursts |
|
|
234 | (1) |
|
|
|
235 | (1) |
|
Per-Interface Rate Vs. Aggregate Data Rate |
|
|
235 | (1) |
|
Network Processor Speed Vs. Bandwidth |
|
|
235 | (1) |
|
Coprocessor Design: Lookaside Vs. Flow-Through |
|
|
236 | (1) |
|
Pipelining: Uniform Vs. Synchronized |
|
|
236 | (1) |
|
Explicit Parallelism Vs. Cost And Programmability |
|
|
236 | (1) |
|
Parallelism: Scale Vs. Packet Ordering |
|
|
237 | (1) |
|
Parallelism: Speed Vs. Stateful Classification |
|
|
237 | (1) |
|
Memory: Speed Vs. Programmability |
|
|
237 | (1) |
|
I/O Performance Vs. Pin Count |
|
|
238 | (1) |
|
Programming Languages: A Three-Way Tradeoff |
|
|
238 | (1) |
|
Multithreading: Throughput Vs. Programmability |
|
|
238 | (1) |
|
Traffic Management Vs. Blind Forwarding At Low Cost |
|
|
239 | (1) |
|
Generality Vs. Specific Architectural Role |
|
|
239 | (1) |
|
Memory Type: Special-Purpose Vs. General-Purpose |
|
|
239 | (1) |
|
Backward Compatibility Vs. Architectural Advances |
|
|
240 | (1) |
|
Parallelism Vs. Pipelining |
|
|
240 | (1) |
|
|
|
241 | (4) |
|
PART III Example Network Processor |
|
|
|
Overview Of The Intel Network Processor |
|
|
245 | (16) |
|
|
|
245 | (1) |
|
|
|
245 | (1) |
|
IXA: Internet Exchange Architecture |
|
|
246 | (1) |
|
IXP: Internet Exchange Processor |
|
|
246 | (1) |
|
|
|
247 | (1) |
|
|
|
248 | (2) |
|
|
|
250 | (2) |
|
IXP2xxx Processor Hierarchy |
|
|
252 | (2) |
|
|
|
254 | (2) |
|
Word And Longword Accesses |
|
|
256 | (1) |
|
An Example Of Underlying Complexity |
|
|
257 | (1) |
|
Other Hardware Facilities |
|
|
258 | (1) |
|
|
|
258 | (3) |
|
Embedded RISC Processor (XScale Core) |
|
|
261 | (12) |
|
|
|
261 | (1) |
|
Purpose Of An Embedded Processor |
|
|
261 | (2) |
|
|
|
263 | (1) |
|
RISC Instruction Set And Registers |
|
|
264 | (1) |
|
XScale Memory Architecture |
|
|
264 | (1) |
|
|
|
265 | (1) |
|
Virtual Address Space And Memory Management |
|
|
265 | (1) |
|
Shared Memory And Address Translation |
|
|
266 | (1) |
|
Internal Peripheral Units |
|
|
267 | (1) |
|
|
|
268 | (1) |
|
User And Kernel Mode Operation |
|
|
268 | (1) |
|
|
|
269 | (1) |
|
|
|
269 | (4) |
|
Packet Processor Hardware (Microengines) |
|
|
273 | (28) |
|
|
|
273 | (1) |
|
The Purpose Of Microengines |
|
|
273 | (1) |
|
|
|
274 | (1) |
|
The Concept Of Microsequencing |
|
|
274 | (1) |
|
Microengine Instruction Set |
|
|
275 | (2) |
|
Separate Memory Address Spaces |
|
|
277 | (1) |
|
|
|
278 | (1) |
|
The Concept Of Instruction Stalls |
|
|
279 | (2) |
|
Conditional Branching And Pipeline Abort |
|
|
281 | (1) |
|
|
|
281 | (1) |
|
Hardware Threads And Context Switching |
|
|
282 | (2) |
|
Microengine Instruction Store |
|
|
284 | (1) |
|
Microengine Hardware Registers |
|
|
285 | (1) |
|
General-Purpose Registers |
|
|
285 | (2) |
|
|
|
287 | (1) |
|
Next Neighbor Registers And Software Pipeline |
|
|
287 | (1) |
|
|
|
288 | (1) |
|
Content Addressable Memory (CAM) |
|
|
289 | (1) |
|
Local Control And Status Registers (CSRs) |
|
|
290 | (1) |
|
Inter-Processor Communication |
|
|
290 | (1) |
|
|
|
291 | (1) |
|
|
|
292 | (1) |
|
|
|
293 | (1) |
|
|
|
293 | (2) |
|
Configuration, Control, And Status Registers |
|
|
295 | (1) |
|
Media Switch Fabric Interface |
|
|
296 | (1) |
|
Transmit And Receive BUFs |
|
|
296 | (1) |
|
|
|
297 | (1) |
|
|
|
298 | (3) |
|
Reference System And Software Development Kit (SDK) |
|
|
301 | (10) |
|
|
|
301 | (1) |
|
|
|
301 | (1) |
|
The Intel Reference System |
|
|
302 | (2) |
|
Operating System Used On The XScale |
|
|
304 | (1) |
|
External Host Operating System And Workbench |
|
|
304 | (1) |
|
External File Access And Storage |
|
|
305 | (1) |
|
Bootstrapping The Reference Hardware |
|
|
306 | (1) |
|
|
|
306 | (1) |
|
|
|
306 | (1) |
|
Alternative Cross-Development Software |
|
|
307 | (1) |
|
|
|
307 | (4) |
|
|
|
311 | (14) |
|
|
|
311 | (1) |
|
Support Software And Overall Structure |
|
|
311 | (1) |
|
|
|
312 | (1) |
|
Microblocks, Interconnections, And Pipeline Organization |
|
|
312 | (1) |
|
Assignment Of Microblocks To Microengines |
|
|
313 | (1) |
|
|
|
313 | (1) |
|
Ingress (RX) And Egress (TX) Microblocks |
|
|
314 | (1) |
|
Microblocks And Parallel Execution |
|
|
314 | (1) |
|
|
|
315 | (1) |
|
Buffer Queues And Buffer Allocation |
|
|
316 | (2) |
|
Buffer Handles And Packet Discard |
|
|
318 | (1) |
|
Packet Forwarding And Memory Rings |
|
|
319 | (1) |
|
Queue Array Hardware And Spilling |
|
|
320 | (1) |
|
Exceptions, Core Components, And XScale Processing |
|
|
321 | (1) |
|
|
|
321 | (4) |
|
|
|
325 | (14) |
|
|
|
325 | (1) |
|
|
|
325 | (1) |
|
Conceptual Organization Of XScale Software |
|
|
326 | (1) |
|
Core Component Infrastructure (CCI) |
|
|
326 | (1) |
|
|
|
327 | (1) |
|
Operating System Specific Library (OSSL) |
|
|
328 | (1) |
|
Hardware Abstraction Layer (HAL) |
|
|
328 | (1) |
|
|
|
328 | (2) |
|
Allocation Of Local Memory |
|
|
330 | (1) |
|
|
|
330 | (1) |
|
|
|
331 | (1) |
|
Buffer Management Facilities |
|
|
332 | (1) |
|
Organization Of Core Software |
|
|
332 | (2) |
|
Patching Symbols And Loading Microcode |
|
|
334 | (2) |
|
|
|
336 | (3) |
|
Microengine Programming I |
|
|
339 | (20) |
|
|
|
339 | (1) |
|
Intel's Microengine Assembler |
|
|
339 | (1) |
|
Microengine Assembly Language Syntax |
|
|
340 | (1) |
|
|
|
341 | (3) |
|
Symbolic Register Names And Allocation |
|
|
344 | (1) |
|
Register Types And Syntax |
|
|
345 | (1) |
|
Local Register Scope, Nesting, And Shadowing |
|
|
346 | (1) |
|
Register Assignments And Conflicts |
|
|
347 | (1) |
|
|
|
348 | (1) |
|
|
|
348 | (2) |
|
Repeated Generation Of A Code Segment |
|
|
350 | (1) |
|
Structured Programming Directives |
|
|
351 | (2) |
|
Instructions That Can Cause A Context Switch |
|
|
353 | (1) |
|
|
|
354 | (1) |
|
|
|
355 | (1) |
|
|
|
356 | (3) |
|
Microengine Programming II |
|
|
359 | (16) |
|
|
|
359 | (1) |
|
Specialized Memory Operations |
|
|
359 | (1) |
|
Ring And Queue Manipulation |
|
|
360 | (1) |
|
Processor Coordination Via Bit Testing |
|
|
360 | (1) |
|
|
|
361 | (1) |
|
Critical Sections And Folding |
|
|
362 | (2) |
|
Control And Status Registers |
|
|
364 | (1) |
|
Intel Dispatch Loop Macros |
|
|
365 | (1) |
|
Traffic Management And Packet Scheduling |
|
|
366 | (1) |
|
Accessing Fields In A Packet Header |
|
|
366 | (2) |
|
Dispatch Loop And Associated Variables |
|
|
368 | (1) |
|
|
|
369 | (1) |
|
Packet I/O And The Concept Of Mpackets |
|
|
369 | (1) |
|
Ingress And Egress Packet Transfer |
|
|
370 | (1) |
|
|
|
371 | (1) |
|
|
|
372 | (3) |
|
|
|
375 | (77) |
|
|
|
375 | (1) |
|
An Example Implementation Of NAT |
|
|
375 | (2) |
|
NAT Complexity And Simplifying Assumptions |
|
|
377 | (1) |
|
Network Address And Port Translation |
|
|
377 | (1) |
|
Ping Packets And Identifiers |
|
|
378 | (1) |
|
Dynamic NAT Table Creation And Management |
|
|
378 | (1) |
|
|
|
379 | (2) |
|
|
|
381 | (1) |
|
Implementation Of The NAT Microblock |
|
|
382 | (2) |
|
Header Caching And Alignment |
|
|
384 | (1) |
|
|
|
384 | (3) |
|
Header Fields That NAT Changes |
|
|
387 | (1) |
|
Definition Of Constants For The Entire System |
|
|
388 | (2) |
|
Constants And Types For The User Interface |
|
|
390 | (1) |
|
Definitions Of Scratch Ring Constants |
|
|
391 | (1) |
|
Overall Organization Of The NAT Microblock |
|
|
392 | (8) |
|
Macros Used To Implement NAT |
|
|
400 | (10) |
|
Optimized ARP Table Lookup |
|
|
410 | (1) |
|
Header Modification And Checksum Computation |
|
|
410 | (1) |
|
|
|
411 | (1) |
|
Core Component Initialization |
|
|
411 | (2) |
|
Core Component Packet Processing |
|
|
413 | (1) |
|
|
|
414 | (1) |
|
Structure Of The Core Component Kernel Module |
|
|
415 | (30) |
|
User Interface Application Code |
|
|
445 | (4) |
|
|
|
449 | (3) |
| Appendix 1 Glossary Of Terms And Abbreviations |
|
452 | (47) |
| Bibliography |
|
499 | (4) |
| Index |
|
503 | |