Ian Wakeman - Alan Jeffrey - Rory Graves - Tim Owen
Most networks currently have a topology where `smart' hosts sit at the edges of the network, and are connected by `dumb' routers.
Figure 1: Passive Network Elements
In reality, these routers are not so dumb: they have the capability to provide as much computational power as the hosts, but this computing power is mostly used to push packets around. The router managers decide what software should run on the nodes, so adding new features to the network requires the agreement of all the router manufacturers and managers.
An alternative model is to allow users access to the computing power in the routers, and to run their own software. The network has some dumb routers and some smart routers, and we can run a virtual network to hide the dumb routers from the user.
The creation of virtual networks has allowed the introduction of new services between islands in the network, whilst not requiring the intermediate routers to be changed. Examples include:
Most services do not require intelligence within the network,
especially if they are based on single sender and receiver pairs.
Providing high bandwidth and engineering routers so that packet
transmission is of low latency will generally prove the most effective
solution for most distributed systems problems. However, networks
will remain heterogeneous in the bandwidth available, for three major
reasons: because of physics, such as in the radio spectrum; because of
the expense of replacing legacy systems in poorer organisations or
countries; and because of congestion. When there is an ``impedance
mismatch''
between the
bandwidths of regions of a network, placing some intelligence at
the border of the regions can help in improving efficiency, as in the
use of proxies for mobile users.
When the communication involves multiple peers, then strategic placement of intelligence within the graph formed by the peers can reduce communication costs. When information needs to be synchronised in some way across a group, the latency of the synchronisation can be much reduced if the synchronisation happens at the centre of the group rather than going to each node at the edge. In particular, we see this as an excellent way to improve the communication latency of shared virtual reality spaces.
For these reasons, networks will become increasingly programmable, at
least at the borders of networks suffering from impedance mismatch.
If this happens in an ad hoc manner, without thought applied to the
inter-dependence between network nodes and connectivity, then our
networks may become more fragile, falling over whenever a badly
written program is pushed into the network. This is the motivation
for much of the current research into Active Networks (ANs). If we
can produce a model that allows routers within the network to execute
code on behalf of some granularity of packet
which
is properly matched to the peculiarities of packet-switched networks,
then we will hopefully improve the integrity of the network.
In this paper, we describe our approach to designing active networks. By combining recent work on the semantics of computation with a pragmatic description of the processes of packet communication, we have produced a programming model that both protects the connectivity of the network and encourages programmers to think about AN programming in the ``right'' way. By using a strongly typed programming environment which embodies policies about safety within the type system, we can statically prove programs are safe, and improve efficiency by removing many run-time checks.
In the next section we describe the interdisciplinary nature of our design process, showing how we have approached the design of our programming language from the perspectives of formal computation and of systems engineering of packet communications, defining safety and security policies to shape our design decisions. We derive safety and security requirements from our model. In the succeeding sections we explain the contribution of formal semantics in instantiating our safety and security policies, and meeting the derived requirements. We give an example of how the programming model can be used to produce a multicast algorithm, and we conclude with a comparison of our work to other AN activities.
This project has been a classic example of interdisciplinary work, involving the authors learning enough of each other's fields to be able to make sensible design decisions on the programming language. In multidisciplinary work, each of the experts in the team makes their decisions based upon requirements filtered into their own language. Interdisciplinary work requires the designers to learn to evaluate requirements within the discipline in which they are captured [1, 2]. Although much more demanding upon the designers, the design space becomes much richer since choices are available from any of the fields, and the ability to switch and communicate from any perspective reduces the likelihood that design problems will emerge later down the development line.
The design of the language has used three major viewpoints to come up with a language for the safe programming of Active Networks: A communications view; a programming view; and a safety view, as in Figure 5. By discussing and exploring from the three viewpoints, we have attempted to evolve a model for active networks that enables the novel applications people are hoping ANs to provide, yet protects the common resources of the network from both malicious attacks and buggy programs. In particular, we have asked ourselves the following questions:
A network enables entities to connect and send information between each other. Thus the most important feature of any network is the ease with which entities can connect with each other. The Internet has exhibited one of the fastest rates of connection of any telecommunications network. What is it about the Internet and in particular the Internet Protocol that encourages this fast growth?
The packet forwarding process in the Internet can be modelled as shown in Figure 6. Packets arrive from a link onto a router, information is retrieved from tables about where a packet should be routed to based on the destination address, then the packet is placed on the appropriate output link. Currently in the Internet, the tables are maintained either by distributed routing protocols such as OSPF or BGP, or statically by the router manager.
Figure 6: Basic router packet forwarding engine
More importantly, the various router administrations have clean and appropriate interfaces to export and import routing information. These interfaces are reflected in the management processes between the router administrations, which only have to ensure that links are working and sufficient routing information is exchanged to keep the network working. Adding a new machine is a local matter of assigning an address, which can then reach out globally. Since the return address is carried within the packet, the connection can be two way immediately. As long as connectivity is maintained, TCP/IP will ensure reliable delivery of data. In order to keep a network working, any new services should not disrupt the connectivity of the network, and should not overload the router administrators by requiring that they hand install new programs.
The key component in maintaining connectivity of any packet switched network is thus the routing infra-structure. After many years of struggle, the Internet has evolved ``correct'' routing protocols, and enough intra- and inter-organisational process so that the routing generally works (see Paxson in [3] for a discussion of just how well it works). The routing protocols distribute local knowledge about which links are available to a sufficient area of the network so that connectivity is maintained. The routing is also dynamic, in that changes in the availability of links may affect the routing tables of other nodes.
From the above, we have derived the following communication requirements in the programming model:
Since AN programs will probably only be effective at the impedance mismatch points wihtin the network, we believe that AN nodes will not exist at every router but at strategic points in the network, such as routinely congested links across oceans, firewalls or boundaries between radio and terrestial links. It is important that the communication model should not require ubiquitous AN nodes, but should allow an arbitrary number of routers to be transparently in between AN nodes.
Transparency is easily obtained by adding a generic AN bit to the packet header. A packet belonging to a flow operated upon by an AN program will set this AN bit within the header. A non-AN router can just ignore the bit and forward the packet normally. When an AN router sees the AN bit within a packet, the packet is routed to be processed by the Active Node processor. The runtime system on the Active Node will select the correct program to be run over the packet, and allow access to the shared heap across the programs, as in Figure 7. We assume that we have a homogeneous AN architecture. By separating the active node from the main forwarding path, the active node can be on a separate card or even on a separate machine to the router.
Figure 7: Active router packet forwarding engine
This requirement forces us to use a virtual network model of communication between AN nodes, where the AN nodes are overlaid onto the actual communications network.
As is well known, the Internet Protocol is a best effort protocol, allowing packets to be dropped duplicated or misordered. ANs should account for this in the design of the programming langauge, so we have the following requirements:
Nodes in the network are also subject to arbitrary reboots, or can disappear from the network and then reappear. State installed in the nodes can thus disappear and then reappear. This is known in the literature as soft state, which is re-installed as required. There are thus the following requirements on the state installed in nodes.
To make judgements about the state of the network, so as to decide upon possible courses of action, the network must provide estimates for the bandwidth, loss and delay likely to be incurred upon any next hop.
Networks are shared communication resources, used by all to commincate with each other. To ensure the integrity of the network, we must ensure that each of the components of the network continue to work. The problem is that the resources of the various component parts of the network (bandwidth, memory and processor time) are shared by all. If we allow arbitrary code to execute on behalf of some user, we must ensure that all the resources required by other users of the network are protected from abuse by misbehaving or malicious programs.
The bandwidth of the links is the key resource of a network. Any AN program injected into the network must not be able to attempt a denial of service attack by generating unbounded numbers of packets by multiplying packet generating entities within the network.
Network queues should be protected from short term congestion caused by large packet bursts being injected into the network.
Even though we posit that the active node will be separate from the main router, the active node processor is shared amongst all AN programs. Therefore, we must ensure that no single program is able to consume all of the processing time on the active node. Analagous to the fast path processing of a normal router, we believe there should be fast path processing for packet forwarding code and a different form of processing for control programs.
In particular, when a program is using packet forwarding code, the processing required for the packet should be completed in one gobbet, without interruption. We'd like this to happen so that we can minimise the overhead of context switching and take advantage of natural dispatch points such as packet sending. For this to happen, we make the following requirement on the code:
In the control portion of a program, the computation can be arbitrary, and thus we cannot statically determine the path of computation. Instead we need to provide some means of pre-empting threads from the processor and ensuring that the scheduling of threads by the run-time is fair.
Often a control thread will be awakened periodically by a timer. Each timer uses memory, and loads the processor. In addition, if the granularity of the timer is small, then the frequency of the timers could generate too many interrupts on the processor. If a thread can arbitrarily set timers, then it can deny processor time to other threads.
Within the control section of a program, a program can spawn threaads to run. Each thread uses the equivalent of a thread control block, taking up memory.
The memory usage of a program must be controlled. If a program allocates all of the shared heap to itself, then it denies access to the resource to other programs.
To protect programs from each other, programs cannot be allowed to arbitrarily generate memory accesses.
For packet forwarding code, we would prefer that there was no time spent allocating heap, nor any time spent running the garbage collector. If we don't allow heap to be allocated during execution, then there will be no garbage to collect.
As described above in Section 3, the routing table is the most sensitive data structure of a network. We believe that direct access to the routing table must be forbidden to protect the network.
The requirements listed above are intended to prevent misbehaving programs from trashing the network and in preventing obvious denial of service attacks from malicious programs. However, the difficulty in security is in anticipating how the malicious agents will attack the system. Our security model is based around a set of trusted nodes, and trusted code, which is protected using cryptographic techniques. Principals are trusted to either provide code or recommend good coded.
Implementing safety and security policies in current computing environments requires run-time checks. Typical examples include comparing of virtual memory addresses against bounds in the segment and page tables to ensure that memory accesses are not out of bounds and checking user identifiers against access control lists to ensure that a particular call can be made.
These run-time checks add to the overhead in executing a program, and for functions with strict performance constraints, the resulting delays may be unacceptable. For AN technologies in particular, the need to compute, manipulate and switch packets in real time means we need to minimize the number of run-time checks whilst still obeying the strict safety requirements.
In our design we have used recent (and not so recent) advances in the semantics of programming languages to define an AN language which replaces dynamic run-time checks with static checks which can be carried out when the active code is downloaded. Static checks are usually more complex then dynamic checks, but they only have to be carried out once, during the initialization phase, so the packet-handling phase can execute more efficiently. We have taken the safety policies required of an AN language and built the type system to support these policies. If we can prove that the type system embodies the safety policies, and that the language satisfies subject reduction, we need only check that the program is well-typed before running the program.
A common approach taken in the formal definition of concurrent or
distributed programming languages (originating with Milner's
CCS [4] and Hoare's
CSP [5]) is to define a grammar
representing a network. For example a network with two threads
and
running at location
, and a thread
running at
location
is written:
The threads are described using a grammar based on a core programming language, for example a program which spawns two sub-threads is:![]()
We can then formally define an reduction relation on networks defining how execution of threads may take place, for example:![]()
We can also formally define a type system for programs, and hence for networks, with judgements![]()
One common feature of type-safe programming languages is that they provide an abstraction barrier between the programmer and the run-time environment. For example, direct pointer arithmetic, or arbitrary casting are not allowed.
We believe that the correct model for programming ANs is to view the program as a form of virtual network creation and maintenance. The virtual network is made up of a number of nodes at the edge and within the network. In the packet forwarding part of the program, the program at each node runs packet forwarding code to send data to the necessary places. If some exceptional event occurs, such as a new node joining or leaving the network, then the program will run code at each node to communicate and update state between each of the nodes.
A network of nodes communicating with each other is very close to the
use of communication channels in concurrent languages with dynamic
channel creation. Languages based upon the creation of unique channel
names such as PICT [8] or Concurrent ML [9] build
on the original work of Milner, Parrow and Walker's
-calculus
[10]. These unique channel names have been
generalised to locations in Ambient [11] and
distributed
-calculi [12, 13].
In our language, each node in the virtual network has a unique identifer. The unique identifiers can only be tested for equality, and are collected in a set forming a type of location. To allow for the construction of a graphs of nodes, we define a function route that when given a destination location, will return the next hop on the route to the destination, as the type location. We also have a subtype of location of here which is the location of the current node where the program is executing.
In the
-calculus, communication between threads is based on
channels with unique names. We use the same model of communication,
allowing the programmer to generate a unique name and then bind a new
resource to this unique name. These resources can be local
(corresponding to heap-allocated objects), global (corresponding to
registered objects) or classes. If a program is on the node on which
a resource is registered, then it can access the resource.
The sending of packets is modelled by spawning threads at neighbour nodes with associated bindings of the free variables within the thread code. In particular, if the name of a resource is unbound within the thread code, then it will be bound from a name within the sending program. In this way, we can communicate resources across the network so that they can be used from other nodes. The bindings of the free variables is the closure of the thread code, so that it is a self contained chunk of code to be run at the neighbouring node. After a thread is spawned at a remote node, the spawning node can only communicate by spawning a new thread back on the original node with new data in the thread closure. This returning thread can then access resources registered in the original program to return values.
The typing rules for the spawn operation force the programmer to calculate the next hop to the destination at that instant, by using the route function eg
In the case when the programmer does not care about which node is the neighbour, they can simply spawn forwarding code as shown below. This obviously satisfies Requirement 3.4.![]()
When the programmer wishes to keep track of the topology of the network so as to implement some form of application routing, then they can store the location values returned by the route function. But before these values can be reused, the programmer must test their validity before spawning threads at these supposed neighbours, since the route may have changed between storing the value and actually using it eg.
In particular, since the route function is the only way of generating neighbour values, the program has to interrogate the local node to discover where to send packets (Requirement 3.1 3.2). Since there is no requirement on the route to be consistent about the values returned, then the programmer is forced to recognise the non-stationary nature of routing topology (requirement 3.3) and maintain soft state about the actual topology of the virtual network, reinstalling the program state if the route changes (Requirement 3.8).![]()
A normal packet network is thus transformed from one in which just data is sent around into a virtual active network in which threads are sent around with the necessary resource names of places along their routes so that they can send threads in the reverse direction to communicate computed information. The spawning of threads at a neighbour is analogous to sending a packet. Since the state of threads cannot be discovered except by sending another thread after it, we model best effort packets (Requirements 3.5 and 3.6). If the program requires the thread to return some values, then the programmer must surround the thread with a timer to repeat the attempt. Since the new route for the second attempt at respawning may not be the same, the programmer must again follow the soft state model (Requirement 3.8), reinstalling resources if the virtual graph is reformed.
For example, a program which delivers a packet to a remote stream is:
Resource allocation is of primary importance to AN code: allocating heap is an expensive operation, and we would like to ensure that packet-forwarding can happen without the overhead of heap allocation.
To achieve this, we can distinguish between three categories of program:
This separation of categories of computation is commonly used in semantics [14, 15, 16] to distinguish between expressions with no computational significance such as:
and those with computational significance:![]()
This separation into categories of computation can be presented using monadic type systems where we distinguish between:![]()
Once we have distinguished between different categories of computation, we can make engineering decisions about which facilities should be available in which category. For example, we could allow while-loops in control code, but restrict to for-loops during packet-forwarding, to ensure that packet-forwarding code is guaranteed to terminate (Requirement 4.3).
Nominal calculi have not only modelled communication networks; they have also been used to model security processes [17, 18, 19]. We will be using the nonces created in nominal calculi to model the chains of trust formed from principals to their code, and in the use of secure certificates.
Recent work on typed languages [20, 21] has shown that more subtle safety properties
can be verified by a type system. For example, we wish to ban a
denial-of-service attack where a packet is bounced backwards and
forwards between two nodes forever. The usual solution is to add a
hop-count to each packet, so we add a new parameter to the
command, which is a counter to decrement whenever
a new thread is created, for example:
The semantics of![]()
If the counter ever drops to zero, then the packets are dropped.![]()
This method works for trusted routing code, but an unscrupulous agent can flood the network with packets by always setting the hop-count to one:
This creates an infinite loop between the two sites![]()
This denial-of-service attack is possible because we have allowed
active code to treat integers as hop-counts. One possible solution is
to introduce a new type for hop-counters, which is a
subtype of
, so
s can be cast to
s, but not vice versa. If we use this new type rather
than
to represent hop-counts then we can still forward
packets,
but the denial-of-service attack fails to type-check since
is not a subtype of
.
This simple approach stops unbounded use of the network, but still allows exponential blow-up, for example:
Although this process will eventually terminate, it does so after using resources exponential in the size of c.![]()
Fortunately, there is a standard type-theoretic solution to such problems, which is to add a restriction that the count type is linear, that is can only be used once. Linear types have a logical basis in linear logic [20], and form the basis of linear lambda-calculi [21, 22].
We can use linear counts as the basis for determining how much of the processor a thread can use on a given node, how many threads can be spawned, how many packets sent, how much heap or resources can be allocated and how many timers can be set. Since linear types can only be used once, they also offer the possibility of detecting replicated threads. Yet another use of linear types is in setting the maximum burst a packet can send into the network (Requirements 4.4, 4.5, 4.6, 4.7, 3.7, 4.1, 4.2).
Although the core language will include the linear counts as parameters to the resourced operations, a programmer will not normally need to do anything sophisticated with the counts. Therefore we will hide the details of the counts beneath abstractions that use the counts in appropriate ways and remove the burden of managing the counts from the programmer. We have used these abstractions in the examples and in Section 6.
The runtime system is responsible for accepting incoming threads and running the appropriate code. It will act as the interface between the route function in the code and the routing table, and provide access to other network information, such as the MIBs of various network components, or estimates of network state (Requirement 3.9). Full details of the runtime system will be detailed in a following report.
The example shown is based loosely upon the multicast routing algorithm in sparse mode PIM [23]. The factory class runs at a well-known node, which is equivalent to the rendezvous point. To create a new group, the build method of the factory class is called. This generates a new class with a new name, equivalent to the multicast group address and some local state, such as the rendezvous location.
To join and be able to send and receive to and from the group, a node has to load the generated group class. Discovering the name of the group class and some place to load it from is equivalent to discovering a multicast address from a tool such as a session directory.
When the group class is loaded, it spawns a thread at the neighbour en route to the rendezvous node that will subscribe the current location. This is the analogue of the join packet going upstream. The subscribeLocation method invoked checks to see if the routing is symmetric, and if so, subscribes the downstream host. If the routing is asymmetric, it adds the hop back to the list of subscribed downstream neighbours and launches a thread back to the original location to subscribe on the way down. Since a method in the class is invoked on the way downstream, the class will be loaded, and the upstream node subscribed automatically.
To receive packets, an entity hands a packet handling Stream to the subscribe method of the group class. To send a packet, the entity calls handle. This then invokes the sendDownstream and sendUpstream methods to spawn threads to send the packets up to the rendezvous and then back downstream.
Ther are various points to note. We can only spawn at
neighbours. A real implementation would need to add an
method, allow nodes to leave the tree, and would
periodically check to make sure its upstream neighbour was still alive.
The programming language is designed to match the communications model of networks, and is generally hop by hop. However, the implementation of the runtime system is not constrained to follow the language. In particular, we will implement library routines defined in the Safetynet language for common programming needs such as rpc or the registering of remote resources using more appropriate protocols which go straight to the destination, rather than enforcing the hop by hop nature of the semantics. We will be investigating the use of static analysis in the compiler to optimise the communication and processing costs.
As far as we are aware, there is only one other project which is attempting to bring together modern programming technologies and Active Networks. The Switchware project [24] at Penn is producing a two level architecture based on strongly typed languages. The PLAN language [25] is based upon a subset of ML with the addition of primitives to support remote execution. The language provides only primitive recursion and so guarantees termination, but doesn't allow the accumulation of state except through use of special system primitives. We believe that the type systems defined herein can protect against unsafe heap usage.
Netscript from Columbia is a dataflow language to glue together data streams into the appropriate filter primitives [26]. It is an object oriented scripting language and so would be more akin to TCL than to a strongly typed language.
MIT [27] are basing their work upon Java with its attendant level of safety.
Bhattacharjee et al. at GSU [28] are providing hooks into router supplied and controlled functions, rather than providing full programming environment.
BBN has defined two programming languages, ``Sprocket'' a high-level language rooted in C that compiles to ``Spanner'' an intermediate language that assembles into a small footprint, so that the programs can sit inside a single packet [29]. Spanner is a CISC-style stack based assembly language with high level types for the operands. Both Sprocket and Spanner perform type checking at compile and assembly time, respectively. Additional safety relies on cryptographic authentication of the incoming packet, run-time authorization of the program, and a ``sandbox'' virtual machine environment which checks the types of arguments before instruction execution and limits the program's access to node resources.
Other work outside of the AN scene of relevance includes Spin [30] which is an extensible operating system which allows users to download code into the kernel. An operating system faces many challenges common to those facing ANs. The language chosen by the spin designers is Modula-3 which is a strongly typed language with its automatic garbage collection, and known denotational semantics for a Modula-3 subset [31]. The type checking of the Modula-3 compiler is used to ensure domains of protection at the level of binding names for access to objects. One of the seminal works in the field of ANs was the thesis of Christian Tschudin [32], in which he used a form of Pascal as the basis for developing the active packets.
The language is being specified, and a prototype compiler is currently being built for the first phase of the language. The first pass of the compiler will compile down to Java classes, so that we can bootstrap upon the existing AN Java infra-structure [27], and use our AN simulator [33]. The compiler will be finished by September 1998.
Prototype applications are planned to appear by Autumn 1998, such as self organising web caches. We plan to extend the work by designing a well-typed, formally defined virtual machine, rather than using Java bytecode.
We have described the process of designing a language for Active Networks. This has been an inter-disciplinary work, basing the semnatics of the programming language upon the communication model of the Internet and using results from the programmming languages community to reduce the overhead of run-time checks.Whilst we would not claim that proection can be achieved solely through the use of language level techniques, they can go a long way to removing the costly runb-time checks that bedevil a language like Java, such as array bounds checking, checking the class details for any method invocation, tracing the type hierarchies to ensure assignment compatibility and so forth.
Designing a Programming Language for Active Networks
This document was generated using the LaTeX2HTML translator Version 96.1 (Feb 5, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html -t Design Considerations for Safetynet -up_url ../index.html -up_title Safetynet Project -split 0 -auto_navigation -dir /rsunx/home/ianw/docs/safetynet/www/an-design isdn.
The translation was initiated by Ian Wakeman on Tue Jun 30 17:25:46 BST 1998