Effective Distribution


Effective Distribution
in a Loosely Coupled Network


Karl Perkins
Department of Computer Studies
Loughborough University

1. Introduction

Computers have become faster every year, because of hardware designers' ingenuity rather than from any change in the laws of physics. Why, seemingly, is everyone in the world trying to connect their computer to everyone else's? What benefit can there be?

The answer is information. In isolation a computer can only process the information entered from a user, publish the results back to the same user and use the resources embedded within the machine. Connect the computer to a network (the bigger, the better) and the possibilities for exchange are greatly improved, as are the possibilities for things to go wrong.

We could define effective distribution as attempting to reduce the probability of failure, regardless of diminishing odds to the contrary. In the remainder of this chapter is presented a brief introduction to some of the history, achievements and requirements of distributed systems.

1.1 The Internet and WWW

These provide an example that has recently received intense academic, business and public interest. Distributed computing is no longer a trend just in academia or the business world, but has percolated into society itself. The future shape of global scale distribution will probably be the Internet and the World Wide Web and both have built enough momentum to be key technologies for the next millennium.

The Internet currently consists of millions of users, computers and almost just as many networks all over the world. In 1980 there were approximately 100 computers on the Internet, but since then there has been no major change to the underlying communications software or applications. Users, computers and networks come and go without any prior warning, making the Internet the ultimate loosely coupled network.

The Web, though not perfect, has matured from a simple past. It is based upon browsers and a communication protocol called HTTP (Hyper-Text Transfer Protocol) which communicate across the Internet and its protocols. The browsers and HTTP evolved from earlier browsers and protocols, some of which still exist today. Primarily the Web's aim was to distribute information in a people-friendly way. This has received the most attention and is constantly developing.

There is still the requirement to distribute the location of information contained on the Web. This has been tackled by indexing information at known sites (typically referred to as a Search Engine sites). The time to find required information is perhaps the Web's most obvious failing. This is due to the scale of the search task, the network itself and in the future, as the Web becomes larger, it's going to get worse.

The inadequacies of the Web have not stopped future speculation into other uses, apart from information distribution. Notably software distribution systems have gained credibility, for example Sun Microsystem's Java (started in 1991) and Microsoft's Blackbird[1]. Both are platform independent and dynamically execute program code from the network by transparently porting it from a remote server. Java achieves this as an extension of the existing HTTP protocol and the technology has been adopted into Netscape and Spry browsers. Presently there are few real applications, but it's not going to remain this way.

The Web currently offers staticly distributed information and Java initially promises a dynamic and interactive Web. Java could aid distributed application development, since Java code can act as client and server. An article published in Byte[2] foretold of the Web-PC; a mid-level PC with a network connection and distributed applications possibly written in Java. If this becomes the ultimate fate for Java and as the Web becomes larger, then the search time for information and program code will become larger - unless distribution techniques are improved.

Machines and networks will become faster, increasingly reliable and attract further investment. Why try to make distribution techniques effective when any benefits will seemingly be overtaken by improvements in infrastructure? The reason is that the Web can not physically become faster once the networks and machines can fulfil the total demanded bandwidth.

An example which shows this is two machines connected by a large network shown in Figure 1.1. One machine requests information from another and when it receives an answer it sends another request. Even assuming infinite processing speeds on the machines and infinite bandwidth of the network, then there is a maximum rate at which these transactions can take place. This is defined primarily by the distance of network between them.

Figure 1.1 The latency problem

Messages can traverse this distance no faster than the speed of light (1/10 this for an electrical signal in copper) and is revealed as an apparent network latency. It is the same latency that can be heard in an international long distance telephone call. At the microscopic level, processor designers have wrestled with the same macroscopic problem. One solution is to place a cache machine between originator and receiver, but [Partridge92] shows that cache consistency becomes a problem.

Speeding up the infrastructure can have limited success on a global network such as the Internet. Investigation of the protocols and systems that form the distribution methodology itself could be the way forward. This is true of any distributed system - not just the Web and the Internet.

1.2 Distributed System Properties

In the following section are seven commonly cited properties (shown in italics) for good distributed systems. It is often difficult to separate properties from design issues, but any design issue should fit into one of these properties. For example, the Java environment described in the previous section is portable due to its platform independence. Portability is a design issue that is part of migration transparency. As a property, transparency encompasses more than portability.

Distribution requires separate systems to work together using communication across a network. The network, processing and resources should be shared and often these requirements are interrelated.

Since the network is usually scarce and in demand then sharing it effectively requires reducing the number of network transits to establish a connection, to communicate and to return an answer. There could be some loading of the network caused by load-balancing schemes, which should be minimal, but this will aid sharing of machines for distribution requests.

For a system to be used it must be reliable, and for that to be true it must be stable. Stability in a closed system can be guaranteed, but in a distributed system it is at best tolerant of failure. This is associated with orphaned and terminated distribution requests and how to suffer them until such a time as they can be identified. Then there is the need to recover and continue, without corruption.

Access to resources on computers, or information during transit across a network requires security. This often involves access rights and encryption, but these come with a cost. Access rights can be hard to distribute, monitor and revoke. Encryption can be weak or strong, but exactly how much is needed is difficult to determine.

Concurrency exists within distributed systems and is similar to parallel processing. This can be used to obtain an overall gain in throughput, but has to be managed carefully if shared resources are to be synchronised.

To allow any system to be easily and dynamically extended or redefined it must be open. It is not just desirable, but is a requirement for a loosely coupled distributed system. New services can then be provided that were not originally thought of, without redesigning the whole system to cope.

For users and programmers of a distributed system to view it as one whole system it must be transparent. Transparency can be broken into many different types[3], but the main two are access and location transparency. The required level of transparency is often difficult to quantify and depends upon the situation at hand. Printing is often cited as a situation where location transparency is not desirable. If several printers exist within a building and one stopped accepting printing requests then perhaps a little location transparency would be useful. When transparency is applied to all the other properties in a particular system, so that they become invisible to user and programmer, this gives the single-system image or single virtual processor [Tanenbaum95] property.

Scale is probably the most dominant feature of distributed system design. It affects every design decision for the previous properties. The aim is for the system to degrade at least linearly, as the number of machines increases.

A good example that does not scale, is the rwho service provided under HP-UNIX which was intended to provide a transparent method of discovering who was logged in over a number of machines. Demons on each machine would query all the other machines they knew of, to find out who was currently logged on. As the number of machines increases, the amount of network traffic containing log on information increases exponentially, until the network is jammed. Often rwho is disabled or restricted on HP-UNIX machines.

1.3 Summary

The whole field of distributed systems is wide and still has many questions unanswered, probably even more as yet unasked. Twenty years have passed since the birth of distributed systems and still there is much to do and perhaps most in connection with networks on an international scale. The requirements in the previous section give a guide, but not the solutions. Problems within transparency and concurrency have solutions, but are expensive to implement. Some have no solutions at all.

The Web has opened the Internet and swelled the number of people who communicate using it. This loosely coupled network poses many problems in itself: it possesses high latency and low bandwidth, is unstable and insecure and perhaps most significantly, is huge. If such a network is to develop and support further facilities, then a deeper understanding of the all aspects of effective distribution will be needed.

2. Current Distribution Systems
That exist today

Despite implementation difficulties, there are many systems that exist to provide distribution of information and processing. Some are complete integrated operating systems, others just protocols. Many are not pure, in the sense that they do not apply all the properties outlined in Chapter 1, but this is often because of the particular application the system was designed for. All can show the pitfalls of distributed systems.

The majority have only been applied to local area networks, but since there is nothing like them for a large scale network then they are the only material that can be studied. TCP, UDP, IP (which are the Internet protocols) and some protocols that sit upon them (Email, network news), are the only successful systems that have been applied on a large scale. The Internet does not currently provide an integrated system. It does provide a collection of distinct services that have evolved over time.

2.1 Introduction

In the first half of this chapter are descriptions of a number of distributed operating systems which vary from precise UNIX emulators to novel systems in their own right. These are described fully in Distributed Systems Concepts and Designs [Coulouris94] and it is from this book that I have drawn some key points about each.

The second half of this chapter examines existing protocols that allow a local machine to invoke remote processing or retrieve remote information.

2.2 Mach & Chorus

Mach [Acetta86][Boykin93][Loepere91] dates from 1986 and is the successor to two previous projects, RIG [Rashid86] and Accent [Rashid81][Fitzgerald86]. Rig was developed in the 1970s and Accent in the 1980s. Chorus was developed from 1979 at INRIA in France and later by a company called Chorus Systèmes [Rozier88,90].

Mach and Chorus are distributed operating systems which are designed to support or emulate other non-distributed software and allow enhanced distributed software to be produced. It is interesting to note that adding concurrency to non-concurrent software is often seen as a holy-grail in computing. Mach and Chorus do not do this, they only distribute software but this is often confused for adding concurrency to software. Both can emulate databases, language run-time systems and other system software but they are geared towards the emulation of conventional operating systems.

The structure of the emulated operating system is facilitated by servers which communicate to a micro-kernel which is common to all implementations. The definition of a micro-kernel has been slightly blurred and has no fixed definition. The notion is of a operating system kernel that provides the bare amount of function so that the rest of the operating system can be implemented as processes. This gives openness to the operating system and allows for future scale requirements.

Other parts of the emulated operating system which need to avoid context switches from the user to kernel state (for speed reasons) may be implemented within the kernel. Mach leaves device drivers within the kernel, other micro-kernel operating systems do not. Mach and Chorus both provide similar processing structures (threads, processes) and can take advantage of a multiprocessor. They have complex calling structures and communication so that they can emulate a variety of operating systems.

Notably, Mach and Chorus are intended for binary emulation of UNIX. This requires that binaries compiled on a normal version of UNIX on the same architecture should run unmodified on the emulation and be transportable to remote machines. This means that each machine runs the emulation software and creates a single process table and so also creates a single virtual UNIX machine. This satisfies many of the transparency requirements and gives a certain amount of migration capability to processes.

Both Mach and Chorus use the concepts of ports and capabilities, but they use one to indirect to the other. Resources in Chorus are protected by capabilities ( Figure 2.1) and in Mach by ports ( Figure 2.2). Capabilities have protection identifiers that a user level service will use to authorise an actor (similar to a task) that sends a message to its port. Mach's ports have rights (capabilities) to send or receive built in and are protected by the kernel.

Figure 2.1 Chorus capabilities

The Chorus capability contains a unique identifier (normally the identity of a port) which is fabricated with the type of resource, the identity of the computer that created it and a local time stamp. The key allows an individual resource to be identified among many accessed by the same port. It also provides a level of protection since it is chosen to be hard to guess.

A Chorus capability on its own is not enough to provide UNIX style protection, so each actor has a protection identifier associated with it. An actor that receives a capability can request the kernel to find the protection identifier of the actor that sent it.

Figure 2.2 Mach ports

Mach tasks refer to port rights by identifiers that are only valid for that task's port name space. This makes them fast for access to resources but also requires processing to give away to another task. Mach has one user-level network server per computer that is responsible for the transfer of port rights, apart from other network communication tasks. Figure 2.3 shows how a message is transmitted between tasks located on different computers.

Figure 2.3 Message transmission in Mach.

The aim of ports and capabilities is to provide security and transparency to resources and this they do reasonably well. What they do not provide is a convenient method of managing the access rights and information once they are given away. What if a access permission needs to be revoked in the future or once a piece of information has legitimately accessed by a client, who decides whom the client may distribute it to? Capabilities and ports do not provide all the security required in a distributed system, they are just a beginning. They do exist in many other systems and have developed differently.

2.3 Amoeba

Development of Amoeba began in 1981 at Vrije University in Amsterdam [Tanenbaum90,92]. It is a full distributed operating system and uses the processor pool architecture. It assumes that processors will have enough memory to run applications without backing store. Users will obtain their processing from the processor pool, rather than from their own multiprocessor.

Three requirements were made in the development:-

Network transparency

There is no location specific accesses to resources - particularly the filling system which is system wide and seamless.

Object-based resource management

Every resource is an object and accessed by a uniform naming scheme. Even when a resource is local to a system, it is accessed via a server.

User level servers

Almost all system software runs as servers on a micro-kernel.

Figure 2.4 An Amoeba capability

Protection of resources is provided by capabilities ( Figure 2.4), which are 128 bits long. A client requests a new object and the server supplies a capability. The check field contains a random number, so any client trying to forge a capability would have to try combinations (nearly ) which is highly unlikely to succeed. If a client wishes to give away a reduced capability (reduced permissions to a resource) it does so by applying a one way function to the check field.

A client with the reduced capability can not increase its rights since it can not compute the old check field. This is because the function applied to the check field has no inverse. This allows better control for the distribution of a capabilities, so long as the receiver of a capability can be trusted.

There does not seem to be a clear winner between the Mach, Chorus or Amoeba schemes for accessing resources. None of them are completely secure, but it is important to realise that extra requirements can be constructed upon these protocols within the process managing the resource and the processes that access them.

2.4 Clouds

Clouds [Dasgupta91] was originally developed from 1986 at the Georgia Institute of Technology and since 1987 has been based upon a micro-kernel called Ra. It supports an object-thread paradigm where objects are large passive entities and threads are activities within them which invoke other objects. Clouds does not support message passing, rather threads move from object to object and also computer to computer.

Clouds has the following design goals:-

Support for the object-thread computational model

The objects within Clouds are protected, passive entities and threads are activities that are independent, but invoke objects. Within Cloud objects may exist implementations of smaller objects (this is called clustering). This can reduce the number of context switches when invocations are within the same Cloud object.

Network transparent object invocation

The only mechanism to communicate to an object is by invocation. Invocation requires the target object, the method and the parameters. All data transfer occurs via parameters, not by message passing. Objects can exist on any machine, so long as it is a Clouds server.

Persistent, single level storage

Any changes made to an object's data, are immediately reflected in secondary store. There is no need to request that data be saved. The objects appear persistent, even if they are not.

Sharing via objects

All sharing occurs via common Cloud objects. Although the code and data may exist on different computers, Clouds uses consistency protocols to ensure the programmer perceives that memory is logically shared between threads even if it is not physically.

Automatic load balancing

Load balancing may be exercised whenever an invocation is made. The programmer may decide alternative execution locations for invocations.

Figure 2.5 shows how a Cloud thread invokes objects and returns to the original object. Threads are synchronous, so if an object requires an invocation and still wishes to continue processing, then it must do this through a second thread. What makes this model different to any previous examples, is that the object does not control the number of threads executing within it. Threads do not have to be created explicitly by the object to support multiple invocations.

Figure 2.5 Thread invocation within Clouds

Clouds is a very novel approach and a few interesting points can be drawn from it.


* Since there are no messages, only invocation, then there is no need for a global process table. This simplification reduces the management overhead of such a table.


* Object-orientation allows a consistent interface to resources, a level of stability and protection. Since the properties and data are encapsulated with the methods inside an object then it would be impossible, for example, to open a window using the handle to a file.


* Because load balancing occurs at invocation, then there is no need to migrate objects to processors that are lightly loaded. In theory, a pool of processors should always be evenly loaded as more objects are invoked.

2.5 rsh

rsh is an existing protocol used to invoke remote processing and is not particularly transparent nor is it secure. Taken from the UNIX manual page, this is how it starts to describe rsh:-

DESCRIPTION
rsh connects to the specified hostname and executes the specified command. rsh copies its standard input to the remote command, the standard output of the remote command to its standard output, and the standard error of the remote command to its standard error. Interrupt, quit and terminate signals are propagated to the remote command; rsh normally terminates when the remote command does.

rsh[4] provides an interface from one host to another so that a UNIX command can be executed on a remote machine - as if it were being executed locally. rsh commands can be chained together so that commands may be indirect via a number of machines.

For one UNIX machine to call another, there must be some authority to so, or a password given. FTP uses a user name/password file, called .netrc, which is seen as bad for security if it is used for anything else apart from anonymous FTP. If the .netrc file can be read by anyone else apart from the owner, then the remote account is open to access and abuse. Also the password could be transmitted across a public network and snooped[5] by some foreign party.

rsh uses authority information in a file called .rhosts. Again, from the man pages:-

Each remote machine may have a file named /etc/hosts.equiv containing a list of trusted hostnames with which it shares usernames. Users with the same username on both the local and remote machine may rsh from the machines listed in the remote machine's /etc/hosts file. Individual users may set up a similar private equivalence list with the file .rhosts in their home directories. Each line in this file contains two names: a hostname and a username separated by a SPACE. The entry permits the user named username who is logged into hostname to use rsh to access the remote machine as the remote user. If the name of the local host is not found in the /etc/hosts.equiv file on the remote machine, and the local username and hostname are not found in the remote user's .rhosts file, then the access is denied. The hostnames listed in the /etc/hosts.equiv and .rhosts files must be the official hostnames listed in the hosts database; nicknames may not be used in either of these files.

So the .rhosts file contains trusted user names and hosts. If within the file on machine "Andrea" is:-

Bert John

This means that user "John" can connect to machine "Andrea" from machine "Bert" without supplying a password. Even if the .rhosts file could be read by another user, it should not be possible to use the information to gain access to a remote account. If the .rhosts file can be written to, then their is the possibility of access. Few UNIX users have files which are writeable by anyone else, at least not without their knowledge.

The .rhosts file works well, so long as users have a consistent user name across all the UNIX machines they wish to access. If this is not so, then the user has to put the remote system's user name into the command line. For example, if upon machine Andrea the user John was known as Smith, then John would have to indicate the change of user name to rsh.

Typically the .rhosts file will be used to provide symmetric access to machines. If machine A has authority to connect to B, then B will have authority to connect to A. This means, using the above example, that upon machine Andrea and machine Bert will be the following .rhosts file:-

Andrea Smith
Bert John

This means that if machine Andrea wishes to know the remote user name for Smith on machine Bert, it need only look through the .rhosts file to find the answer. It is possible to write a small code patch to rsh[6] that scans the .rhosts file. This does make rsh more transparent, but every machine you wish to connect to needs to have a constantly up-to-date version of the .rhosts file, which is a non-scalable task as the number of machines increase.

rsh cannot be used for UNIX commands which are interactive upon the command line. Its main use originally was to pipe data from command to another, across machines transparently. It has found other uses, specifically to invoke X window clients, which the next section examines.

2.6 X & xrsh

X is one of the most popular windowing systems for workstations and it uses the client/server model of communication ( Figure 2.6). The server manages the screen and input devices. It always runs on the machine that the user is sat at. Clients are application programs that the user has invoked and may be on a remote machine. It is regarded by many authors as a distributed system.

The server can be on an X terminal which does no local processing, or on a workstation which can also support clients. Other processors on the network can support clients.

Figure 2.6 An X client/server system

Clients communicate with the server by using procedures within the X library, which include creating and modifying windows. Since these procedures may be on a different machine (the server machine) to the client, then X uses the protocol of Remote Procedure Calls (RPC).

In principle, an X server can run upon any operating system. X servers exist for UNIX, DOS, Windows and Macintosh. X clients can also run upon any operating system - even one that is not graphical.

Standard access control can be supplied by the xhost program which provides a rudimentary form of security. Machines can be specified to xhost so that they may invoke connections to the server. It works best on a single user workstation platform, where all clients run either on workstation or a remote host with trusted users.

On a multi-user workstation, xhost provides little security. UNIX workstations suffer most since a rogue user could log in to a local workstation from a remote system, run xhost to allow their remote machine access and then make an X connection. The types of clients they could run could be quite damaging (screen lock) or invading (screen grab programs).

More advanced access control can be implemented using the X authority database. Within the database is a list of magic cookies (access codes) that clients must provide when connecting to a particular X server upon a particular host. If the client's host is remote to the local server's, then the magic cookie must be distributed. A utility exists called xrsh[7] which aids the automatic distribution of magic cookies. It is a simple script file that uses rsh to copy cookies to remote client machines.

Using magic cookies prevents casual X server hacking, but snooping could reveal the magic cookie as it is distributed to remote hosts. This snooping can be avoided if encryption of network traffic is used. Since the magic cookie is unique and consistent for all clients communicating to a X server, it is almost impossible to revoke access permissions once they have been given away. The only method is to disconnect all clients and generate a new magic cookie.

Although magic cookies can prevent a high level attack against an X server, it is possible to cause a crash if the hardware devices (for example the screen) are directly accessed. The following shell script is a very effective way of forcing a remote server (running upon a UNIX host) to crash by simply starting a second server upon the same host.

#!/bin/sh
if [ "$#" = "0" ]
then
echo usage: 'basename $0' hostname
exit 1
fi
rsh $1 "export XINITRC='bad.xinitrc' ; xinit ; clear_colormap"

The file "bad.xinitrc" should contain for example:-

xsetroot -solid Black -cursor_name pirate

which is enough to blank the screen and set the cursor pointer to a pirate. After this the original X server has been severed from all user IO. It is possible to prevent this type of hacking (by changing the access rights of certain devices exclusively to the user logged on at the console) but the systems' manager has to actively spot this security hole and patch it.

2.7 RPC

The best know RPC (Remote Procedure Call) system is Sun RPC [Sun90] which was originally designed for communication in the Sun NFS network file system. Within Sun RPC is XDR which originally specified external data representation, but now is a complete interface definition language. RPC typically is implemented upon TCP/UDP[8], but can use any network transport protocol.

Every remote procedure has a program and version number. Clients can request a program number and version. If the server supports that program number of the version required or later then the call proceeds. To the programmer the calls look like regular procedure calls. Externally a stub is called which locates the remote machine and then the port to access the remote procedure. The port is ascertained by calling a remote port mapper which exists at a well-known port. The hostname of the server must be specified by the client, so Sun RPC does not have location transparency.

When the parameters are sent they are converted to XDR format, transmitted and then converted back into the format required by the server. The reply follows a similar XDR translation. This means, for example, that machines that use different floating point number formats can communicate transparently using RPC.

Distributed windowing systems such as X use asynchronous RPC. Since screen updates often do not require synchronous replies, then several requests can be made. This allows the client to continue processing without waiting for replies. The server can better schedule requests since often no reply is required. This also minimises the effects of network latency and the communication loop required to send information.

Figure 2.7 Daisy Chain Call Semantics

RPC has `daisy chain' calling semantics ( Figure 2.7). When a server calls a second remote server, which calls a third and so on, then the reply has to be channelled through all the servers, even if they do no further processing. This, and the poor performance of RPC when it is scaled to networks with large latencies, is discussed in [Partridge92]. He suggests Late-Binding RPC, a technique where code is exported rather than parameters, and similar ideas can be found other systems.

2.8 Sockets, TCP, UDP & IP

Socket interfaces now exist for virtually every operating system that has been connected to a network; these include UNIX, DOS, Windows and MacOS. Sockets were introduced into BSD UNIX version 4 and are implemented as a set of systems calls that provide a simple communication interface to TCP and UDP. Together TCP and UDP form the communication backbone of many protocols, for example SMTP (email), NNTP (network news), FTP (remote file transfer), Telnet (remote login), Gopher and the World Wide Web (information services).

A socket address consists of an Internet port number and address. Clients specify the Internet address and port they which to communicate to and receive a socket description. Servers specify which Internet port they wish to be attached to and also receive a socket description. Sockets can be used to communicate or facilitate any of the previous protocols.

Sockets are just an abstraction of TCP/IP and UDP/IP and can not inherently migrate, nor be transparent. They provide a convenient interface to network programming, better protocols can be built on top of them and they are a useful tool in developing distributed systems.

2.9 Summary

The previous systems have shown how to connect and protect remote systems and resources. Mach, Chorus and Amoeba use capabilities and/or ports. Primarily this is to protect resources, but secondly they also provide a mechanism for connection. Figure 2.3 shows an example connection in Mach and this is similar in Amoeba and Chorus. Amoeba has advanced permissions which can be reduced in any new capability that is formed, not increased. In these systems there is a trade-off between ease of connection and protection of remote resources. There is no winning method for accessing resources, because no method can achieve all possible desirable requirements. At best the method used should be flexible to allow uniform access to resources and then any extra requirements can be constructed. Security should be enough to protect against direct attack and further security should be possible to build.

Clouds uses remote object invocation as its execution and connection model. There are no inter-object messages, apart from those which are by invocation. Sharing occurs via an object, so code and data may be situated on different computers. Objects may call other objects to fulfil an invocation request. This simplifies the connection model, but complex objects can become harder to code.

rsh shows a protection scheme based upon authority to connect (similar to capabilities), rather than possessing the necessary permissions (user name and password). Connection is provided by copying input and output from the remote to the local system. Automatic migration of permissions is not possible since an authority would need to be established first. This task can only be performed by hand and grows as the number of remote systems grow. Authority to connect is similar to capabilities and again shows drawbacks when examined in closer detail.

X's connections use a similar authority database to rsh and therefore is also similar to a capability. Protection is based around magic cookies, which can be automatically generated and distributed to remote systems. Unfortunately it is not practically possible to selectively revoke a magic cookies. It is shown that this only provides protection at the X message layer, since access to the underling devices is possible. This demonstrates how protection should work at all layers.

RPC is often used in distributed systems as the communication method between machines. It is reasonably fast and provides the XDR standard. Heterogeneous systems are built quicker and with fewer problems since translation of parameter formats is achieved automatically. Translation can become expensive, so many systems conform to the XDR standard internally anyhow. XDR has little impact on transmission of information apart from parameters, for instance picture and audio formats. RPC does not scale well to wide-area or global networks with their large latencies. Asynchronous and Late-Binding RPC are possible solutions but much of the work with RPC has been focused on speeding up local-area network performance (Firefly RPC [Schroeder90]).

TCP, UDP and IP have been around for quite a while and shall be well into the future. Sockets are a cheap and quick interface to these Internet protocols and exist on a variety of platforms, albeit under slightly different names. They provide a connection from one machine to another and nothing more. Any further requirements can be built on top.

Simplicity is the key to large scale distributed systems. Mach, Chorus, Amoeba, Clouds all work well in the situations they were designed for. X's magic cookies and rsh's authority are adequate within a university or business. RPC provides many excellent services, for example NFS. None scale to a large loosely coupled network.

There is no large scale loosely coupled distributed system to examine except for the Web - and that currently supplies information, not processing resources. Java code (see Chapter 1) executes upon the client and may communicate to servers but no integrated distribution infrastructure has been established. If there is a distribution system for large loosely coupled networks, then it will contain some properties of local area network systems that have been generalised to cope with the extreme environment of a network like the Internet.

3. Future Distribution Systems
for large loosely coupled networks

Chapters 1 and 2 have discussed what exists today, but what can be achieved in the future? The following sections contain a model and method for loosely coupled distribution. They are not the only possible solutions and are not necessarily the best. They are untried and novel.

3.1 Introduction

If loosely coupled distribution is to be used, then it will have to be reliable, secure, consistent and overall it must be efficient. These goals often are at odds with each other in certain distribution models. In many examples given in Chapter 2, the systems are modified to suit the goal - but this is difficult and rarely satisfying. A better approach is to change the model of distribution and a good example is Clouds.

3.2 Requirements, Transparency & Design Goals

The requirements of a loosely coupled distributed system are extremely important if the foundations are to be sound. Chapter 1 summarised all the requirement terminology used in real systems described in Chapter 2. The number of requirements and their type varies according to author and transparency is treated in a similar manner. The reason for this confusion is that requirements are interrelated to transparency and design goals.

Figure 3.1 shows a possible view of relationships, but there are others. In this model, transparency links the requirements of a distributed systems to the design goals. The model shows relationships laterally, not vertically. For example, the requirements sharing, scaling and openness have been grouped together. This is because they rely upon the same transparencies, not that they must have a relationship to each other.

Figure 3.1 Relationships within a distributed system

A noteworthy point is that the design goals apply equally to everything and can not be selectively applied. Without this, any system developed would be limited in its application.

Transparency has been categorised into network, failure and object transparency - with some overlap between them.


* Network transparency provides anonymity for resources. The location and access method are consistent regardless of the location of the resource.


* Failure transparency enables faults to be reliably concealed by using replication, migration and relocation of resources.


* Object transparency enables several processes to consistently act upon several resources as if there was a single object.

3.3 Communication

For any distributed system, it is the method of communication that ultimately defines its final shape and efficiency. Many use RPC, since it is quick, reliable and convenient. Clouds abstracts from RPC (although it does use it internally) and introduces objects which are similar to their programming equivalent. RPC systems suffer from `daisy chain' semantics and do not easily support concurrency. Clouds' objects do support concurrency because of the invocation semantics, but also suffer daisy chaining.

Daisy chaining is poor in a large scale distributed system since it will suffer unnecessary network latencies, which can be extremely large [Partridge92]. The goal is to reduce the number of network transits to an absolute minimum, so we require the invocation semantics shown in Figure 3.2.

Figure 3.2 Latency optimal invocation

To achieve this and to gain the maximum concurrent processing effect from a number of distinct systems, then pipelining is a possible method of satisfying this aim. Processing jobs are separated by a pipeline similar to a UNIX pipeline - except that the output does not `daisy chain' back through all the systems involved, but short circuits back to the parent process ( Figure 3.3).

Figure 3.3 Pipeline invocation

This is similar to `tail call semantics' [Partridge92] except that there is no anti-orphan messages and pipelining is the standard processing structure rather than the exception. To prevent orphans and zombies, every pipeline is allocated a time quota to complete by its parent. The size of this quota can be the same as the parent pipeline has, or less. This allows a parent to relocate a pipeline if a child does not finish within quota. It also prevents orphan and zombie jobs, since they will be terminated once the time quota has been exceeded.

The pipeline definition is exported to a new system if the current system can not implement the next task in the pipeline. This is different to REV [Stamos86,90a,90b] or to Late-Binding RPC [Partridge92] where the code is exported. Migration of the pipeline only when the next stage must be done on another system, minimises the network latency cost[9].

Since the pipeline communicates data then data can be trickled through, rather than batched, and separate systems can form stages in the pipeline similar to that found inside a processor [Chatterjee95]. There is no fixed requirement for the data to be any form, it could parameters, information or code. The only requirement is that it is delivered to the process specified by the pipeline.

3.4 Summary

Pipelines are a communication abstraction, just as RPC is an abstraction. Unlike RPC they require programmers to construct programs in an alien manner.

Pipelines could be a possible method of linking processing and information resources over a large unreliable network with high latencies. They have been tested in real-time tightly coupled systems [Chatterjee95] where their stream like nature improves throughput, but it is unknown if they will scale to larger networks.

4. Experimentation

Formal analysis is a plausible method of testing any new system, but it does require the parameters of the environment. Some research has occurred in the past, but little is know about the loads that users cause on current networks and processors.

Formalising about pipelines would be a difficult since their are so many unknowns. Direct observation is an obvious path to choose.

4.1 Introduction

Pipelining at first glance, seems to be a system that could work. Proving that is has spawned many other questions and many other experiments. Time quotas when used to allow for relocation of processing, has subtle features which are not immediately apparent. Users work on a shared systems and the system becomes loaded, but the reasons why it slows down are not fully understood. The next few sections examine these questions and offers some theories to explain the observed results.

4.2 Distributed Pipeline Simulator: Version 1
Pipelines & RPC

A simple pipeline and RPC simulator was coded by Mr Matt Lee which formed a MSc project that obtained a distinction from Loughborough University. Ray tracing was used as the task domain, since it was anticipated that this would highlight the performance difference between pipelining and RPC.

Each host and network in the simulated system had a particular speed and default loading, which could change over time. These were observed from real hosts (within Loughborough University) and networks (from all over the world) for approximately a week. The complete ray trace was modelled as a number of split operations, each followed by a ray trace and finally a single join operation. This gave an abstraction from concurrent pipelines which were called forked pipelines. The simulation of RPC conformed to daisy chain semantics.

No particular distribution algorithm was chosen, so that this would not bias the results. Random distribution was chosen to be the worst case and a `perfect' algorithm was invented. This could scan all the hosts and networks, without causing any loading, to find the optimal placement for a job. It was implement within the simulator by scanning all the data structures that defined hosts and networks and choosing the lowest predicted execution time.

Hosts and networks were made slower, time quotas were made smaller and the size of the ray trace made larger. Unsurprisingly pipelining bettered RPC, in terms of total execution time and ability to successfully complete the ray trace, when using random or perfect job placement. This was confirmation that pipelining had the essential qualities that suited it to a large loosely coupled network.

Interestingly it was noted that sometimes, whilst simulating pipelining under slow network conditions, that random bettered the so-called perfect placement algorithm. This was completely unexpected and unexplainable.

A second simulator has been coded, which is targeted to investigate the results from the first. Since the simulators estimate the performance of hosts executing jobs from real users, then these have been investigated also. These are discussed in the following sections.

4.3 Distributed Pipeline Simulator: Version 2
Random & perfect placement

To follow the results from first network simulator, another was built by Miss Sue Parr as part of a final year undergraduate project. This simulator was to concentrate upon testing placement algorithms, in particular the random and perfect algorithms used in the first simulator. To simplify the simulated system, a single network connecting a number of hosts was used as the basic topology. Again hosts and network in the simulated system had a particular speed and default loading, but now they could not change over time.

To make the host speeds as accurate as possible, a new survey of the UNIX hosts at Loughborough University was undertaken by Miss Parr. Timed processes were invoked throughout two days ( Figure 4.1) and the relative speeds of the hosts incorporated into the simulator.

Figure 4.1 Process times

The task domain was similar to ray tracing, except that the final join operation was discarded. This made the simulation simpler to code and does not loose any generality. Now random and perfect placement were under investigation, to ascertain why and by how much random bettered perfect placement. To make the simulated task increasing difficult to successfully complete, the time quota allocated to each concurrent pipeline (sub-time quota) was reduced whilst keeping all other variables constant. The results are shown in Figure 4.2.

Figure 4.2 Random & perfect placement results

This clearly shows random outperforming perfect placement in the 15 to 28 seconds period and confirms the observed results from the first simulator. A possible hypothesis is that since the network is the limiting resource, then random placement may be sharing it better than perfect. This is possible since the first simulator showed that perfect placement would unevenly load all the hosts in such a way that they operated at the same speed, regardless of their unloaded speed. Therefore since a large number of identical concurrent jobs are executing, then they all finish and communicate their results at the same time.

Random placement often evenly loads the machines, so they operate at different speeds. Jobs may be placed on hosts that have no chance of completing within the sub-time quota, but this is just enough to allow the more capable hosts to complete some jobs. To test this theory the essential ingredients of random placement needed to be incorporated into perfect. Back off, similar to that found on ethernet packet communication, was incorporated into the simulator. Now when a pipeline failed, rather than being redistributed immediately, it could be backed off (forced to wait) for a random amount of time.

Figure 4.3 Back off results

The results in Figure 4.3 show the modified perfect beating random placement. Both have suffered because of the introduction of back off. Before, some processing was always occurring, but now it is possible for there to be a large number of jobs backed off and not processing. Just how significant this result would be a real pipeline distributed system, and if it would have any effect at all, is hard to predict.

4.4 Loading Profile of a Distributed System

Even if a real distribution system existed, then the possible loading from real users would have to be known to test its performance and to optimise any distribution algorithms it may use. In 1986 Luis-Felipe Cabrera analysed process lifetimes in several UNIX installations [Cabrera86] hoping to find a better distribution scheme and possibly to discount some suggested by other authors.

Ten years have passed since this paper was published and no other author seems to have updated the results. This would require processing of the standard accounting files built into Unix and became a final year undergraduate project for Miss Georgina Barkley. Not all the results have been completed, but interestingly Miss Barkley's experiments ran on one of the same days as Miss Parr's on the same host (Suna).

Figure 4.4 Process creations

Figure 4.4 shows remarkable similarity to those produced in 1986. From this chart, it is possible to conclude that Unix hosts today are still subjected to similar process creation rates of ten years ago. Unfortunately Cabrera did not analyse CPU time consumption in a similar way ( Figure 4.5).

Figure 4.5 Mean CPU time consumed

There are still many more results to be published from the process accounting investigation that Miss Barkley undertook. A more complete survey, of a number of hosts over a number of days, is currently being prepared.

4.5 Integration of Experiments

Since Miss Parr and Miss Barkley have process loading and time data from the same host on the same day, it is possible to investigate if there is any link between them. Figure 4.6 shows Figure 4.1, Figure 4.4 and Figure 4.5 overlaid.

Figure 4.6 Process time, creations & CPU consumption

Figure 4.6 does not provide any mathematical proof of a relationship. It would seem reasonable that there is a direct linear relationship between the average of all the process lifetimes on a host in a period of time and the perceived execution time of a single user's process in the same period of time. Figure 4.7 shows that this is not always a safe assumption.

Figure 4.7 The relationship of CPU consumption & process time

A far better correlation in show in Figure 4.8 which would suggest that for this host on this day, that the number process creations and not the amount of CPU time consumed, was the primary reason for user processes to execute slower.

Figure 4.8 The relationship of process creations & process time

4.6 Future Work

Work is currently focusing upon time quotas in pipelines and concurrent pipelines. It is hoped that this work will throw light upon the difficult questions of distribution and scheduling. A paper is currently being constructed with Dr Colin Machin that describes a general system employing time quotas and highlights the problem areas. A general mathematical model is being developed to describe distribution effects over all distributed systems.

Future work is hoped to focus upon pipeline security, encryption and caching. Two more research papers are planned, the first documenting pipelines (and the pipeline simulator results) and the second the results from the host investigation that extend Cabrera's work.

The ultimate aim is to fully test a working pipeline system, by implementing only simple software and on limited types of hosts. Such a system could be used to develop further ideas for distributed computing. This will form a MSc project for Mr Scott Hume.

4.7 Summary

The previous experiments required significant processing and resources to compile but only scratch the surface of many questions. They have been conducted on limited hosts over limited days. A larger survey is required before any accurate predictions can be made about pipelining and the types of loading it may be exposed to. Simulation has gone as far as it can and empirical analysis suggests that an accurate formal model is impossible.

A working pipeline system would provide a interesting experiment, that would show how important the results from the simulators are in a real system. Currently such a system is in the design stage.

5. References

Acetta86.
M. Accetta, R. Baron, D. Golub, R.F. Rashid, A. Tevanian and M. Young, "Mach: A New Kernel Foundation for UNIX Development," Proc. Summer 1986 USENIX Conf. pp.93-112.

Boykin93.
J. Boykin, D. Kirschen, A. Langerman and S. LoVerso, "Programming Under Mach," Addison-Wesley, 1993.

Cabrera86.
L-F Cabrera, "The Influence of Workload on Load Balancing Strategies," Proc. 1986 Summer USENIX Conference, pp.446-458, USENIX Assoc. Atlanta, Georgia, 11-13 June 1986.

Chatterjee95.
Saurav Chatterjee and Jay Strosnider, "Distributed Pipeline Scheduling: A Framework for Distributed, Heterogeneous Real-Time System Design," Carnegie Mellon University, June 1995.

Coulouris94.
George Coulouris, Jean Dollimore and Tim Kindberg, "Distributed Systems Concepts and Design," Addison-Wesley, 1994.

Dasgupta91.
P. Dasgupta, R.J. LeBlanc Jr, M. Ahamad and U. Ramachandran, "The Clouds Distributed Operating System," IEEE Computer, vol.24, no.11, pp.34-44, 1991.

Fitzgerald86.
R. Fitzgerald and R.F. Rashid, "The integration of virtual memory management and interprocess communications in Accent," ACM Trans. Computer Systems, vol.4, no.2, pp.70-85, 1986.

ISO92.
International Standards Organisation, "Basic Reference Model of Open Distributed Processing, Part 1: Overview and guide use," ISO/IEC JTC1/SC212/WG7 CD 10746-1, 1992.

Loepere91.
K. Loepere, "Mach 3 Kernel Principles," Open Software Foundation and Carnegie-Mellon University, 1991.

Partridge92.
Craig Partridge, "Late-Binding RPC: A Paradigm for Distributed Computation in a Gigabit Environment," Harvard University, Mass., March 1992.

Rashid81.
R.F. Rashid and G. Robertson, "Accent: a communications oriented network operating system," ACM Operating Systems Review, vol.15, no.6, pp.64-75, 1981.

Rashid86.
R.F. Rashid, "From RIG to Accent to Mach: the evolution of a network operating system," Proc. ACM/IEEE Computer Society Fall Joint Conference, ACM, November 1986.

Rozier88.
M. Rozier, V. Abrosssimov, F. Armand, I. Boule, M. Gien, M. Guillemont, F. Herrman, C. Kaiser, S. Langlois, P. Leonard and W. Neuhauser, "Chorus Distributed Operating Systems," Computing Systems Journal, vol.1, no.4, pp.305-370, 1988.

Rozier90.
M. Rozier, V. Abrosssimov, F. Armand, I. Boule, M. Gien, M. Guillemont, F. Herrman, C. Kaiser, S. Langlois, P. Leonard and W. Neuhauser, "Overview of the Chorus Distributed Operating Systems," Tech. Report CS/TR-90-25.1, Chorus Systèmes, France, 1990.

Schroeder90.
M. Schroeder and M. Burrows, "The Performance of Firefly RPC," ACM Trans. Computer Systems, vol.8, no.1, pp.1-17, 1990.

Stamos86.
James W. Stamos, "Remote Evaluation," MIT,LCS,TR-354, MIT Lab for Computer Science, Cambridge, Mass., January 1986.

Stamos90a.
James W. Stamos and David K. Gifford, "Implementing Remote Evaluation," IEEE Trans. Software Engineering, vol.16, no.7, pp.710-722, July 1990.

Stamos90b.
James W. Stamos and David K. Gifford, "Remote Evaluation," ACM Trans. Programming Languages and Systems, vol.12, no.4, pp.537-565, October 1990.

Sun90.
Sun Microsystems Inc. "Network Programming," Sun Microsystems, Mountain View, CA. March 1990.

Tanenbaum90.
A.S. Tanenbaum, R. van Renesse, H. van Staveren, G. Sharp, S. Mullender, J. Jansen and G. van Rossum, "Experiences with the Amoeba Distributed Operating System," Comms. ACM, vol.33, no.12, pp.46-63, 1990.

Tanenbaum92.
A.S. Tanenbaum, "Modern Operating Systems," Englewood Cliffs NJ, Prentice Hall, 1992.

Tanenbaum95.
A.S. Tanenbaum, "Distributed Operating Systems," Englewood Cliffs NJ, Prentice Hall, 1995.

Footnotes

[1]Byte: Wired on the Web, January 1996 & http://java.sun.com.

[2]Byte: Inside the Web PC, March 1996.

[3]ISO reference model identifies eight [ISO92].

[4]HPUX has a similar command called remsh.

[5]Snooping is eavesdropping a network communication. The technique is to grab raw communication packets from the network and read any plain text that is encoded within them, which could be a password.

[6]The code is available at http://karlperkins.com/academic/source/new_rsh.c

[7]xrsh was written by James J Dempsey <jjd@spserv.bbn.com> and is part of the standard X distribution.

[8]TCP and UDP form the transport layer, and together with IP (Internet protocol), are the Internet protocols.

[9]The original proof for this theorem [Partridge92] used threads, but they are computationally equivalent to pipelines.