Understanding Efficiency

There are two basic concepts that limit the efficiency of a distributed system:

Latency: the time required to transmit a message from one location to another within a distributed system.

Bandwidth: the amount of data that can be transferred per unit of time in a stable state.

In addition, there are two leading measures of efficiency that need to be included here:

Response Time: the time it takes to obtain a result after having submitted a job to be processed by the system.

Throughput: the system’s ability to handle high loads.  In other words, the number of tasks accomplished per unit time.

The goal of a distributed computing system is to achieve the best possible performance: to minimize latency and response time while increasing throughput.  In order to do so, certain optimizations must be made to the system.  According to the Parallel & Distributed Computing Handbook, these optimizations can be reduced to three main changes to the processing system:

(1) Load Balancing  To obtain peak performance, every machine in the system needs to be assigned the same amount of work.  Because different tasks do not all require the same amount of computation or time, load balancing does not simply mean asking each machine to complete the same number of tasks.  Instead, there are two approaches to balancing the load of the computing system.  If the task sizes are known in advance, the load can be balanced statically at compile time; if the task sizes are not known, the tasks are assigned to different processors dynamically at run time.
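The contrast between the two approaches can be sketched in a few lines of Python.  This is an illustrative toy, not a real scheduler: the task sizes, worker count, and function names are invented for the example, and the static strategy shown is simple round-robin (equal *number* of tasks, not equal work).

```python
def static_assignment(task_sizes, n_workers):
    """Static (compile-time style) round-robin: each worker gets the same
    number of tasks, regardless of how big each task is."""
    loads = [0] * n_workers
    for i, size in enumerate(task_sizes):
        loads[i % n_workers] += size
    return loads

def dynamic_assignment(task_sizes, n_workers):
    """Dynamic (run-time style) greedy assignment: each arriving task goes
    to whichever worker currently has the least total work."""
    loads = [0] * n_workers
    for size in task_sizes:
        loads[loads.index(min(loads))] += size
    return loads

# Tasks of very different sizes expose the weakness of round-robin.
tasks = [9, 1, 1, 1, 8, 1, 1, 2]
print(static_assignment(tasks, 2))   # [19, 5]  -- badly unbalanced
print(dynamic_assignment(tasks, 2))  # [13, 11] -- much closer to even
```

With equal task counts, one worker ends up with nearly four times the work; the greedy run-time strategy keeps the loads within a couple of units of each other.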

(2) Concurrency  For optimal performance, all machines executing different portions of a computation should perform their calculations simultaneously.  This optimization is largely common sense: if two processors are working on parts of a large calculation whose results must then be combined and processed again, that third step cannot begin until the first two computations are complete.  Thus, if the two machines run at the same time, the time one processor spends sitting idle while waiting for the other to finish is minimized.
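The two-partial-results scenario above can be simulated in standard Python.  This is a minimal sketch with a `time.sleep` standing in for real computation; the data, timings, and function names are invented for illustration.

```python
import concurrent.futures
import time

def partial_sum(chunk):
    # Simulate one processor working on its portion of the computation.
    time.sleep(0.1)
    return sum(chunk)

data = list(range(100))
halves = [data[:50], data[50:]]

start = time.perf_counter()
# Run both partial computations at the same time.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(partial_sum, halves))
# The "third analysis": combining the results, which requires both partials.
combined = sum(partials)
elapsed = time.perf_counter() - start

print(combined)       # same result as sum(data)
print(elapsed < 0.2)  # True: the two 0.1 s work periods overlapped
```

Run sequentially, the two simulated work periods would take about 0.2 seconds; run concurrently, the wall-clock time stays near 0.1 seconds because neither worker waits for the other.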

(3) Overhead  To increase the efficiency of the computing system, the amount of extra work done that is not strictly required for the computation should be minimized.  Whenever a new feature is introduced into a given computing system, there is the overhead of incorporating it into the old system rather than simply building a new system tailored to that single need.  In the case of distributed computing, one example is the need for machines to communicate with each other over the network in order to share results.  Thus, the speed gained from added processors and computing resources comes with some added expense in computing time.  If, however, this expense is close to zero, the overhead has been minimized and the system's response time will be greatly decreased.

Despite the attempts of distributed computing architecture developers to improve the efficiency of current systems, there are inherent limitations to the improvements that can be made in the three areas listed above.

Load balancing is severely limited by the granularity of the tasks to be completed.  For many applications, there is a limit to the number of times a given task can be split; in other words, there is a point at which the task must be processed as a complete entity because the different steps within it are interdependent.  When the tasks are coarse (they cannot be split many times before reaching this atomic level), it becomes harder to balance the load between different processors.  The tasks may differ significantly in size, but the larger tasks cannot be decomposed, and thus the load cannot be evenly distributed among the processors in the system.

In a distributed computing system where computation is executed only while the user is not otherwise using the machine, it is impossible to have a completely concurrent system.  This is simply a question of practicality: there is no way to reach the optimal state in which all computation is perfectly synchronized among all the computers in the distributed system.

Finally, there will always be some system overhead in a distributed computing system.  The different machines must talk to each other, so the overhead becomes a function of the latency and bandwidth of the system.  In addition, different processors may have to repeat locally some of the calculations already completed by other machines simply to finish their own discrete tasks.  If those tasks were completed sequentially on a single machine, these repeated calculations might not have been necessary.
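Because the communication overhead is a function of latency and bandwidth, a common first-order estimate of the cost of sharing one result is the link latency plus the message size divided by the bandwidth.  The sketch below uses illustrative numbers; real networks add further costs (protocol overhead, congestion) that this simple model ignores.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order model: fixed per-message latency plus the time to
    push the message's bytes through the link."""
    return latency_s + message_bytes / bandwidth_bytes_per_s

# A 1 MB result over a link with 10 ms latency and 100 MB/s bandwidth:
print(transfer_time(1_000_000, 0.010, 100_000_000))  # 0.02 seconds
```

Note that for small messages the fixed latency dominates, while for large messages the bandwidth term does; this is why batching many small results into one message often reduces total overhead.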
