
Dataflow Specifications Evaluate the Output Continuously

Data Flow Graph

Program Design and Analysis

Marilyn Wolf , in Computers as Components (Third Edition), 2012

5.3.1 Data Flow Graphs

A data flow graph is a model of a program with no conditionals. In a high-level programming language, a code segment with no conditionals (more precisely, one with only one entry point and one exit point) is known as a basic block. Figure 5.4 shows a simple basic block. As the C code is executed, we enter this basic block at the beginning and execute all of its statements.

Figure 5.4. A basic block in C.

Before we are able to draw the data flow graph for this code, we need to modify it slightly. There are two assignments to the variable x: it appears twice on the left side of an assignment. We need to rewrite the code in single-assignment form, in which a variable appears only once on the left side. Because our specification is C code, we assume that the statements are executed sequentially, so that any use of a variable refers to its latest assigned value. In this case, x is not reused in this block (presumably it is used elsewhere), so we just have to eliminate the multiple assignment to x. The result is shown in Figure 5.5, where we have used the names x1 and x2 to distinguish the two assignments to x.

Figure 5.5. The basic block in single-assignment form.
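As a concrete illustration, the C sketch below applies the renaming to a hypothetical basic block in the spirit of Figure 5.4; the statements and the names c, d, e, y, and z are assumptions, not a copy of the figure. The first function assigns x twice; the second is its single-assignment form using x1 and x2. Because renaming changes only names, both functions return the same outputs.

```c
#include <assert.h>

/* Values produced by the hypothetical basic block. */
struct outputs { int w, x, y, z; };

/* Original form: x appears twice on the left-hand side. */
struct outputs block_original(int a, int b, int c, int d, int e) {
    int w, x, y, z;
    w = a + b;
    x = a - c;
    y = x + d;      /* uses the first value of x */
    x = a + c;      /* second assignment overwrites x */
    z = y + e;
    return (struct outputs){ w, x, y, z };
}

/* Single-assignment form: every name is defined exactly once. */
struct outputs block_single_assignment(int a, int b, int c, int d, int e) {
    int w  = a + b;
    int x1 = a - c;
    int y  = x1 + d;   /* the use refers to x1, the latest definition */
    int x2 = a + c;
    int z  = y + e;
    return (struct outputs){ w, x2, y, z };   /* x2 is what x holds on exit */
}
```

For example, with a = 5, b = 2, c = 1, d = 3, and e = 4, both versions produce w = 7, y = 7, z = 11, and a final x of 6.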

The single-assignment form is important because it allows us to identify a unique location in the code where each named location is computed. As an introduction to the data flow graph, we use two types of nodes in the graph—round nodes denote operators and square nodes represent values. The value nodes may be either inputs to the basic block, such as a and b, or variables assigned to within the block, such as w and x1. The data flow graph for our single-assignment code is shown in Figure 5.6. The single-assignment form means that the data flow graph is acyclic—if we assigned to x multiple times, then the second assignment would form a cycle in the graph including x and the operators used to compute x. Keeping the data flow graph acyclic is important in many types of analyses we want to do on the graph. (Of course, it is important to know whether the source code actually assigns to a variable multiple times, because some of those assignments may be mistakes. We consider the analysis of source code for proper use of assignments in Section 5.5.)

Figure 5.6. An extended data flow graph for our sample basic block.

The data flow graph is generally drawn in the form shown in Figure 5.7. Here, the variables are not explicitly represented by nodes. Instead, the edges are labeled with the variables they represent. As a result, a variable can be represented by more than one edge. However, the edges are directed and all the edges for a variable must come from a single source. We use this form for its simplicity and compactness.

Figure 5.7. Standard data flow graph for our sample basic block.

The data flow graph for the code makes the order in which the operations are performed in the C code much less obvious. This is one of the advantages of the data flow graph. We can use it to determine feasible reorderings of the operations, which may help us to reduce pipeline or cache conflicts. We can also use it when the exact order of operations simply doesn't matter. The data flow graph defines a partial ordering of the operations in the basic block. We must ensure that a value is computed before it is used, but generally there are several possible orderings of evaluating expressions that satisfy this requirement.
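The partial ordering can be checked mechanically. The following C sketch encodes the dependencies of a hypothetical five-statement single-assignment block (w = a + b; x1 = a - c; y = x1 + d; x2 = a + c; z = y + e), which is an assumption for illustration. It tests whether a proposed evaluation order computes every value before its use, and it uses Kahn's topological sort to produce one legal order; the sort fails only if the graph has a cycle, which single-assignment form rules out.

```c
#include <assert.h>
#include <stdbool.h>

#define NOPS 5

/* Operators of the hypothetical block:
   0: w = a + b   1: x1 = a - c   2: y = x1 + d   3: x2 = a + c   4: z = y + e
   dep[i][j] is true when operator j uses a value produced by operator i. */
static const bool dep[NOPS][NOPS] = {
    [1][2] = true,   /* edge labeled x1 */
    [2][4] = true,   /* edge labeled y  */
};

/* True if order[] (a permutation of 0..NOPS-1) respects the partial order. */
bool order_is_valid(const int order[NOPS]) {
    int position[NOPS];
    for (int k = 0; k < NOPS; k++) position[order[k]] = k;
    for (int i = 0; i < NOPS; i++)
        for (int j = 0; j < NOPS; j++)
            if (dep[i][j] && position[i] > position[j]) return false;
    return true;
}

/* Kahn's algorithm: writes one legal evaluation order into out[].
   Returns false if no node without pending inputs remains (a cycle). */
bool topological_order(int out[NOPS]) {
    int indegree[NOPS] = {0};
    bool done[NOPS] = {false};
    for (int i = 0; i < NOPS; i++)
        for (int j = 0; j < NOPS; j++)
            if (dep[i][j]) indegree[j]++;
    for (int k = 0; k < NOPS; k++) {
        int pick = -1;
        for (int j = 0; j < NOPS && pick < 0; j++)
            if (!done[j] && indegree[j] == 0) pick = j;
        if (pick < 0) return false;
        done[pick] = true;
        out[k] = pick;
        for (int j = 0; j < NOPS; j++)
            if (dep[pick][j]) indegree[j]--;   /* its result is now available */
    }
    return true;
}
```

The reordered sequence {3, 1, 0, 2, 4} is also legal, which is exactly the freedom the text describes; computing y before x1, as in {2, 1, 0, 3, 4}, is not.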


URL:

https://www.sciencedirect.com/science/article/pii/B9780123884367000052

Program Design and Analysis

Marilyn Wolf , in Computers as Components (Fourth Edition), 2017

5.3 Models of programs

In this section, we develop models for programs that are more general than source code. Why not use the source code directly? First, there are many different types of source code—assembly languages, C code, and so on—but we can use a single model to describe all of them. Once we have such a model, we can perform many useful analyses on the model more easily than we could on the source code.

Our fundamental model for programs is the control/data flow graph (CDFG). (We can also model hardware behavior with the CDFG.) As the name implies, the CDFG has constructs that model both data operations (arithmetic and other computations) and control operations (conditionals). Part of the power of the CDFG comes from its combination of control and data constructs. To understand the CDFG, we start with pure data descriptions and then extend the model to control.


5.3.2 Control/data flow graphs

A CDFG uses a data flow graph as an element, adding constructs to describe control. In a basic CDFG, we have two types of nodes: decision nodes and data flow nodes. A data flow node encapsulates a complete data flow graph to represent a basic block. We can use one type of decision node to describe all the types of control in a sequential program. (The jump/branch is, after all, the way we implement all those high-level control constructs.)

Fig. 5.8 shows a bit of C code with control constructs and the CDFG constructed from it. The rectangular nodes in the graph represent the basic blocks. The basic blocks in the C code have been represented by function calls for simplicity. The diamond-shaped nodes represent the conditionals. The node's condition is given by the label, and the edges are labeled with the possible outcomes of evaluating the condition.

Figure 5.8. C code and its CDFG.

Building a CDFG for a while loop is straightforward, as shown in Fig. 5.9. The while loop consists of both a test and a loop body, each of which we know how to represent in a CDFG. We can represent for loops by remembering that, in C, a for loop is defined in terms of a while loop [Ker88]. This for loop

      for (i = 0; i < N; i++) {
          loop_body();
      }

is equivalent to

      i = 0;
      while (i < N) {
          loop_body();
          i++;
      }

Figure 5.9. A while loop and its CDFG.
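The equivalence can be checked with a small C program. In this sketch, loop_body is a hypothetical stand-in that only counts its calls, and run_for and run_while (also hypothetical names) wrap the two loop forms above and return the final value of i.

```c
#include <assert.h>

/* Hypothetical loop body: records how many times it has run. */
static int body_calls;
static void loop_body(void) { body_calls++; }

/* The for loop from the text; returns the final value of i. */
int run_for(int N) {
    int i;
    body_calls = 0;
    for (i = 0; i < N; i++) {
        loop_body();
    }
    return i;
}

/* The equivalent while loop; returns the final value of i. */
int run_while(int N) {
    int i;
    body_calls = 0;
    i = 0;
    while (i < N) {
        loop_body();
        i++;
    }
    return i;
}
```

Both forms leave i equal to N (or 0 when N is negative) and call the body N times. One caveat from C itself: if the loop body contained a continue statement directly inside the loop, the for loop would still execute i++ while the while loop would skip it, so the rewrite is exact only for bodies without continue.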

Hierarchical representation

For a complete CDFG model, we can use a data flow graph to model each data flow node. Thus, the CDFG is a hierarchical representation: a data flow node of the CDFG can be expanded to reveal a complete data flow graph.

An execution model for a CDFG is very much like the execution of the program it represents. The CDFG does not require explicit declaration of variables, but we assume that the implementation has sufficient memory for all the variables. We can define a state variable that represents a program counter in a CPU. (When studying a drawing of a CDFG, a finger works well for keeping track of the program counter state.) As we execute the program, we either execute the data flow node or compute the decision in the decision node and follow the appropriate edge, depending on the type of node the program counter points to. Even though the data flow nodes may specify only a partial ordering on the data flow computations, the CDFG is a sequential representation of the program. There is only one program counter in our execution model of the CDFG, and operations are not executed in parallel.
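The execution model can be sketched in C. The node layout below is an assumption for illustration: a data flow node holds a function standing for its whole basic block, and a decision node holds a test and two labeled successors. A single integer pc plays the role of the program counter, so only one node executes at a time. The graph encodes the while-loop form of the for loop discussed earlier, with a hypothetical body sum += i.

```c
#include <assert.h>

/* Hypothetical program state; the CDFG itself declares no variables. */
struct state { int i, n, sum; };

enum node_kind { DATA_NODE, DECISION_NODE, END_NODE };

/* Assumed node layout for illustration. */
struct node {
    enum node_kind kind;
    void (*block)(struct state *);        /* DATA_NODE: its basic block */
    int (*cond)(const struct state *);    /* DECISION_NODE: the condition */
    int next;                             /* DATA_NODE: successor */
    int on_true, on_false;                /* DECISION_NODE: labeled edges */
};

static void init_block(struct state *s) { s->i = 0; }
static void body_block(struct state *s) { s->sum += s->i; s->i++; }
static int loop_test(const struct state *s) { return s->i < s->n; }

/* CDFG for: i = 0; while (i < n) { sum += i; i++; } */
static const struct node cdfg[] = {
    [0] = { .kind = DATA_NODE, .block = init_block, .next = 1 },
    [1] = { .kind = DECISION_NODE, .cond = loop_test, .on_true = 2, .on_false = 3 },
    [2] = { .kind = DATA_NODE, .block = body_block, .next = 1 },
    [3] = { .kind = END_NODE },
};

/* Sequential execution: one program counter, one node at a time.
   Returns the number of nodes executed before reaching the end node. */
int run_cdfg(const struct node g[], struct state *s) {
    int pc = 0, steps = 0;
    while (g[pc].kind != END_NODE) {
        const struct node *nd = &g[pc];
        steps++;
        if (nd->kind == DATA_NODE) {
            nd->block(s);                                   /* run the whole block */
            pc = nd->next;
        } else {
            assert(nd->kind == DECISION_NODE);
            pc = nd->cond(s) ? nd->on_true : nd->on_false;  /* follow one edge */
        }
    }
    return steps;
}
```

With n = 4 the interpreter executes 10 nodes (the initialization, five tests, and four loop bodies) and leaves sum = 6.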

The CDFG is not necessarily tied to high-level language control structures. We can also build a CDFG for an assembly language program. A jump instruction corresponds to a nonlocal edge in the CDFG. Some architectures, such as ARM and many VLIW processors, support predicated execution of instructions, which may be represented by special constructs in the CDFG.


URL:

https://www.sciencedirect.com/science/article/pii/B9780128053874000054

VLSI Signal Processing

Surin Kittitornkun , Yu-Hen Hu , in The Electrical Engineering Handbook, 2005

DG/SFG Versus DFG

Following Definition 6, a DG whose nodes each contain a functional DFG can be viewed either as a complete DG (Kung, 1988) or as the DFG of a nested Do-loop algorithm. On the other hand, each DFG node describing a recurrent algorithm in Section 7.2.3 corresponds to a set of operations or functions in a processor.

During space–time mapping, each DG node (computation index) is allocated to a specific SFG node and scheduled to execute there. Similarly, a parallel multiprocessor implementation of a recurrent DFG can be obtained with scheduling and assignment strategies such as the one proposed by Wang and Hu (1995). As a result, several DG nodes correspond to one SFG node (PE), while several DFG nodes can be assigned to the same processor.

In a fully systolic SFG, every edge carries one or more registers. The clock cycle time of each SFG node is determined by its critical path, whose delay is the maximum computation delay along a zero-register path. A recurrent algorithm, on the other hand, is constrained by its iteration period, which is analogous to a clock period. Besides minimizing the cycle time or iteration period, space–time mapping and multiprocessor implementation try to minimize the number of PEs and processors, respectively.
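The rule that the critical path is the maximum computation delay on a zero-register path can be turned into a short C computation. The SFG representation (node delays plus a register count per edge, with -1 meaning no edge) and the example delays are assumptions. The code also assumes every cycle contains at least one register, as in any valid synchronous design, so the zero-register edges form an acyclic graph and a memoized depth-first search finds its longest path.

```c
#include <assert.h>

#define MAXN 8

/* Hypothetical signal flow graph: delay[v] is the computation delay of node v,
   reg[u][v] is the number of registers on edge u -> v, and -1 means no edge. */
struct sfg {
    int n;
    int delay[MAXN];
    int reg[MAXN][MAXN];
};

void sfg_init(struct sfg *g, int n) {
    assert(n > 0 && n <= MAXN);
    g->n = n;
    for (int u = 0; u < n; u++) {
        g->delay[u] = 0;
        for (int v = 0; v < n; v++) g->reg[u][v] = -1;
    }
}

void sfg_edge(struct sfg *g, int u, int v, int registers) {
    g->reg[u][v] = registers;
}

/* Largest total delay of a zero-register path that starts at v.
   memo[v] < 0 means "not computed yet". */
static int longest_from(const struct sfg *g, int v, int memo[]) {
    if (memo[v] >= 0) return memo[v];
    int best = 0;
    for (int w = 0; w < g->n; w++) {
        if (g->reg[v][w] == 0) {          /* only register-free edges count */
            int d = longest_from(g, w, memo);
            if (d > best) best = d;
        }
    }
    memo[v] = g->delay[v] + best;
    return memo[v];
}

/* Critical-path delay, i.e. the shortest clock period the SFG can run at. */
int clock_period(const struct sfg *g) {
    int memo[MAXN];
    for (int v = 0; v < g->n; v++) memo[v] = -1;
    int period = 0;
    for (int v = 0; v < g->n; v++) {
        int d = longest_from(g, v, memo);
        if (d > period) period = d;
    }
    return period;
}
```

In a three-node loop with delays 2, 1, and 1 and one register on the feedback edge, the clock period is 4; placing a register after the first node breaks the zero-register path and lowers it to 2.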

Because of some undesirable properties of SFGs (e.g., long zero-delay links, broadcast links, dimensionality, or geometry), the following strategies have been used to reorganize the SFG so that the result is more suitable for VLSI implementation.


URL:

https://www.sciencedirect.com/science/article/pii/B9780121709600500670

Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications

Venkat N. Gudivada , Kamyar Arbabifard , in Handbook of Statistics, 2018

5.1 TensorFlow

TensorFlow is an open-source library that uses a data flow graph as its computational model. This model is especially well suited to machine learning based on neural networks, and it makes it easy to distribute computation across CPUs and GPUs. TensorFlow consists of three components: the TensorFlow API, TensorBoard, and TensorFlow Serving. The TensorFlow API is used to define, train, and validate machine learning models. Although the API is implemented in C++, a Python interface to it is also available. TensorBoard is used for analyzing, debugging, and visualizing data flow graph models. Finally, TensorFlow Serving supports deployment of pretrained TensorFlow models.


URL:

https://www.sciencedirect.com/science/article/pii/S0169716118300221

Parallel Computing

Ralf Ebner , Alexander Pfaffinger , in Advances in Parallel Computing, 1998

5 DISTRIBUTED EVALUATION OF THE DATA FLOW GRAPH

The runtime system distributes the generated nodes of the data flow graph dynamically according to so-called location functions. For example, the line

Mright, res2 := relax(Mright, m) on next_host;

migrates the function node relax onto the host that the location function next_host returns at runtime. For the programmer, location functions are special elementary functions that decide dynamically how the generated graph nodes are distributed; they may therefore use information about the architecture and topology of the parallel system. We are currently working on integrating an economically oriented load distribution system [1] into the FASAN system so that unbalanced (irregular) computations on adaptive unbalanced trees run with high parallel efficiency.

Only the location functions might have to be adapted to special topology requirements. The FASAN program itself and the data flow model remain, to a large extent, independent of the topology and architecture of the parallel system.
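Because location functions are ordinary functions evaluated at run time, a C sketch can show the kind of decision one might make. FASAN's actual interface is not given in this excerpt, so struct cluster, its load field, next_host, and least_loaded_host below are all hypothetical: the first picks the next host in ring order, and the second consults a load metric of the kind a load distribution system could supply.

```c
#include <assert.h>

#define MAXHOSTS 16

/* Hypothetical run-time view of the parallel system that a location
   function may consult; not FASAN's real interface. */
struct cluster {
    int nhosts;
    int load[MAXHOSTS];   /* assumed metric: pending graph nodes per host */
};

/* Round-robin placement: the host after `current`, wrapping around a ring. */
int next_host(const struct cluster *c, int current) {
    assert(c->nhosts > 0 && current >= 0 && current < c->nhosts);
    return (current + 1) % c->nhosts;
}

/* Load-aware placement: the host with the fewest pending nodes
   (ties go to the lowest host number). */
int least_loaded_host(const struct cluster *c) {
    assert(c->nhosts > 0 && c->nhosts <= MAXHOSTS);
    int best = 0;
    for (int h = 1; h < c->nhosts; h++)
        if (c->load[h] < c->load[best]) best = h;
    return best;
}
```

Swapping one policy for the other changes only where graph nodes are placed, not the program that generates them, which mirrors the separation described above.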


URL:

https://www.sciencedirect.com/science/article/pii/S0927545298800464

Parallel Computing

B. Stein , J. Chassin de Kergommeaux , in Advances in Parallel Computing, 1998

3.3 Filters

Filter components select or transform the data to be processed by subsequent components of the data-flow graph. There are currently two filters in Pajé: one selects which entities are shown, based on their types, and the other reuses the space made available by the termination of threads (see figures 4(a) and 4(b)). This second filter can also group the information relating to several objects. With this filter component, it is possible to visualize the execution of a program at the node level, which is useful for quickly checking whether some nodes are idle (see figure 4(c)). Being able to switch from detailed to grouped visualization gives programmers a zooming capability.

Figure 4. Examples of filters

(a) unfiltered; (b) reusing the space of terminated threads; (c) node grouping

In Pajé, filters produce filtering compound objects that act like ordinary compound objects (see figure 5). When a module queries a filtering compound object for the data of some elementary object, the filtering object gets the data from the original compound object, filters it, and passes the filtered data to the querying module. This way, a filter neither generates a new object for each elementary object nor alters the elementary objects. Filters developed this way are easily cascaded. Other possible implementations of filtering would either duplicate or modify the elementary objects of the current window: the former would consume a lot of memory, and the latter would be unsuitable for the components that use unfiltered objects.

Figure 5. Interaction of a filter component with the compound object

The Kiviat component receives the compound object generated by the compounding component and accesses the unmodified data of the elementary objects through it. The Space/time component receives the filtering compound object, which intercepts all of the component's messages to the compound object and thus provides it with a filtered view of the data.
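A filtering compound object is essentially a proxy that shares its query interface with the object it wraps. The C sketch below illustrates that idea; it is not Pajé's code, and every type and function name is hypothetical. A filter forwards a traversal to its inner source and passes on only the elementary objects its predicate accepts, so nothing is copied or modified, and filters cascade by wrapping one another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical elementary object: one state of one thread over time. */
struct entity { int type; int thread; double start, end; };

typedef void (*visit_fn)(const struct entity *e, void *ctx);
typedef bool (*keep_fn)(const struct entity *e, const void *arg);

/* Query interface shared by compound objects and filters. */
struct source {
    void (*for_each)(const struct source *self, visit_fn visit, void *ctx);
};

/* Ordinary compound object: owns the elementary objects. */
struct compound {
    struct source base;          /* first member, so the casts below are valid */
    const struct entity *items;
    size_t n;
};

/* Filtering compound object: wraps any source and hides what keep rejects. */
struct filter {
    struct source base;
    const struct source *inner;  /* may itself be a filter, so filters cascade */
    keep_fn keep;
    const void *arg;
};

static void compound_for_each(const struct source *self, visit_fn visit, void *ctx) {
    const struct compound *c = (const struct compound *)self;
    for (size_t i = 0; i < c->n; i++) visit(&c->items[i], ctx);
}

struct forward { const struct filter *f; visit_fn visit; void *ctx; };

static void filter_visit(const struct entity *e, void *p) {
    const struct forward *fw = p;
    if (fw->f->keep(e, fw->f->arg)) fw->visit(e, fw->ctx);   /* pass on, never copy */
}

static void filter_for_each(const struct source *self, visit_fn visit, void *ctx) {
    const struct filter *f = (const struct filter *)self;
    struct forward fw = { f, visit, ctx };
    f->inner->for_each(f->inner, filter_visit, &fw);   /* query the original data */
}

void compound_init(struct compound *c, const struct entity *items, size_t n) {
    c->base.for_each = compound_for_each;
    c->items = items;
    c->n = n;
}

void filter_init(struct filter *f, const struct source *inner, keep_fn keep, const void *arg) {
    f->base.for_each = filter_for_each;
    f->inner = inner;
    f->keep = keep;
    f->arg = arg;
}

/* Example predicates and a visitor that counts what it sees. */
bool has_type(const struct entity *e, const void *arg) { return e->type == *(const int *)arg; }
bool on_thread(const struct entity *e, const void *arg) { return e->thread == *(const int *)arg; }

static void count_visit(const struct entity *e, void *ctx) { (void)e; ++*(size_t *)ctx; }

size_t count_entities(const struct source *s) {
    size_t n = 0;
    s->for_each(s, count_visit, &n);
    return n;
}
```

Chaining a type filter and a thread filter over a four-entity compound object yields 4, 3, and 2 visible entities at the three levels, while the original array is never touched.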


URL:

https://www.sciencedirect.com/science/article/pii/S0927545298800385

DSP Integrated Circuits

Lars Wanhammar , in DSP Integrated Circuits, 1999

Data-Flow Approach

One approach is to partition the system along its data flow. If the data-flow graph is drawn with data flowing from left to right, we can define vertical and horizontal partitioning as illustrated in Figure 1.12. The former partitions the system into parts that pass data sequentially, while the latter partitions it into parts through which data flow along parallel paths [5].

Figure 1.12. (a) Vertical and (b) horizontal partitioning.

The vertical partitioning leads to a sequential system. Such systems can be pipelined so that the subsystems (processors) execute concurrently and pass data sequentially. The horizontal partitioning leads to a set of subsystems working in parallel. The subsystems can be autonomous and need not be synchronized, since they do not interchange data. In practice, it may not be possible to partition a system in a purely vertical or a purely horizontal style. For example, systems with feedback loops cannot be partitioned in this way.


URL:

https://www.sciencedirect.com/science/article/pii/B9780127345307500015

Cognitive Computing: Theory and Applications

V.N. Gudivada , ... D.L. Rao , in Handbook of Statistics, 2016

4.6 Libraries and Frameworks

Many libraries and frameworks are available for developing cognitive analytics applications. TensorFlow is an open source software library from Google for numerical computation using data flow graphs ( Abadi et al., 2016). The library is optimized for execution on clusters and GPU processors. Among many other applications, TensorFlow is a deep-learning platform for computational biologists (Rampasek and Goldenberg, 2016).

Apache Singa is a general-purpose, distributed neural network platform for training deep-learning models over large datasets. The supported neural models include convolutional neural networks, restricted Boltzmann machines, and recurrent neural networks.

Torch7, Theano, and Caffe are other widely used deep-learning frameworks. Torch is a GPU-based scientific computing framework with wide support for machine learning algorithms. It pairs an easy-to-use, fast scripting language, LuaJIT, with an underlying C/CUDA implementation. It comes with a large number of community-developed packages for computer vision, signal processing, and machine learning.

Theano is a Python library that is well suited to large-scale, computationally intensive scientific investigations. It efficiently evaluates mathematical expressions on large multidimensional arrays, integrates tightly with NumPy, and makes access to the underlying GPU hardware transparent. It also performs efficient symbolic differentiation. Finally, Theano integrates extensive unit-testing and self-verification functions, which help diagnose several types of errors in code.

Caffe is particularly suitable for convolutional neural networks and provides options for switching between CPUs and GPUs through configuration parameters. It has been stated that Caffe can process over 60 million images per day with a single Nvidia K40 GPU.

Massive Online Analysis (MOA) is a popular framework for data stream mining. The machine learning algorithms provided by the framework are suitable for tasks such as classification, regression, clustering, outlier detection, concept drift detection, and recommender systems.

MLlib is Apache Spark's machine learning library. Tasks that can be performed with MLlib include classification, regression, clustering, collaborative filtering, and dimensionality reduction. mlpack is a C++ machine learning library that can be used from the command line as well as through C++ classes.

Pattern is a web mining module for the Python programming language. It features tools for data mining, natural language processing, clustering, network analysis, and visualization. Scikit-learn is another Python framework for machine learning, which is implemented using NumPy, SciPy, and matplotlib. Using the included machine learning algorithms, tasks such as clustering, classification, and regression can be accomplished.

Shogun, written in C++, is one of the oldest machine learning libraries; it provides bindings for other languages such as Java, Python, C#, Ruby, R, Lua, Octave, and Matlab. Veles is a distributed C++ platform for developing deep-learning applications. Trained models can be exposed through a REST API. Using Veles, widely recognized neural topologies such as fully connected, convolutional, and recurrent networks can be trained. Deeplearning4J, neon, and H2O are other libraries for deep learning.

Mahout is an Apache machine learning project. The Mahout library is especially suited for execution on cluster computers and GPUs, and it integrates tightly with the Hadoop MapReduce distributed processing framework. Logistic regression, random forest, K-means clustering, and naive Bayes classifier algorithms are available in Mahout. The R project is a sophisticated platform for statistical computing. It features a comprehensive set of machine learning and visualization algorithms.

Amazon Machine Learning is a cloud-hosted service for creating machine learning models without knowing the internal details of machine learning algorithms. This service provides easy access to the data stored in Amazon S3, Redshift, and RDS. Azure ML Studio is a similar service from Microsoft.


URL:

https://www.sciencedirect.com/science/article/pii/S0169716116300517

System-Level Design and Hardware/Software Co-design

Marilyn Wolf , in High-Performance Embedded Computing (Second Edition), 2014

7.2.1 High-level synthesis

Goals of high-level synthesis

High-level synthesis starts from a behavioral description of hardware and creates a register-transfer design. High-level synthesis schedules and allocates the operations in the behavior as well as maps those operations into component libraries.

Data flow graph, data dependency, variable function unit

Figure 7.1 shows a simple example of a high-level specification and one possible register-transfer implementation. The data dependency edges carry variable values from one operator or from the primary inputs to another operator or a primary output.

FIGURE 7.1. An example of behavior specification and register-transfer implementation.

The register-transfer implementation shows which steps need to be taken to turn the high-level specification, whether it is given as text or a data flow graph, into a register-transfer implementation:

Operations have been scheduled to occur on a particular clock cycle.

Variables have been assigned to registers.

Operations have been assigned to function units.

Some connections have been multiplexed to save wires.

Control step, time step

In this case, it has been assumed that we can execute only one operation per clock cycle. A clock cycle is often called a control step or time step in high-level synthesis. We use a coarser model of time for high-level synthesis than is used in logic synthesis. Because we are farther from the implementation, delays cannot be predicted as accurately, so detailed timing models are not of much use in high-level synthesis. Abstracting time to the clock period or some fraction of it makes the combinatorics of scheduling tractable.

Allocating variables to registers must be done with care. Two variables can share a register if their values are not required at the same time: for example, if one input value is used only early in a sequence of calculations and another variable is defined only late in the sequence. But if the two variables are needed simultaneously, they must be allocated to separate registers.
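One common way to apply this rule is to compute each variable's lifetime and allocate registers greedily in order of definition time, in the style of the left-edge algorithm; the text does not prescribe this method, so treat it as an illustrative swap-in. The lifetime convention (a half-open interval of control steps) and all names below are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXREG 16

/* Hypothetical lifetime convention: the value must be held from control
   step `birth` up to, but not including, control step `death`. */
struct lifetime { int birth, death; };

static int by_birth(const void *a, const void *b) {
    const struct lifetime *x = a, *y = b;
    return (x->birth > y->birth) - (x->birth < y->birth);
}

/* Greedy, left-edge-style allocation: visit variables in order of birth and
   reuse the first register whose previous value is already dead.
   Sorts v in place; reg_of[i] receives the register of sorted variable i.
   Returns the number of registers used. */
int allocate_registers(struct lifetime v[], int n, int reg_of[]) {
    int free_at[MAXREG];   /* step from which each register may be reused */
    int nregs = 0;
    qsort(v, (size_t)n, sizeof v[0], by_birth);
    for (int i = 0; i < n; i++) {
        int r = 0;
        while (r < nregs && free_at[r] > v[i].birth) r++;   /* still live: skip */
        if (r == nregs) {                                   /* open a new register */
            assert(nregs < MAXREG);
            nregs++;
        }
        free_at[r] = v[i].death;
        reg_of[i] = r;
    }
    return nregs;
}
```

With lifetimes [0,2), [1,3), [2,4), and [3,5), two registers suffice: the first and third variables share one, because the first value is dead by the time the third is defined. Three variables that are all live over [0,5) need three registers.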

Sharing either registers or function units requires adding multiplexers to the design. For example, if two additions in the data flow graph are allocated to the same adder unit in the implementation, we use multiplexers to feed the proper operands to the adder. The multiplexers are controlled by a control finite-state machine (FSM), which supplies the select signals to the muxes. (In most cases, we don't need demultiplexers at the outputs of shared units because the hardware is generally designed to ignore values that aren't used on any given clock cycle.) Multiplexers add three types of cost to the implementation:

1. Delay, which may stretch the system clock cycle.

2. Logic, which consumes area on the chip.

3. Wiring, which is required to reach the multiplexer, also requires area.

Technology library

Sharing hardware isn't always a win. For example, in some technologies, adders are sufficiently small that you gain in both area and delay by never sharing an adder. Some of the information required to make good implementation decisions must come from a technology library, which gives the area and delay costs of some components. Other information, such as wiring cost estimates, can be made algorithmically. The ability of a program to accurately measure implementation costs for a large number of candidate implementations is one of the strengths of high-level synthesis algorithms.

Scheduling terminology

When searching for a good schedule, the as-soon-as-possible (ASAP) and as-late-as-possible (ALAP) schedules provide useful bounds on schedule length. Some of the scheduling algorithms discussed in Section 4.2.2 are also useful for high-level synthesis.
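For unit-delay operations, both bounds follow from a single pass over the graph in topological order. This C sketch uses a hypothetical five-operation DFG (two operations feeding a third, whose result feeds a fourth, plus one independent operation); the graph and the one-step delay are assumptions. ASAP gives the shortest possible schedule with unlimited resources, and ALAP for the same length shows how far each operation can slide.

```c
#include <assert.h>

#define NOPS 5

/* Hypothetical DFG numbered in topological order: edge[i][j] != 0 means
   operation j uses the result of operation i. Ops 0 and 1 feed op 2, op 2
   feeds op 4, and op 3 is independent. Every operation takes one step. */
static const int edge[NOPS][NOPS] = { [0][2] = 1, [1][2] = 1, [2][4] = 1 };

/* ASAP: each operation in the earliest step after all of its predecessors. */
void asap(int step[NOPS]) {
    for (int j = 0; j < NOPS; j++) {
        step[j] = 1;
        for (int i = 0; i < j; i++)
            if (edge[i][j] && step[i] + 1 > step[j]) step[j] = step[i] + 1;
    }
}

/* ALAP for a schedule of `latency` steps: each operation in the latest step
   that still precedes all of its successors. */
void alap(int step[NOPS], int latency) {
    for (int i = NOPS - 1; i >= 0; i--) {
        step[i] = latency;
        for (int j = i + 1; j < NOPS; j++)
            if (edge[i][j] && step[j] - 1 < step[i]) step[i] = step[j] - 1;
    }
}
```

Here ASAP needs three steps; operations 0, 1, 2, and 4 have the same ASAP and ALAP step and lie on the critical path, while operation 3 may go anywhere in steps 1 to 3.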

FCFS scheduling

A very simple heuristic that can handle constraints is first-come-first-served (FCFS) scheduling. FCFS walks through the data flow graph from its sources to its sinks. As soon as it encounters a new node, it tries to schedule that operation in the current control step; if all the resources are occupied, it starts another control step and schedules the operation there. FCFS schedules generally handle the nodes from source to sink, but nodes that appear at the same depth in the graph can be scheduled in arbitrary order. The quality of the schedule, as measured by its length, can change greatly depending on the order in which the nodes at a given depth are considered.

Critical-path scheduling

FCFS, because it chooses nodes at equal depth arbitrarily, may delay a critical operation. An obvious improvement is a critical-path scheduling algorithm, which schedules operations on the critical path first.

List scheduling

List scheduling is an effective heuristic that tries to improve on critical-path scheduling by providing a more balanced consideration of off-critical-path nodes. Rather than treat all nodes that are off the critical path as equally unimportant, list scheduling estimates how close a node is to being critical by measuring D, the number of descendants the node has in the data flow graph. A node with few descendants is less likely to become critical than another node at the same depth that has more descendants.

List scheduling also traverses the data flow graph from sources to sinks, but when several nodes at the same depth are vying for attention, it always chooses the node with the most descendants. In our simple timing model, where all nodes take the same amount of time, a critical node usually has more descendants than a noncritical node at the same depth, although a node with wide fanout can outscore one on a long, narrow chain. The heuristic takes its name from the list of nodes currently waiting to be scheduled.
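The effect of node order can be seen with a small C sketch that expresses both heuristics as one greedy scheduler. It fills each control step from the ready nodes and differs only in tie-breaking: FCFS takes ready nodes in the order they were encountered (here, index order), while list scheduling takes the node with the most descendants first. The graph, the two identical function units, and the unit delays are assumptions chosen to make the difference visible.

```c
#include <assert.h>
#include <stdbool.h>

#define NOPS 7
#define UNITS 2   /* assumed: two identical function units per control step */

/* Hypothetical DFG, numbered so every edge goes from a lower to a higher
   index: ops 0-2 are independent, and 3 -> 4 -> 5 -> 6 is the critical chain.
   All operations are assumed to take one control step. */
static const bool succ[NOPS][NOPS] = { [3][4] = true, [4][5] = true, [5][6] = true };

/* D for each node: how many nodes are reachable from it. */
static void count_descendants(int d[NOPS]) {
    unsigned reach[NOPS] = {0};
    for (int i = NOPS - 1; i >= 0; i--) {
        for (int j = i + 1; j < NOPS; j++)
            if (succ[i][j]) reach[i] |= (1u << j) | reach[j];
        d[i] = 0;
        for (int j = 0; j < NOPS; j++) d[i] += (int)((reach[i] >> j) & 1u);
    }
}

/* Greedy scheduler: each control step takes up to UNITS ready nodes.
   use_descendants == false: take ready nodes in encounter order (FCFS).
   use_descendants == true: take the ready node with the largest D (list).
   Fills step[] with 1-based control steps and returns the schedule length. */
int schedule(bool use_descendants, int step[NOPS]) {
    int d[NOPS];
    count_descendants(d);
    for (int i = 0; i < NOPS; i++) step[i] = 0;   /* 0 means unscheduled */
    int done = 0, cycle = 0;
    while (done < NOPS) {
        cycle++;
        for (int unit = 0; unit < UNITS; unit++) {
            int pick = -1;
            for (int j = 0; j < NOPS; j++) {
                if (step[j] != 0) continue;
                bool ready = true;   /* every predecessor finished earlier */
                for (int i = 0; i < j; i++)
                    if (succ[i][j] && (step[i] == 0 || step[i] >= cycle)) ready = false;
                if (!ready) continue;
                if (pick < 0 || (use_descendants && d[j] > d[pick])) pick = j;
            }
            if (pick < 0) break;      /* no more work can start this step */
            step[pick] = cycle;
            done++;
        }
    }
    return cycle;
}
```

With three independent operations listed ahead of a four-operation chain, FCFS fills the first step with independent work and needs five steps, while the descendant count pulls the chain forward so list scheduling finishes in four.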

Force-directed scheduling

Force-directed scheduling [Pau89] is a well-known scheduling algorithm that tries to minimize hardware cost to meet a particular performance goal by balancing the use of function units across cycles. The algorithm selects one operation to schedule using forces as shown in Figure 7.2. It then assigns a control step to that operation. Once an operation has been scheduled, it does not move, so the algorithm's outer loop executes once for each operation in the data flow graph.

FIGURE 7.2. How forces guide operator scheduling.

To compute the forces on the operators, we first need to find the distributions of various operations in the data flow graph, as represented by a distribution graph. The ASAP and ALAP schedules tell us the range of control steps at which each operation can be scheduled. We assume that each operation has a uniform probability of being assigned to any feasible control step. A distribution graph shows the expected value of the number of operators of a given type being assigned to each control step, as shown in Figure 7.3. The distribution graph gives us a probabilistic view of the number of function units of a given type (adder in this case) that will be required at each control step. In this example, there are three additions, but they cannot all occur on the same cycle.

FIGURE 7.3. Distribution graphs for force-directed scheduling.

If we compute the ASAP and ALAP schedules, we find that +1 must occur in the first control step, +3 in the last, and +2 can occur in either of the first two control steps. The distribution graph DG+(t) shows the expected number of additions as a function of control step; the expected value at each control step is computed by assuming that each operation is equally probable at every legal control step.

We build a distribution for each type of function unit that we will allocate. The total number of function units required for the data path is the maximum number needed for any control step, so minimizing hardware requirements requires choosing a schedule that balances the need for a given function unit over the entire schedule length. The distribution graphs are updated each time an operation is scheduled: when an operation is assigned to a control step, its probability at that control step becomes 1 and at any other control step 0. As the shape of the distribution graph for a function unit changes, force-directed scheduling tries to select control steps for the remaining operations that keep the operator distribution balanced.
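The distribution graph for the three-addition example can be computed directly. The sketch below assumes a three-step schedule, which the text does not state explicitly, together with the ranges {1,1}, {1,2}, and {3,3} for +1, +2, and +3 from the ASAP/ALAP discussion; the function and type names are hypothetical.

```c
#include <assert.h>

#define STEPS 3   /* assumed schedule length for the example */

/* Feasible control steps [asap, alap] of one operation (1-based). */
struct range { int asap, alap; };

/* DG(t) for one operation type: the expected number of such operations in
   step t, assuming each is equally likely in every step of its range.
   dg[1..STEPS] is filled; dg[0] is unused. */
void distribution_graph(const struct range r[], int n, double dg[STEPS + 1]) {
    for (int t = 0; t <= STEPS; t++) dg[t] = 0.0;
    for (int k = 0; k < n; k++) {
        assert(1 <= r[k].asap && r[k].asap <= r[k].alap && r[k].alap <= STEPS);
        double p = 1.0 / (r[k].alap - r[k].asap + 1);
        for (int t = r[k].asap; t <= r[k].alap; t++) dg[t] += p;
    }
}
```

Before scheduling, DG+ is 1.5, 0.5, and 1.0 for steps 1 to 3. Fixing +2 in step 2 sets its probability there to 1 and flattens DG+ to 1.0 in every step, so a single adder would suffice.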

Force-directed scheduling calculates forces like those exerted by springs to help balance the utilization of operators. The spring forces are a linear function of displacement, as given by Hooke's Law:

$$F(x) = kx \qquad \text{(EQ 7.1)}$$

where x is the displacement and k is the spring constant, which represents the spring's stiffness. When computing forces on the operator we are trying to schedule, we first choose a candidate schedule time for the operator, and then compute the forces to evaluate the effects of that scheduling choice on the allocation.

There are two types of forces applied by and on operators: self-forces and predecessor/successor forces. Self-forces are designed to equalize the utilization of function units across all control steps. Since we are selecting a schedule for one operation at a time, we need to take into account how fixing that operation in time will affect other operations, either by pulling forward earlier operations or pushing back succeeding ones. When we choose a candidate time for the operator being scheduled, restrictions are placed on the feasible ranges of its immediate predecessors and successors. (In fact, the effects of a scheduling choice can ripple through the whole data flow graph, but this approximation ignores effects at a distance.)

The predecessor/successor forces, $P_o(t)$ and $X_o(t)$, are those imposed on predecessor/successor operations. The scheduling choice is evaluated based on the total forces in the system exerted by this scheduling choice: the self-forces, the predecessor forces, and the successor forces are all added together. That is, the predecessor and successor operators do not directly exert forces on the operator being scheduled, but the forces exerted on them by that scheduling choice help determine the quality of the allocation. At each step, we choose the operation with the lowest total force to schedule and place it at the control step at which it feels the lowest total force.
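The text describes the forces qualitatively. One commonly cited formulation of the self-force, usually attributed to Paulin and Knight, multiplies the distribution graph value at each step (the spring constant of EQ 7.1) by the change in the operation's probability at that step (the displacement) and sums over the operation's feasible range. The C sketch below computes only that self-force term; predecessor and successor forces would be added in the same way. The DG values assume a three-step version of the three-addition example.

```c
#include <assert.h>

#define STEPS 3   /* assumed schedule length, matching the three-addition example */

/* Self-force of fixing an operation whose feasible range is [asap, alap]
   into step t: sum over the range of DG(i) * (new probability - old probability).
   DG(i) acts as the spring constant and the probability change as the
   displacement. dg is indexed 1..STEPS. */
double self_force(const double dg[STEPS + 1], int asap, int alap, int t) {
    assert(1 <= asap && asap <= t && t <= alap && alap <= STEPS);
    double old_p = 1.0 / (alap - asap + 1);
    double force = 0.0;
    for (int i = asap; i <= alap; i++) {
        double new_p = (i == t) ? 1.0 : 0.0;
        force += dg[i] * (new_p - old_p);
    }
    return force;
}

/* The feasible step with the lowest self-force (earliest step wins ties). */
int lowest_force_step(const double dg[STEPS + 1], int asap, int alap) {
    int best = asap;
    for (int t = asap + 1; t <= alap; t++)
        if (self_force(dg, asap, alap, t) < self_force(dg, asap, alap, best)) best = t;
    return best;
}
```

For +2, whose range is steps 1 and 2, placing it in step 1 gives a self-force of 0.5 and placing it in step 2 gives -0.5, so the lower-force choice is step 2, where expected adder demand is lower.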

Path-based scheduling

Path-based scheduling [Cam91] is another well-known scheduling algorithm for high-level synthesis. Unlike the previous methods, path-based scheduling is designed to minimize the number of control states required in the implementation's controller, given constraints on data path resources. The algorithm schedules each path independently, using an algorithm that guarantees the minimum number of control states on each path. The algorithm then optimally combines the path schedules into a system schedule. The schedule for each path is found using minimum clique covering; this step is called as-fast-as-possible (AFAP) scheduling.


URL:

https://www.sciencedirect.com/science/article/pii/B9780124105119000071

Professor Solomon

Robert Charles Metzger , in Debugging by Thinking, 2004

4.2.7 Look for domestic drift

"Many objects do have a designated or customary place where they are kept. But the reality is that they aren't always returned there. Instead, they are left wherever last used."

The customary place for a bug to occur is the last place that was modified. The place where an incorrect value is created is often not the place where it is observed.

Defective values tend to drift down the data-flow graph. A data-flow graph is a collection of arcs and nodes in which the nodes are either places where variables are assigned or used, and the arcs show the relationship between the places where a variable is assigned and where the assigned value is subsequently used.

To find the source of a defective value that has drifted down the dataflow graph, work backward from the manifestation to the definitions. The difficulty of performing this analysis depends on the scope of the variable. If it's a local variable on the stack, your search can be more limited. If it's a global variable, you may have to review many procedures to develop a graph that shows the chain of values.

There are several ways to develop a data-flow graph. If you have a compiler or tool that generates cross-reference tables, it will do much of the dirty work for you. Failing that, a simple text search with a tool like the UNIX™ command grep can help you identify the places where a variable is assigned or used. A slicing tool, explained in Chapter 14, is most helpful.
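Working backward from a manifestation is easy to sketch for straight-line code. In the C example below, the statement encoding (one assigned variable and up to two used variables per statement) is a simplifying assumption, and trace_back is a hypothetical helper: it finds the latest assignment that reaches a use, marks it, and recursively follows that statement's own uses, which amounts to a minimal backward slice.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXUSES 2

/* Hypothetical straight-line code: each statement assigns one variable
   (a single character) and reads up to MAXUSES variables ('\0' = unused). */
struct stmt { char def; char uses[MAXUSES]; };

/* Index of the latest statement before `before` that assigns var, or -1. */
static int reaching_def(const struct stmt s[], int before, char var) {
    for (int i = before - 1; i >= 0; i--)
        if (s[i].def == var) return i;
    return -1;   /* value came from outside this code (input, global, ...) */
}

/* Work backward from the use of `var` at statement `at`, marking every
   statement that contributed to the value it sees. */
void trace_back(const struct stmt s[], int at, char var, bool in_slice[]) {
    int d = reaching_def(s, at, var);
    if (d < 0 || in_slice[d]) return;
    in_slice[d] = true;
    for (int k = 0; k < MAXUSES; k++)
        if (s[d].uses[k] != '\0') trace_back(s, d, s[d].uses[k], in_slice);
}
```

Tracing the use of d after the sequence a = x + y; e = a; c = z; a = c; d = a + c marks only the last three statements; the first assignment to a is skipped because a later assignment overwrote it before d was computed.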


URL:

https://www.sciencedirect.com/science/article/pii/B9781555583071500045

roneycands1987.blogspot.com

Source: https://www.sciencedirect.com/topics/computer-science/data-flow-graph
