README.developers.md: 73 additions, 24 deletions
````diff
@@ -1272,47 +1272,96 @@ make CMAKE_PARAMS="-DVTR_IPO_BUILD=off" -j8 vpr
 
 # Profiling VTR
 
-1. Install `gprof`, `gprof2dot`, and `xdot`. Specifically, the previous two packages require python3, and you should install the last one with `sudo apt install` for all the dependencies you will need for visualizing your profile results.
+## Use GNU Profiler gprof
+
+1. **Installation**: Install `gprof`, `gprof2dot`, and `xdot` (optional).
+    1. `gprof` is part of [GNU Binutils](https://www.gnu.org/software/binutils/), which is commonly installed alongside the standard GCC package on most systems, so `gprof` should already exist. If not, use `sudo apt install binutils`.
+    2. `gprof2dot` requires python3 or conda. You can install it with `pip3 install gprof2dot` or `conda install -c conda-forge gprof2dot`.
+    3. `xdot` is optional. To install it, use `sudo apt install xdot`.
     ```
-    pip3 install gprof
+    sudo apt install binutils
     pip3 install gprof2dot
-    sudo apt install xdot
+    sudo apt install xdot # optional
     ```
 
     Contact your administrator if you do not have the `sudo` rights.
 
-2. Use the CMake option below to enable VPR profiler build.
+2. **VPR build**: Use the CMake option below to enable the VPR profiler build.
     ```
     make CMAKE_PARAMS="-DVTR_ENABLE_PROFILING=ON" vpr
     ```
 
-3. With the profiler build, each time you run the VTR flow script, it will produce an extra file `gmon.out` that contains the raw profile information.
-    Run `gprof` to parse this file. You will need to specify the path to the VPR executable.
+3. **Profiling**:
+    1. With the profiler build, each time you run the VTR flow script, it will produce an extra file `gmon.out` that contains the raw profile information. Run `gprof` to parse this file. You will need to specify the path to the VPR executable.
+        ```
+        gprof $VTR_ROOT/vpr/vpr gmon.out > gprof.txt
+        ```
+
+    2. Next, use `gprof2dot` to transform the parsed results to a `.dot` file (Graphviz graph description), which describes the graph of your final profile results. If you encounter long function names, specify the `-s` option for a cleaner graph. For other useful options, please refer to its [online documentation](https://github.com/jrfonseca/gprof2dot?tab=readme-ov-file#documentation).
+        ```
+        gprof2dot -s gprof.txt > vpr.dot
+        ```
+
+    - Note: You can chain the above commands to directly produce the `.dot` file:
-4. Next, use `gprof2dot` to transform the parsed results to a `.dot` file, which describes the graph of your final profile results. If you encounter long function names, specify the `-s` option for a cleaner graph.
+2. **VPR build**: *No need* to enable any CMake options for using `perf`, unless you want to utilize specific features, such as `perf annotate`.
     ```
-    gprof2dot -s gprof.txt > vpr.dot
+    make vpr
     ```
 
-5. You can chain the above commands to directly produce the `.dot` file:
+3. **Profiling**: `perf` needs to know the process ID (i.e., pid) of the running VPR you want to monitor and profile, which can be obtained using the Linux command `top -u <username>`.
+    - **Option 1**: Real-time analysis
+        ```
+        sudo perf top -p <vpr pid>
+        ```
+    - **Option 2** (Recommended): Record and offline analysis
+
+        Use `perf record` to record the profile data and the call graph. (Note: The argument `lbr` for `--call-graph` only works on Intel platforms. If you encounter issues with call graph recording, please refer to the [`perf record` manual](https://perf.wiki.kernel.org/index.php/Latest_Manual_Page_of_perf-record.1) for more information.)
+        ```
+        sudo perf record --call-graph lbr -p <vpr pid>
+        ```
+        After VPR completes its run, or if you stop `perf` with CTRL+C (if you are focusing on a specific portion of the VPR execution), the `perf` tool will produce an extra file `perf.data` containing the raw profile results in the directory where you ran `perf`. You can further analyze the results by parsing this file using `perf report`.
+        ```
+        sudo perf report -i perf.data
+        ```
+    - Note 1: The official `perf` [wiki](https://perf.wiki.kernel.org/index.php/Main_Page) and [tutorial](https://perf.wiki.kernel.org/index.php/Tutorial) are highly recommended for those who want to explore more uses of the tool.
+    - Note 2: It is highly recommended to run `perf` with `sudo`, but you can find a workaround [here](https://superuser.com/questions/980632/run-perf-without-root-rights) to allow running `perf` without root rights.
+    - Note 3: You may also find [Hotspot](https://github.com/KDAB/hotspot) useful if you want to run `perf` with GUI support.
+
+4. **Visualization** (optional): If you want a better illustration of the profiling results, first run the following command to transform the `perf` report into a Graphviz dot graph. The remaining steps are exactly the same as those described under [Use GNU Profiler gprof
 Note that you can use the `-Gdpi` option to make your picture clearer if you find the default dpi settings not clear enough.
 
 # External Subtrees
 
 VTR includes some code which is developed in external repositories, and is integrated into the VTR source tree using [git subtrees](https://www.atlassian.com/blog/git/alternatives-to-git-submodule-git-subtree).
````
doc/src/vpr/placement_constraints.rst: 33 additions, 3 deletions
```diff
@@ -28,6 +28,14 @@ A Placement Constraints File Example
             <add_atom name_pattern="n4917"/>
             <add_atom name_pattern="n6010"/>
         </partition>
+
+        <partition name="Part2">
+            <add_region x_low="3" y_low="3" x_high="85" y_high="85"/> <!-- When the layer is not explicitly specified, layer 0 is assumed. -->
+            <add_region x_low="8" y_low="5" x_high="142" y_high="29" layer_low="0" layer_high="1"/> <!-- In 3D architectures, the region can span across multiple layers. -->
+            <add_region x_low="6" y_low="55" x_high="50" y_high="129" layer_low="2" layer_high="2"/> <!-- If the region only covers a non-zero layer, both layer_low and layer_high must be set to the same value. -->
+            <add_atom name_pattern="n135"/>
+            <add_atom name_pattern="n7016"/>
+        </partition>
     </partition_list>
 </vpr_constraints>
 
```
```diff
@@ -75,7 +83,10 @@ The ``name_pattern`` can be the exact name of the atom from the input atom netli
 Region
 ^^^^^^
 
-An ``<add_region>`` tag is used to add a region to the partition. A ``region`` is a rectangular area on the chip. A partition can contain any number of independent regions - the regions within one partition must not overlap with each other (in order to ease processing when loading in the file). An ``<add_region>`` tag has the following attributes.
+An ``<add_region>`` tag is used to add a region to the partition. A ``region`` is a rectangular area or cuboid volume
+on the chip. A partition can contain any number of independent regions - the regions within one partition **must not**
+overlap with each other (in order to ease processing when loading in the file).
+An ``<add_region>`` tag has the following attributes.
 
 :req_param x_low:
     The x value of the lower left point of the rectangle.
```
```diff
@@ -90,11 +101,30 @@ An ``<add_region>`` tag is used to add a region to the partition. A ``region`` i
     The y value of the upper right point of the rectangle.
 
 :opt_param subtile:
-    Each x, y location on the grid may contain multiple locations known as subtiles. This paramter is an optional value specifying the subtile location that the atom(s) of the partition shall be constrained to.
+    Each x, y location on the grid may contain multiple locations known as subtiles. This parameter is an optional value specifying the subtile location that the atom(s) of the partition shall be constrained to.
+
+:opt_param layer_low:
+    The lowest layer number that the region covers. The default value is 0.
+
+:opt_param layer_high:
+    The highest layer number that the region covers. The default value is 0.
 
 The optional ``subtile`` attribute is commonly used when constraining an atom to a specific location on the chip (e.g. an exact I/O location). It is legal to use with larger regions, but uncommon.
 
-If a user would like to specify an area on the chip with an unusual shape (e.g. L-shaped or T-shaped), they can simply add multiple ``<add_region>`` tags to cover the area specified.
+In 2D architectures, ``layer_low`` and ``layer_high`` can be safely ignored as their default value is 0.
+In 3D architectures, a region can span across multiple layers or be assigned to a specific layer.
+For assigning a region to a specific non-zero layer, the user should set both ``layer_low`` and ``layer_high`` to the
+desired layer number. If a layer range is to be covered by the region, the user should set ``layer_low`` and ``layer_high``
+to the lowest and highest layers of that range.
+
+If a user would like to specify an area on the chip with an unusual shape (e.g. L-shaped or T-shaped),
+they can simply add multiple ``<add_region>`` tags to cover the area specified.
+
+It is strongly recommended that different partitions do not overlap. The packing algorithm compares the number of clustered
+blocks and the number of physical blocks in a region to decide whether to pack atoms inside a partition more aggressively
+when there are not enough resources in a partition. Overlapping partitions cause some physical blocks to be counted in more
```
```diff
     fprintf(Echo, "\tInput Connect Block Switch Name Within a Same Die: %s\n", arch->ipin_cblock_switch_name[ipin_cblock_switch_index_within_die].c_str());
-
+
     //if there is more than one layer available, print the connection block switch name that is used for connection between two dice
-    for(const auto& layout : arch->grid_layouts){
+    for(const auto& layout : arch->grid_layouts){
         int num_layers = (int)layout.layers.size();
-        if(num_layers > 1){
+        if(num_layers > 1){
             fprintf(Echo, "\tInput Connect Block Switch Name Between Two Dice: %s\n", arch->ipin_cblock_switch_name[ipin_cblock_switch_index_between_dice].c_str());
```