parallelization2.html 11.7 KB
Newer Older
Praetorius, Simon's avatar
Praetorius, Simon committed
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
<!DOCTYPE html>
<html>
  <head>
    <title>AMDiS - Adaptive Multi-Dimensional Simulations</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <style type="text/css">
      @import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
      @import url(https://fonts.googleapis.com/css?family=Raleway);
      @import url(https://fonts.googleapis.com/css?family=Ubuntu);
      @import url(https://fonts.googleapis.com/css?family=Droid+Serif:400,700,400italic);
      @import url(https://fonts.googleapis.com/css?family=Ubuntu+Mono:400,700,400italic);
    </style>
    <!--<link rel="stylesheet" type="text/css" href="style_display.css" />-->
    <link rel="stylesheet" type="text/css" href="style_print.css" />
  </head>
  <body>
    <textarea id="source">

class: center, middle

# Session 11
## Parallelization: Praxis

---

# Example: Poisson equation

\\[
  -\Delta u = f,\quad u=0\text{ on }\partial\Omega
\\]

CMakeLists.txt: select parallel AMDiS
```cmake
34
find_package(AMDIS REQUIRED PARALLEL)
Praetorius, Simon's avatar
Praetorius, Simon committed
35 36

add_executable(ellipt src/ellipt.cc)
37
target_link_libraries(ellipt AMDiS)
Praetorius, Simon's avatar
Praetorius, Simon committed
38
```
39
- Links against `libamdis-p.so`
Praetorius, Simon's avatar
Praetorius, Simon committed
40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382
- No changes in source file necessary!
- Requirements:
  - PETSc (`>=` 3.4)
  - ParMETIS (`>=` 3.1)

--

Run the program with
```bash
mpirun -np 2 ./ellipt init/ellipt.dat.2d
```
- Parallel AMDiS requires `>=` 2 processors

---

# Example: Poisson equation

Terminal output:
```bash
[0] ParallelProblemStat::initialize():
[0]                ParallelProblemStat::initialize()
[0]                Initialization phase 0 needed 0.00125 seconds
[0] MacroReader::checkMesh():
[0]                Checking mesh ...
[0]                checking done; no error detected
[0] Mesh::checkParallelMacroFile():
[0]                The macro mesh file "./macro/macro.stand.2d" was refined 3
                   times and stored to file "./macro/macro.stand.2d.707999.tmp".
[0] MacroReader::checkMesh():
[0]                Checking mesh ...
[0]                checking done; no error detected
[0] MeshDistributor::initParallelization():
[0]                Initialization phase 1 needed 0.00369 seconds
[0] MeshDistributor::updateDofRelatedStruct():
[0]                Update parallel data structures needed 0.00017 seconds
[0] MeshDistributor::updateLocalGlobalNumbering():
[0]                FE space 0 (mesh ptr 0xe1bec0):  nRankDofs    = 10
                   nOverallDofs = 25
[0]                Update dof mapping needed 0.00004 seconds
[0] MeshDistributor::initParallelization():
[0]                Init parallelization needed 0.02302 seconds
```

---

# Example: Poisson equation

Terminal output:
```bash
[0] StandardProblemIteration::beginIteration():
[0]
[0]                begin of iteration number: 0
[0]                =============================
[0] StandardProblemIteration::buildAndAdapt():
[0]                Local mesh adaption needed 0.00000 seconds
[0] ProblemStat::buildAfterCoarsen():
[0]                dof compression needed 0.00000 seconds
[0]                15 DOFs for FeSpace[0] (Lagrange1)
[0]                fillin of assembled matrix: 70
[0]                buildAfterCoarsen needed 0.00032 seconds
[0] LinearSolverInterface::solveSystem():
[0]                LinearSolverInterface::solveSystem()
[0] PetscSolver::solveLinearSystem():
[0]                creation of parallel data structures needed 0.05750 seconds
  Residual norms for  solve.
  0 KSP Residual norm 8.199750216692e-01
  1 KSP Residual norm 2.789152485779e-01
  # ...
 20 KSP Residual norm 1.049159448656e-08
 21 KSP Residual norm 4.582174575333e-17
[0] LinearSolverInterface::solveSystem():
[0]                Residual norm: ||b-Ax|| = 4.582175e-17
[0] Problem::solve():
[0]                solution of discrete system needed 0.05842 seconds
```

---

# Example: Poisson equation

Terminal output:
```bash
[0] ResidualEstimator::exit():
[0]                estimate for component 0 = 1.78957825e-01
[0] ProblemStat::estimate():
[0]                estimation of the error needed 0.00029 seconds
[0] StandardProblemIteration::endIteration():
[0]
[0]                end of iteration number: 0
[0]                =============================
[0] FileWriter<T>::writeFiles():
[0]                ParaView file written to output/ellipt_r0.2d-p0-.vtu
[0] Arh2Writer::detail::writeAux():
[0]                ARH file written to: output/ellipt_r0.2d-p0-.arh
[0] ProblemStat::writeFiles():
[0]                writeFiles needed 0.00461 seconds
```
- Output written to parallel files `filename-pX-.vtu`.
- Additional parallel collection `.pvtu` is created

---

# Example: Poisson equation

3 times refined macro mesh
.center[
<img src="images/poisson_parallel2.png" width="60%" />
]

---

# Controlling the parallelization

### Pre-refinement:
- Nr. of refinements before macro-mesh is partitioned.
- There should be at least 10 macro Elements per processor
- Can be controlled by parameter
```matlab
parallel->pre refine: X    % -1...automatic, 0...no pre-refinement
```
- Pre-refinement only, if *macro-weights* not given!
--


### Disable adaptivity
If there is no mesh adaptivity, the mesh distributor can remove some
data structures which are only used if mesh changes or it must be
redistributed due to some local adaptivity.
```matlab
parallel->mesh adaptivity: [0,1]  % default: 1
```

---

# Partitioning
- Partitioner can be selected by
```matlab
parallel->partitioner: [parmetis|zoltan|simple]
```
- The mesh partitioner can be used in two different modes:
  - *standard mode* (assigns macro elements to ranks)
  - *box partitioning* (in 2D boxed, i.e. 2 macro elements, and in 3D cubes, i.e. 6 macro
    elements, are assigned as a union to ranks). - only 3d implemented!
```matlab
parallel->box partitioning: [0|1]
```
--

- Initial partitioning:
  ```matlab
  parallel->partitioner->initial partitioning file: filename.abc
  ```
  Format:
  ```
  nr-of-macro-elements
  rank-of-macro-0
  rank-of-macro-1
  rank-of-macro-2
  ...
  ```
---

# Partitioning

- Partitioning based on ARH Meta information
  ```matlab
  parallel->partitioner->read arh: filename.arh
  ```
  - Requires ARH format `>=` 2
  - Only used for partitoning if `#proc` in ARH file corresponds to `MPI_Comm_size`

**Simple partitioner**: does not change the initial partitioning which is more
  or less a random assignment of elements to ranks.

**ZOLTAN partitioner**:
  - Read configuration for Zoltan:
```matlab
zoltan parameter->LB_APPROACH: [PARTITION|REPARTITION]
zoltan parameter->LB_METHOD: GRAPH
zoltan parameter->REFTREE_INITPATH: CONNECTED
zoltan parameter->REDUCE_DIMENSIONS: 1
zoltan parameter->DEGENERATE_RATIO: 1.1
zoltan parameter->RCB_RECTILINEAR_BLOCKS: 1
zoltan parameter->AVERAGE_CUTS: 1
zoltan parameter->RCB_RECOMPUTE_BOX: 1
zoltan parameter->OBJ_WEIGHT_DIM: 1
```

---

# Partitioning

### Weighted partitioning:
- Macro-elements can be weighted by the nr. of leaf elements
- Goal: each partition should have the same nr. of leaf elements
```matlab
[MESH_NAME]->macro weights: filename.abc
```
File format:
```matlab
macro-index-0  weight-0
macro-index-1  weight-1
macro-index-2  weight-2
...
```
.center[
<img src="images/mesh_weights.png" width="30%" />
]

---

# Partitioning

### Weighted partitioning:
- Outlook: face-weights?
    - Corresponds to communication costs.
    - e.g. nr. of DOFs on face.


.center[
<img src="images/mesh_weights2.png" width="40%" />
]

- Balance between two load values: element load, communication load
- Combination with communication speed between processors `->` hyper graph problem

---

# Repartitioning

If partitioning is not balanced (one processor has more DOFs than another one), the
mesh can/need to be repartitioned.

.center[
<img src="images/mesh_repartition.png" width="40%" />
]

---

# Repartitioning

If partitioning is not balanced (one processor has more DOFs than another one), the
mesh can/need to be repartitioned.

.center[
<img src="images/mesh_repartition.png" width="40%" /><img src="images/mesh_repartition2.png" width="40%" />
]

---

# Repartitioning

If partitioning is not balanced (one processor has more DOFs than another one), the
mesh can/need to be repartitioned.



```matlab
parallel->repartition: [false]  % If true, it is possible to repartition the
                                % mesh during computations.

parallel->repartition only once: [false]  % repartition the mesh (only) the first
                                          % time repartitionMesh() is called

parallel->repartition ith change: [20]   % Stores the number of mesh changes
                                         % that must lie in between two
                                         % repartitionings.
```

### Control the repartitioning
```matlab
parallel->repartitioning->imbalance: [0.2]  % upper bound for imbalancing factor
parallel->repartitioning->strategy: [0|1]   % 0..quick repartitioning (default)
                                            % 1..full repartitioning (more stable)
```
with
\\[
\text{imbalancingFactor} := \frac{maxDOFs}{avgDOFs} - 1
\\]

---

# Repartitioning
### Strategy *quick*
- creates a new partitioning of the macro-mesh
- collects all macro elements that need to be removed
- **send macro elements** to other processors
- modifies the current local macro mesh part, with respect to new partitioning
- meight work in most of the cases, but one can construct situation that always fail

### Strategy *full*
- Stores all macro-elements including refinement hierarchy and leaf values
- create a completely new mesh partitoning based on the old partitioning
- applys the refinement structure and values to all local macro elements
- **sends** refinement structure and **values** to owning processor
- should always work, but needs more memory to cache all data

---

# Repartitioning

- Call repartition manually:
```
bool MeshDistributor::repartitionMesh();
```

- Only those values are communicated/restored that are marked as `interchangeVectors`
```
bool MeshDistributor::addInterchangeVector(DOFVector<double>* vec);
```
or directly when constructing a DOFVector:
```
DOFVector(const FiniteElemSpace* feSpace, std::string name,
              bool setInterchange = false);
```
The `problemStat->solution` vector is automatically set to be an interchange vector.

---

# File I/O in parallel mode
Reading and writing files in parallel meight be a bit more tricky:
- ARH-3 format is designed to be flexible w.r.t. parallel input/output
- Basic idea:
  - Three ARH file types:
      1. `.arh` (sequential / local refinement structure and values)
      2. `.parh` (parallel: macro filename, partition map)
      3. `.tarh` (time series of `.arh` or `.parh` files, only times are stored,
        filenames in specified format, not yet implemented)
  - Only store structure codes for each macro element + ordered sequence of
    values on all DOFs of the macro element
  - Macro mesh is specified in various ways
      - If given in init-file, then use this
      - else filename is given in [p]arh file, relative to this file
      - Macrofile can be included in [p]arh file, or appended to any other file

- Start the simulation using an ARH file:
```matlab
[MESH_NAME]->arh file name: filename.arh
```


    </textarea>
    <script src="lib/remark.js" type="text/javascript"></script>
383
    <script type="text/javascript" src="MathJax/MathJax.js?config=TeX-AMS_HTML"></script>
Praetorius, Simon's avatar
Praetorius, Simon committed
384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406

    <script type="text/javascript">
      var slideshow = remark.create({
        ratio: "4:3",
        highlightLanguage: "cpp"
      });

      // Setup MathJax
      MathJax.Hub.Config({
          tex2jax: {
          skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
          }
      });
      MathJax.Hub.Queue(function() {
          $(MathJax.Hub.getAllJax()).map(function(index, elem) {
              return(elem.SourceElement());
          }).parent().addClass('has-jax');
      });

      MathJax.Hub.Configured();
    </script>
  </body>
</html>