@@ -35,13 +35,13 @@ mesh elements, and the values at quadrature points, respectively.

We refer to the operators that connect the different types of vectors as:

- - Subdomain restriction :math:`\mathbf{P}`
- - Element restriction :math:`\mathbf{G}`
- - Basis (Dofs-to-Qpts) evaluator :math:`\mathbf{B}`
- - Operator at quadrature points :math:`\mathbf{D}`
+ - Subdomain restriction :math:`\bm{P}`
+ - Element restriction :math:`\bm{G}`
+ - Basis (Dofs-to-Qpts) evaluator :math:`\bm{B}`
+ - Operator at quadrature points :math:`\bm{D}`

More generally, when the test and trial space differ, they get their own
- versions of :math:`\mathbf{P}`, :math:`\mathbf{G}` and :math:`\mathbf{B}`.
+ versions of :math:`\bm{P}`, :math:`\bm{G}` and :math:`\bm{B}`.
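
Putting the pieces together, the action of the fully assembled operator on a
T-vector can be read off from these definitions (a sketch implied by the text
above; libCEED itself operates at the L-vector level, i.e. without
:math:`\bm{P}`):

.. math:: \bm{A} = \bm{P}^T \bm{G}^T \bm{B}^T \bm{D} \bm{B} \bm{G} \bm{P}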

.. _fig-operator-decomp:

@@ -50,11 +50,11 @@ versions of :math:`\mathbf{P}`, :math:`\mathbf{G}` and :math:`\mathbf{B}`.

   Operator Decomposition

Note that in the case of adaptive mesh refinement (AMR), the restrictions
- :math:`\mathbf{P}` and :math:`\mathbf{G}` will involve not just extracting sub-vectors,
+ :math:`\bm{P}` and :math:`\bm{G}` will involve not just extracting sub-vectors,
but evaluating values at constrained degrees of freedom through the AMR interpolation.
- There can also be several levels of subdomains (:math:`\mathbf{P1}`, :math:`\mathbf{P2}`,
- etc.), and it may be convenient to split :math:`\mathbf{D}` as the product of several
- operators (:math:`\mathbf{D1}`, :math:`\mathbf{D2}`, etc.).
+ There can also be several levels of subdomains (:math:`\bm{P1}`, :math:`\bm{P2}`,
+ etc.), and it may be convenient to split :math:`\bm{D}` as the product of several
+ operators (:math:`\bm{D1}`, :math:`\bm{D2}`, etc.).
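
Written out for two subdomain levels and a two-factor pointwise kernel, this
layering would read (illustrative only, following the naming above; the order
of composition is an assumption):

.. math:: \bm{P} = \bm{P2} \, \bm{P1}, \qquad \bm{D} = \bm{D1} \, \bm{D2}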


Terminology and Notation
@@ -149,10 +149,10 @@ Operator representation/storage/action categories:

- CSR matrix on each rank

- - the parallel prolongation operator :math:`\mathbf{P}` (and its transpose) should use
+ - the parallel prolongation operator :math:`\bm{P}` (and its transpose) should use
optimized matrix-free action

- - note that :math:`\mathbf{P}` is the operator mapping T-vectors to L-vectors.
+ - note that :math:`\bm{P}` is the operator mapping T-vectors to L-vectors.

- Element matrix assembly, **EA**:

@@ -182,62 +182,62 @@ Operator representation/storage/action categories:

Partial Assembly
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Since the global operator :math:`\mathbf{A}` is just a series of variational restrictions
- with :math:`\mathbf{B}`, :math:`\mathbf{G}` and :math:`\mathbf{P}`, starting from its
- point-wise kernel :math:`\mathbf{D}`, a "matvec" with :math:`\mathbf{A}` can be
+ Since the global operator :math:`\bm{A}` is just a series of variational restrictions
+ with :math:`\bm{B}`, :math:`\bm{G}` and :math:`\bm{P}`, starting from its
+ point-wise kernel :math:`\bm{D}`, a "matvec" with :math:`\bm{A}` can be
performed by evaluating and storing some of the innermost variational restriction
matrices, and applying the rest of the operators "on-the-fly". For example, one can
compute and store a global matrix on the **T-vector** level. Alternatively, one can compute
and store only the subdomain (**L-vector**) or element (**E-vector**) matrices and
- perform the action of :math:`\mathbf{A}` using matvecs with :math:`\mathbf{P}` or
- :math:`\mathbf{P}` and :math:`\mathbf{G}`. While these options are natural for
+ perform the action of :math:`\bm{A}` using matvecs with :math:`\bm{P}` or
+ :math:`\bm{P}` and :math:`\bm{G}`. While these options are natural for
low-order discretizations, they are not a good fit for high-order methods due to
the amount of FLOPs needed for their evaluation, as well as the memory transfer
needed for a matvec.
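
To make the scaling concrete: for elements of order :math:`p` in :math:`d`
dimensions, an assembled element matrix holds :math:`O(p^{2d})` entries, while
partial assembly stores only :math:`O(p^d)` values at quadrature points and,
with tensor-product bases, applies the operator in :math:`O(p^{d+1})` FLOPs --
the standard sum-factorization estimate, stated here as general background
rather than a claim about any particular backend.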

Our focus in libCEED, instead, is on **partial assembly**, where we compute and
- store only :math:`\mathbf{D}` (or portions of it) and evaluate the actions of
- :math:`\mathbf{P}`, :math:`\mathbf{G}` and :math:`\mathbf{B}` on-the-fly.
+ store only :math:`\bm{D}` (or portions of it) and evaluate the actions of
+ :math:`\bm{P}`, :math:`\bm{G}` and :math:`\bm{B}` on-the-fly.
Critically for performance, we take advantage of the tensor-product structure of the
degrees of freedom and quadrature points on *quad* and *hex* elements to perform the
- action of :math:`\mathbf{B}` without storing it as a matrix.
+ action of :math:`\bm{B}` without storing it as a matrix.
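
To illustrate why the tensor-product structure matters, here is a minimal
sketch (not libCEED's internal kernel; names and layouts are hypothetical) of
applying a 2D interpolation :math:`\bm{B} = \bm{B}^{1d} \otimes \bm{B}^{1d}`
one dimension at a time, so the full matrix is never formed:

.. code-block:: c

   #include <stddef.h>

   /* B1d is Q1d x P1d (row-major), u holds P1d x P1d element dofs,
      v receives Q1d x Q1d values at quadrature points, tmp is Q1d x P1d. */
   static void apply_basis_2d(size_t P1d, size_t Q1d, const double *B1d,
                              const double *u, double *v, double *tmp) {
     /* Contract the first dimension: tmp(q,j) = sum_i B1d(q,i) * u(i,j). */
     for (size_t q = 0; q < Q1d; q++)
       for (size_t j = 0; j < P1d; j++) {
         double sum = 0.0;
         for (size_t i = 0; i < P1d; i++) sum += B1d[q * P1d + i] * u[i * P1d + j];
         tmp[q * P1d + j] = sum;
       }
     /* Contract the second dimension: v(q,r) = sum_j tmp(q,j) * B1d(r,j). */
     for (size_t q = 0; q < Q1d; q++)
       for (size_t r = 0; r < Q1d; r++) {
         double sum = 0.0;
         for (size_t j = 0; j < P1d; j++) sum += tmp[q * P1d + j] * B1d[r * P1d + j];
         v[q * Q1d + r] = sum;
       }
   }

The two small contractions cost :math:`O(Q^{1d} P^{1d} (P^{1d} + Q^{1d}))`
operations instead of the :math:`O((Q^{1d} P^{1d})^2)` of a dense matvec.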

Implemented properly, the partial assembly algorithm requires an optimal amount of
memory transfers (with respect to the polynomial order) and near-optimal FLOPs
for operator evaluation. It consists of an operator *setup* phase that
- evaluates and stores :math:`\mathbf{D}`, and an operator *apply* (evaluation) phase that
- computes the action of :math:`\mathbf{A}` on an input vector. When desired, the setup
+ evaluates and stores :math:`\bm{D}`, and an operator *apply* (evaluation) phase that
+ computes the action of :math:`\bm{A}` on an input vector. When desired, the setup
phase may be done as a side-effect of evaluating a different operator, such as a
nonlinear residual. The relative costs of the setup and apply phases are
different depending on the physics being expressed and the representation of
- :math:`\mathbf{D}`.
+ :math:`\bm{D}`.
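
In code, the two phases typically look like the following sketch (hypothetical
variable names, mirroring the ``op_setup``/``op_mass`` pair discussed later in
this document):

.. code-block:: c

   /* Setup phase: evaluate and store D (here, quadrature data) once. */
   CeedOperatorApply(op_setup, X, qdata, CEED_REQUEST_IMMEDIATE);

   /* Apply phase: reuse the stored D for every action of A. */
   for (int i = 0; i < num_solver_iterations; i++)
     CeedOperatorApply(op_mass, u, v, CEED_REQUEST_IMMEDIATE);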


Parallel Decomposition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After the application of each of the first three transition operators,
- :math:`\mathbf{P}`, :math:`\mathbf{G}` and :math:`\mathbf{B}`, the operator evaluation
- is decoupled on their ranges, so :math:`\mathbf{P}`, :math:`\mathbf{G}` and
- :math:`\mathbf{B}` allow us to "zoom in" to the subdomain, element and quadrature point
+ :math:`\bm{P}`, :math:`\bm{G}` and :math:`\bm{B}`, the operator evaluation
+ is decoupled on their ranges, so :math:`\bm{P}`, :math:`\bm{G}` and
+ :math:`\bm{B}` allow us to "zoom in" to the subdomain, element and quadrature point
level, ignoring the coupling at higher levels.

- Thus, a natural mapping of :math:`\mathbf{A}` on a parallel computer is to split the
+ Thus, a natural mapping of :math:`\bm{A}` on a parallel computer is to split the
**T-vector** over MPI ranks (a non-overlapping decomposition, as is typically
used for sparse matrices), and then split the rest of the vector types over
computational devices (CPUs, GPUs, etc.) as indicated by the shaded regions in
the diagram above.

One of the advantages of the decomposition perspective in these settings is that
- the operators :math:`\mathbf{P}`, :math:`\mathbf{G}`, :math:`\mathbf{B}` and
- :math:`\mathbf{D}` clearly separate the MPI parallelism
- in the operator (:math:`\mathbf{P}`) from the unstructured mesh topology
- (:math:`\mathbf{G}`), the choice of the finite element space/basis (:math:`\mathbf{B}`)
- and the geometry and point-wise physics (:math:`\mathbf{D}`). These components also
+ the operators :math:`\bm{P}`, :math:`\bm{G}`, :math:`\bm{B}` and
+ :math:`\bm{D}` clearly separate the MPI parallelism
+ in the operator (:math:`\bm{P}`) from the unstructured mesh topology
+ (:math:`\bm{G}`), the choice of the finite element space/basis (:math:`\bm{B}`)
+ and the geometry and point-wise physics (:math:`\bm{D}`). These components also
naturally fall in different classes of numerical algorithms -- parallel (multi-device)
- linear algebra for :math:`\mathbf{P}`, sparse (on-device) linear algebra for
- :math:`\mathbf{G}`, dense/structured linear algebra (tensor contractions) for
- :math:`\mathbf{B}` and parallel point-wise evaluations for :math:`\mathbf{D}`.
+ linear algebra for :math:`\bm{P}`, sparse (on-device) linear algebra for
+ :math:`\bm{G}`, dense/structured linear algebra (tensor contractions) for
+ :math:`\bm{B}` and parallel point-wise evaluations for :math:`\bm{D}`.

Currently in libCEED, it is assumed that the host application manages the global
**T-vectors** and the required communications among devices (which are generally
@@ -252,7 +252,7 @@ ranks (each using a single ``Ceed`` object): 2 ranks using 1 CPU socket each, an
4 using 1 GPU each. Another choice could be to run 1 MPI rank on the whole node
and use 5 ``Ceed`` objects: 1 managing all CPU cores on the 2 sockets and 4
managing 1 GPU each. The communications among the devices, e.g. required for
- applying the action of :math:`\mathbf{P}`, are currently out of scope of libCEED. The
+ applying the action of :math:`\bm{P}`, are currently out of scope of libCEED. The
interface is non-blocking for all operations involving more than O(1) data,
allowing operations performed on a coprocessor or worker threads to overlap with
operations on the host.
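
For example, the 1-rank-per-node configuration above could create its logical
devices along these lines (a sketch; the available resource strings depend on
the backends built, and the per-device selector shown is an assumption):

.. code-block:: c

   const char *gpu_resources[4] = {/* hypothetical device selectors */
     "/gpu/cuda/gen:device_id=0", "/gpu/cuda/gen:device_id=1",
     "/gpu/cuda/gen:device_id=2", "/gpu/cuda/gen:device_id=3"};
   Ceed ceed_cpu, ceed_gpu[4];
   CeedInit("/cpu/self", &ceed_cpu);           /* 1 Ceed for the CPU cores */
   for (int i = 0; i < 4; i++)
     CeedInit(gpu_resources[i], &ceed_gpu[i]); /* 1 Ceed per GPU */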
@@ -288,13 +288,13 @@ implementation is as follows:

(A backend may choose to operate incrementally without forming explicit **E-** or
**Q-vectors**.)

- - :math:`\mathbf{G}` is represented as a variable of type :ref:`CeedElemRestriction`.
+ - :math:`\bm{G}` is represented as a variable of type :ref:`CeedElemRestriction`.

- - :math:`\mathbf{B}` is represented as a variable of type :ref:`CeedBasis`.
+ - :math:`\bm{B}` is represented as a variable of type :ref:`CeedBasis`.

- - the action of :math:`\mathbf{D}` is represented as a variable of type :ref:`CeedQFunction`.
+ - the action of :math:`\bm{D}` is represented as a variable of type :ref:`CeedQFunction`.

- - the overall operator :math:`\mathbf{G}^T \mathbf{B}^T \mathbf{D} \mathbf{B} \mathbf{G}`
+ - the overall operator :math:`\bm{G}^T \bm{B}^T \bm{D} \bm{B} \bm{G}`
is represented as a variable of type
:ref:`CeedOperator` and its action is accessible through ``CeedOperatorApply()`` (see the sketch below).
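
In code, that mapping looks roughly as follows (a condensed sketch in the
spirit of ``tests/t500-operator.c``; argument lists follow one version of the
public API and may differ between libCEED releases, and ``ceed``, ``u`` and
``v`` are created elsewhere):

.. code-block:: c

   CeedElemRestriction restr_u; /* G: L-vector -> E-vector          */
   CeedBasis basis_u;           /* B: E-vector -> Q-vector          */
   CeedQFunction qf_mass;       /* action of D at quadrature points */
   CeedOperator op_mass;        /* G^T B^T D B G                    */

   CeedOperatorCreate(ceed, qf_mass, CEED_QFUNCTION_NONE, CEED_QFUNCTION_NONE,
                      &op_mass);
   CeedOperatorSetField(op_mass, "u", restr_u, basis_u, CEED_VECTOR_ACTIVE);
   CeedOperatorSetField(op_mass, "v", restr_u, basis_u, CEED_VECTOR_ACTIVE);
   CeedOperatorApply(op_mass, u, v, CEED_REQUEST_IMMEDIATE);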

@@ -320,9 +320,9 @@ may suffer in case of oversubscription). The resource is used to locate a
suitable backend which will have discretion over the implementations of all
objects created with this logical device.
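
For instance (error checking omitted; ``/cpu/self`` is the reference backend):

.. code-block:: c

   Ceed ceed;
   CeedInit("/cpu/self", &ceed);
   /* ... create and use objects on this logical device ... */
   CeedDestroy(&ceed);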

- The ``setup`` routine above computes and stores :math:`\mathbf{D}`, in this case a
+ The ``setup`` routine above computes and stores :math:`\bm{D}`, in this case a
scalar value at each quadrature point, while ``mass`` uses these saved values to perform
- the action of :math:`\mathbf{D}`. These functions are turned into the ``CeedQFunction``
+ the action of :math:`\bm{D}`. These functions are turned into the ``CeedQFunction``
variables ``qf_setup`` and ``qf_mass`` in the ``CeedQFunctionCreateInterior()`` calls:

.. literalinclude:: ../../../tests/t500-operator.c
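
For readers without the test file at hand, the two callbacks have roughly this
shape (a sketch assuming the in/out-array QFunction signature; the
authoritative versions live in ``tests/t500-operator.c``):

.. code-block:: c

   #include <ceed.h> /* CeedInt, CeedScalar */

   /* Setup: store D, here the quadrature weight times the Jacobian. */
   static int setup(void *ctx, const CeedInt Q, const CeedScalar *const *in,
                    CeedScalar *const *out) {
     const CeedScalar *weight = in[0], *dxdX = in[1];
     CeedScalar *rho = out[0];
     for (CeedInt i = 0; i < Q; i++) rho[i] = weight[i] * dxdX[i];
     return 0;
   }

   /* Mass: apply the stored D pointwise to the input values. */
   static int mass(void *ctx, const CeedInt Q, const CeedScalar *const *in,
                   CeedScalar *const *out) {
     const CeedScalar *rho = in[0], *u = in[1];
     CeedScalar *v = out[0];
     for (CeedInt i = 0; i < Q; i++) v[i] = rho[i] * u[i];
     return 0;
   }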
@@ -376,7 +376,7 @@ field needs to reflect both the number of components and the geometric dimension
A 3-dimensional gradient on four components would therefore mean the field has a size of
12.

- The :math:`\mathbf{B}` operators for the mesh nodes, ``bx``, and the unknown field,
+ The :math:`\bm{B}` operators for the mesh nodes, ``bx``, and the unknown field,
``bu``, are defined in the calls to the function ``CeedBasisCreateTensorH1Lagrange()``.
In this example, the mesh and the unknown field use :math:`H^1` Lagrange finite
elements of order 1 and 4, respectively (the ``P`` argument represents the number of 1D
@@ -394,7 +394,7 @@ dimension using ``CeedBasisCreateTensorH1()``. Elements that do not have tensor
product structure, such as symmetric elements on simplices, will be created
using different constructors.

- The :math:`\mathbf{G}` operators for the mesh nodes, ``Erestrictx``, and the unknown field,
+ The :math:`\bm{G}` operators for the mesh nodes, ``Erestrictx``, and the unknown field,
``Erestrictu``, are specified in the ``CeedElemRestrictionCreate()`` calls. Both of these
directly specify the dof indices for each element in the ``indx`` and ``indu``
arrays:
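
(The actual listing follows in the full documentation; as orientation, here is
a hedged sketch of such a call -- the argument list of
``CeedElemRestrictionCreate()`` has changed across libCEED releases, so treat
the order below as an assumption:)

.. code-block:: c

   CeedElemRestriction Erestrictu;
   /* nelem elements with elemsize dofs each; indu maps each element dof to
      its L-vector location and remains owned by the caller. */
   CeedElemRestrictionCreate(ceed, nelem, elemsize, ncomp, compstride, ndofs,
                             CEED_MEM_HOST, CEED_USE_POINTER, indu,
                             &Erestrictu);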
@@ -415,23 +415,23 @@ contexts that involve problem-sized data.

For discontinuous Galerkin and for applications such as Nek5000 that only
explicitly store **E-vectors** (inter-element continuity has been subsumed by
- the parallel restriction :math:`\mathbf{P}`), the element restriction :math:`\mathbf{G}`
+ the parallel restriction :math:`\bm{P}`), the element restriction :math:`\bm{G}`
is the identity and ``CeedElemRestrictionCreateStrided()`` is used instead.
- We plan to support other structured representations of :math:`\mathbf{G}`, which will
+ We plan to support other structured representations of :math:`\bm{G}`, which will
be added according to demand. In the case of non-conforming mesh elements,
- :math:`\mathbf{G}` needs a more general representation that expresses values at slave
+ :math:`\bm{G}` needs a more general representation that expresses values at slave
nodes (which do not appear in **L-vectors**) as linear combinations of the degrees of
freedom at master nodes.

- These operations, :math:`\mathbf{G}`, :math:`\mathbf{B}`, and :math:`\mathbf{D}`,
+ These operations, :math:`\bm{G}`, :math:`\bm{B}`, and :math:`\bm{D}`,
are combined with a ``CeedOperator``. As with QFunctions, operator fields are added
- separately with a matching field name, basis (:math:`\mathbf{B}`), element restriction
- (:math:`\mathbf{G}`), and **L-vector**. The flag
+ separately with a matching field name, basis (:math:`\bm{B}`), element restriction
+ (:math:`\bm{G}`), and **L-vector**. The flag
``CEED_VECTOR_ACTIVE`` indicates that the vector corresponding to that field will
be provided to the operator when ``CeedOperatorApply()`` is called. Otherwise the
input/output will be read from/written to the specified **L-vector**.
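
Schematically (hypothetical names; the quadrature-data field uses a collocated
basis placeholder whose exact spelling varies between libCEED versions):

.. code-block:: c

   /* Passive field: qdata is read directly, no apply-time vector needed. */
   CeedOperatorSetField(op_mass, "rho", restr_qdata, CEED_BASIS_COLLOCATED,
                        qdata);
   /* Active fields: vectors are supplied to CeedOperatorApply(). */
   CeedOperatorSetField(op_mass, "u", restr_u, basis_u, CEED_VECTOR_ACTIVE);
   CeedOperatorSetField(op_mass, "v", restr_u, basis_u, CEED_VECTOR_ACTIVE);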

- With partial assembly, we first perform a setup stage where :math:`\mathbf{D}` is evaluated
+ With partial assembly, we first perform a setup stage where :math:`\bm{D}` is evaluated
and stored. This is accomplished by the operator ``op_setup`` and its application
to ``X``, the nodes of the mesh (these are needed to compute Jacobians at
quadrature points). Note that the corresponding ``CeedOperatorApply()`` has no basis