
Commit 6eb2416

Refactor \mathbf -> \bm
1 parent 484d679 commit 6eb2416

6 files changed, +108 -108 lines changed
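Background, not part of the diff: ``\mathbf`` produces upright bold Roman letters only, while ``\bm`` (from the LaTeX ``bm`` package) bolds a symbol in its usual math-italic shape and also works on Greek letters and whole expressions. A minimal LaTeX sketch of the difference:

.. code-block:: latex

   \documentclass{article}
   \usepackage{bm} % provides \bm; a LaTeX build of these docs needs it in the preamble

   \begin{document}
   % \mathbf: upright bold, effective on Roman letters only
   $\mathbf{x}$
   % \bm: bold math italic, also handles Greek letters and expressions
   $\bm{x}, \quad \bm{\alpha}, \quad \bm{x + y}$
   \end{document}

For HTML builds rendered with MathJax, ``\bm`` is not defined out of the box, so it is presumably mapped to ``\boldsymbol`` via a macro in the Sphinx configuration -- an assumption, not something visible in this commit.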

doc/sphinx/source/FEMtheory.rst.inc

Lines changed: 2 additions & 2 deletions
@@ -18,9 +18,9 @@ We start by considering the discrete residual :math:`F(u)=0` formulation
 in weak form. We first define the :math:`L^2` inner product
 
 .. math::
-   \langle u, v \rangle = \int_\Omega u v d \mathbf{x},
+   \langle u, v \rangle = \int_\Omega u v d \bm{x},
 
-where :math:`d \mathbf{x} \in \mathbb{R}^d \supset \Omega`.
+where :math:`d \bm{x} \in \mathbb{R}^d \supset \Omega`.
 
 We want to find :math:`u` in a suitable space :math:`V_D`,
 such that

doc/sphinx/source/libCEEDapi.rst

Lines changed: 50 additions & 50 deletions
@@ -35,13 +35,13 @@ mesh elements, and the values at quadrature points, respectively.
 
 We refer to the operators that connect the different types of vectors as:
 
-- Subdomain restriction :math:`\mathbf{P}`
-- Element restriction :math:`\mathbf{G}`
-- Basis (Dofs-to-Qpts) evaluator :math:`\mathbf{B}`
-- Operator at quadrature points :math:`\mathbf{D}`
+- Subdomain restriction :math:`\bm{P}`
+- Element restriction :math:`\bm{G}`
+- Basis (Dofs-to-Qpts) evaluator :math:`\bm{B}`
+- Operator at quadrature points :math:`\bm{D}`
 
 More generally, when the test and trial space differ, they get their own
-versions of :math:`\mathbf{P}`, :math:`\mathbf{G}` and :math:`\mathbf{B}`.
+versions of :math:`\bm{P}`, :math:`\bm{G}` and :math:`\bm{B}`.
 
 .. _fig-operator-decomp:
 
@@ -50,11 +50,11 @@ versions of :math:`\mathbf{P}`, :math:`\mathbf{G}` and :math:`\mathbf{B}`.
    Operator Decomposition
 
 Note that in the case of adaptive mesh refinement (AMR), the restrictions
-:math:`\mathbf{P}` and :math:`\mathbf{G}` will involve not just extracting sub-vectors,
+:math:`\bm{P}` and :math:`\bm{G}` will involve not just extracting sub-vectors,
 but evaluating values at constrained degrees of freedom through the AMR interpolation.
-There can also be several levels of subdomains (:math:`\mathbf{P1}`, :math:`\mathbf{P2}`,
-etc.), and it may be convenient to split :math:`\mathbf{D}` as the product of several
-operators (:math:`\mathbf{D1}`, :math:`\mathbf{D2}`, etc.).
+There can also be several levels of subdomains (:math:`\bm{P1}`, :math:`\bm{P2}`,
+etc.), and it may be convenient to split :math:`\bm{D}` as the product of several
+operators (:math:`\bm{D1}`, :math:`\bm{D2}`, etc.).
 
 
 Terminology and Notation
@@ -149,10 +149,10 @@ Operator representation/storage/action categories:
 
   - CSR matrix on each rank
 
-  - the parallel prolongation operator, :math:`\mathbf{P}`, (and its transpose) should use
+  - the parallel prolongation operator, :math:`\bm{P}`, (and its transpose) should use
     optimized matrix-free action
 
-  - note that :math:`\mathbf{P}` is the operator mapping T-vectors to L-vectors.
+  - note that :math:`\bm{P}` is the operator mapping T-vectors to L-vectors.
 
 - Element matrix assembly, **EA**:
 
@@ -182,62 +182,62 @@ Operator representation/storage/action categories:
 Partial Assembly
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Since the global operator :math:`\mathbf{A}` is just a series of variational restrictions
-with :math:`\mathbf{B}`, :math:`\mathbf{G}` and :math:`\mathbf{P}`, starting from its
-point-wise kernel :math:`\mathbf{D}`, a "matvec" with :math:`\mathbf{A}` can be
+Since the global operator :math:`\bm{A}` is just a series of variational restrictions
+with :math:`\bm{B}`, :math:`\bm{G}` and :math:`\bm{P}`, starting from its
+point-wise kernel :math:`\bm{D}`, a "matvec" with :math:`\bm{A}` can be
 performed by evaluating and storing some of the innermost variational restriction
 matrices, and applying the rest of the operators "on-the-fly". For example, one can
 compute and store a global matrix on **T-vector** level. Alternatively, one can compute
 and store only the subdomain (**L-vector**) or element (**E-vector**) matrices and
-perform the action of :math:`\mathbf{A}` using matvecs with :math:`\mathbf{P}` or
-:math:`\mathbf{P}` and :math:`\mathbf{G}`. While these options are natural for
+perform the action of :math:`\bm{A}` using matvecs with :math:`\bm{P}` or
+:math:`\bm{P}` and :math:`\bm{G}`. While these options are natural for
 low-order discretizations, they are not a good fit for high-order methods due to
 the amount of FLOPs needed for their evaluation, as well as the memory transfer
 needed for a matvec.
 
 Our focus in libCEED, instead, is on **partial assembly**, where we compute and
-store only :math:`\mathbf{D}` (or portions of it) and evaluate the actions of
-:math:`\mathbf{P}`, :math:`\mathbf{G}` and :math:`\mathbf{B}` on-the-fly.
+store only :math:`\bm{D}` (or portions of it) and evaluate the actions of
+:math:`\bm{P}`, :math:`\bm{G}` and :math:`\bm{B}` on-the-fly.
 Critically for performance, we take advantage of the tensor-product structure of the
 degrees of freedom and quadrature points on *quad* and *hex* elements to perform the
-action of :math:`\mathbf{B}` without storing it as a matrix.
+action of :math:`\bm{B}` without storing it as a matrix.
 
 Implemented properly, the partial assembly algorithm requires optimal amount of
 memory transfers (with respect to the polynomial order) and near-optimal FLOPs
 for operator evaluation. It consists of an operator *setup* phase, that
-evaluates and stores :math:`\mathbf{D}` and an operator *apply* (evaluation) phase that
-computes the action of :math:`\mathbf{A}` on an input vector. When desired, the setup
+evaluates and stores :math:`\bm{D}` and an operator *apply* (evaluation) phase that
+computes the action of :math:`\bm{A}` on an input vector. When desired, the setup
 phase may be done as a side-effect of evaluating a different operator, such as a
 nonlinear residual. The relative costs of the setup and apply phases are
 different depending on the physics being expressed and the representation of
-:math:`\mathbf{D}`.
+:math:`\bm{D}`.
 
 
 Parallel Decomposition
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 After the application of each of the first three transition operators,
-:math:`\mathbf{P}`, :math:`\mathbf{G}` and :math:`\mathbf{B}`, the operator evaluation
-is decoupled on their ranges, so :math:`\mathbf{P}`, :math:`\mathbf{G}` and
-:math:`\mathbf{B}` allow us to "zoom-in" to subdomain, element and quadrature point
+:math:`\bm{P}`, :math:`\bm{G}` and :math:`\bm{B}`, the operator evaluation
+is decoupled on their ranges, so :math:`\bm{P}`, :math:`\bm{G}` and
+:math:`\bm{B}` allow us to "zoom-in" to subdomain, element and quadrature point
 level, ignoring the coupling at higher levels.
 
-Thus, a natural mapping of :math:`\mathbf{A}` on a parallel computer is to split the
+Thus, a natural mapping of :math:`\bm{A}` on a parallel computer is to split the
 **T-vector** over MPI ranks (a non-overlapping decomposition, as is typically
 used for sparse matrices), and then split the rest of the vector types over
 computational devices (CPUs, GPUs, etc.) as indicated by the shaded regions in
 the diagram above.
 
 One of the advantages of the decomposition perspective in these settings is that
-the operators :math:`\mathbf{P}`, :math:`\mathbf{G}`, :math:`\mathbf{B}` and
-:math:`\mathbf{D}` clearly separate the MPI parallelism
-in the operator (:math:`\mathbf{P}`) from the unstructured mesh topology
-(:math:`\mathbf{G}`), the choice of the finite element space/basis (:math:`\mathbf{B}`)
-and the geometry and point-wise physics :math:`\mathbf{D}`. These components also
+the operators :math:`\bm{P}`, :math:`\bm{G}`, :math:`\bm{B}` and
+:math:`\bm{D}` clearly separate the MPI parallelism
+in the operator (:math:`\bm{P}`) from the unstructured mesh topology
+(:math:`\bm{G}`), the choice of the finite element space/basis (:math:`\bm{B}`)
+and the geometry and point-wise physics :math:`\bm{D}`. These components also
 naturally fall in different classes of numerical algorithms -- parallel (multi-device)
-linear algebra for :math:`\mathbf{P}`, sparse (on-device) linear algebra for
-:math:`\mathbf{G}`, dense/structured linear algebra (tensor contractions) for
-:math:`\mathbf{B}` and parallel point-wise evaluations for :math:`\mathbf{D}`.
+linear algebra for :math:`\bm{P}`, sparse (on-device) linear algebra for
+:math:`\bm{G}`, dense/structured linear algebra (tensor contractions) for
+:math:`\bm{B}` and parallel point-wise evaluations for :math:`\bm{D}`.
 
 Currently in libCEED, it is assumed that the host application manages the global
 **T-vectors** and the required communications among devices (which are generally
@@ -252,7 +252,7 @@ ranks (each using a single ``Ceed`` object): 2 ranks using 1 CPU socket each, an
 4 using 1 GPU each. Another choice could be to run 1 MPI rank on the whole node
 and use 5 ``Ceed`` objects: 1 managing all CPU cores on the 2 sockets and 4
 managing 1 GPU each. The communications among the devices, e.g. required for
-applying the action of :math:`\mathbf{P}`, are currently out of scope of libCEED. The
+applying the action of :math:`\bm{P}`, are currently out of scope of libCEED. The
 interface is non-blocking for all operations involving more than O(1) data,
 allowing operations performed on a coprocessor or worker threads to overlap with
 operations on the host.
@@ -288,13 +288,13 @@ implementation is as follows:
 (A backend may choose to operate incrementally without forming explicit **E-** or
 **Q-vectors**.)
 
-- :math:`\mathbf{G}` is represented as variable of type :ref:`CeedElemRestriction`.
+- :math:`\bm{G}` is represented as variable of type :ref:`CeedElemRestriction`.
 
-- :math:`\mathbf{B}` is represented as variable of type :ref:`CeedBasis`.
+- :math:`\bm{B}` is represented as variable of type :ref:`CeedBasis`.
 
-- the action of :math:`\mathbf{D}` is represented as variable of type :ref:`CeedQFunction`.
+- the action of :math:`\bm{D}` is represented as variable of type :ref:`CeedQFunction`.
 
-- the overall operator :math:`\mathbf{G}^T \mathbf{B}^T \mathbf{D} \mathbf{B} \mathbf{G}`
+- the overall operator :math:`\bm{G}^T \bm{B}^T \bm{D} \bm{B} \bm{G}`
   is represented as variable of type
   :ref:`CeedOperator` and its action is accessible through ``CeedOperatorApply()``.
 
@@ -320,9 +320,9 @@ may suffer in case of oversubscription). The resource is used to locate a
 suitable backend which will have discretion over the implementations of all
 objects created with this logical device.
 
-The ``setup`` routine above computes and stores :math:`\mathbf{D}`, in this case a
+The ``setup`` routine above computes and stores :math:`\bm{D}`, in this case a
 scalar value in each quadrature point, while ``mass`` uses these saved values to perform
-the action of :math:`\mathbf{D}`. These functions are turned into the ``CeedQFunction``
+the action of :math:`\bm{D}`. These functions are turned into the ``CeedQFunction``
 variables ``qf_setup`` and ``qf_mass`` in the ``CeedQFunctionCreateInterior()`` calls:
 
 .. literalinclude:: ../../../tests/t500-operator.c
@@ -376,7 +376,7 @@ field needs to reflect both the number of components and the geometric dimension
 A 3-dimensional gradient on four components would therefore mean the field has a size of
 12.
 
-The :math:`\mathbf{B}` operators for the mesh nodes, ``bx``, and the unknown field,
+The :math:`\bm{B}` operators for the mesh nodes, ``bx``, and the unknown field,
 ``bu``, are defined in the calls to the function ``CeedBasisCreateTensorH1Lagrange()``.
 In this example, both the mesh and the unknown field use :math:`H^1` Lagrange finite
 elements of order 1 and 4 respectively (the ``P`` argument represents the number of 1D
@@ -394,7 +394,7 @@ dimension using ``CeedBasisCreateTensorH1()``. Elements that do not have tensor
 product structure, such as symmetric elements on simplices, will be created
 using different constructors.
 
-The :math:`\mathbf{G}` operators for the mesh nodes, ``Erestrictx``, and the unknown field,
+The :math:`\bm{G}` operators for the mesh nodes, ``Erestrictx``, and the unknown field,
 ``Erestrictu``, are specified in the ``CeedElemRestrictionCreate()``. Both of these
 specify directly the dof indices for each element in the ``indx`` and ``indu``
 arrays:
@@ -415,23 +415,23 @@ contexts that involve problem-sized data.
 
 For discontinuous Galerkin and for applications such as Nek5000 that only
 explicitly store **E-vectors** (inter-element continuity has been subsumed by
-the parallel restriction :math:`\mathbf{P}`), the element restriction :math:`\mathbf{G}`
+the parallel restriction :math:`\bm{P}`), the element restriction :math:`\bm{G}`
 is the identity and ``CeedElemRestrictionCreateStrided()`` is used instead.
-We plan to support other structured representations of :math:`\mathbf{G}` which will
+We plan to support other structured representations of :math:`\bm{G}` which will
 be added according to demand. In the case of non-conforming mesh elements,
-:math:`\mathbf{G}` needs a more general representation that expresses values at slave
+:math:`\bm{G}` needs a more general representation that expresses values at slave
 nodes (which do not appear in **L-vectors**) as linear combinations of the degrees of
 freedom at master nodes.
 
-These operations, :math:`\mathbf{P}`, :math:`\mathbf{B}`, and :math:`\mathbf{D}`,
+These operations, :math:`\bm{P}`, :math:`\bm{B}`, and :math:`\bm{D}`,
 are combined with a ``CeedOperator``. As with QFunctions, operator fields are added
-separately with a matching field name, basis (:math:`\mathbf{B}`), element restriction
-(:math:`\mathbf{G}`), and **L-vector**. The flag
+separately with a matching field name, basis (:math:`\bm{B}`), element restriction
+(:math:`\bm{G}`), and **L-vector**. The flag
 ``CEED_VECTOR_ACTIVE`` indicates that the vector corresponding to that field will
 be provided to the operator when ``CeedOperatorApply()`` is called. Otherwise the
 input/output will be read from/written to the specified **L-vector**.
 
-With partial assembly, we first perform a setup stage where :math:`\mathbf{D}` is evaluated
+With partial assembly, we first perform a setup stage where :math:`\bm{D}` is evaluated
 and stored. This is accomplished by the operator ``op_setup`` and its application
 to ``X``, the nodes of the mesh (these are needed to compute Jacobians at
 quadrature points). Note that the corresponding ``CeedOperatorApply()`` has no basis
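The pattern these hunks document is unchanged by the renaming; for orientation, here is a compressed sketch of the setup/apply flow from ``tests/t500-operator.c`` (referenced above). The vector names and sizes are illustrative assumptions, and the creation of the restrictions, bases, QFunctions, and operators is elided:

.. code-block:: c

   #include <ceed.h>

   int main(void) {
     // Illustrative sizes; the real test derives them from the mesh.
     const CeedInt num_dofs = 6, num_qpts = 8;
     Ceed ceed;
     CeedVector x, qdata, u, v;

     CeedInit("/cpu/self", &ceed); // the resource string selects a backend

     CeedVectorCreate(ceed, num_dofs, &x);     // mesh node coordinates
     CeedVectorCreate(ceed, num_qpts, &qdata); // holds D after the setup phase
     CeedVectorCreate(ceed, num_dofs, &u);     // operator input
     CeedVectorCreate(ceed, num_dofs, &v);     // operator output

     // ... create element restrictions (G), bases (B), QFunctions, and the
     // operators op_setup and op_mass as described in the surrounding text ...

     // Setup phase: evaluate and store D from the mesh coordinates, e.g.
     //   CeedOperatorApply(op_setup, x, qdata, CEED_REQUEST_IMMEDIATE);
     // Apply phase: v = G^T B^T D B G u, entirely matrix-free, e.g.
     //   CeedOperatorApply(op_mass, u, v, CEED_REQUEST_IMMEDIATE);

     CeedVectorDestroy(&x);
     CeedVectorDestroy(&qdata);
     CeedVectorDestroy(&u);
     CeedVectorDestroy(&v);
     CeedDestroy(&ceed);
     return 0;
   }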

examples/ceed/index.rst

Lines changed: 7 additions & 7 deletions
@@ -18,20 +18,20 @@ are supported from the same code.
 
 This example shows how to compute line/surface/volume integrals of a 1D, 2D, or 3D
 domain :math:`\Omega` respectively, by applying the mass operator to a vector of
-:math:`\mathbf{1}`\s. It computes:
+:math:`\bm{1}`\s. It computes:
 
 .. math::
-   I = \int_{\Omega} \mathbf{1} \, dV .
+   I = \int_{\Omega} \bm{1} \, dV .
    :label: eq-ex1-volume
 
 Using the same notation as in :ref:`Theoretical Framework`, we write here the vector
-:math:`u(\mathbf{x})\equiv \mathbf{1}` in the Galerkin approximation,
+:math:`u(\bm{x})\equiv \bm{1}` in the Galerkin approximation,
 and find the volume of :math:`\Omega` as
 
 .. math::
    :label: volume-sum
 
-   \sum_e \int_{\Omega_e} v(x) \cdot \mathbf{1} \, dV
+   \sum_e \int_{\Omega_e} v(x) \cdot \bm{1} \, dV
 
 with :math:`v(x) \in \mathcal{V}_p = \{ v \in H^{1}(\Omega_e) \,|\, v \in P_p(\bm{I}), e=1,\ldots,N_e \}`,
 the test functions.
@@ -49,7 +49,7 @@ Arbitrary mesh and solution orders in 1D, 2D and 3D are supported from the same
 Similarly to :ref:`Ex1-Volume`, it computes:
 
 .. math::
-   I = \int_{\partial \Omega} \mathbf{1} \, dS .
+   I = \int_{\partial \Omega} \bm{1} \, dS .
    :label: eq-ex2-surface
 
 but this time by applying the divergence theorem using a Laplacian.
@@ -58,7 +58,7 @@ In particular, we select :math:`u(\bm x) = x_0 + x_1 + x_2`, for which :math:`\n
 Given Laplace's equation,
 
 .. math::
-   -\nabla \cdot \nabla u = 0, \textrm{ for } \mathbf{x} \in \Omega
+   -\nabla \cdot \nabla u = 0, \textrm{ for } \bm{x} \in \Omega
 
 multiply by a test function :math:`v` and integrate by parts to obtain
 
@@ -68,4 +68,4 @@ multiply by a test function :math:`v` and integrate by parts to obtain
 Since we have chosen :math:`u` such that :math:`\nabla u \cdot \hat{\bm n} = 1`, the boundary integrand is :math:`v 1 \equiv v`. Hence, similar to :math:numref:`volume-sum`, we can evaluate the surface integral by applying the volumetric Laplacian as follows
 
 .. math::
-   \int_\Omega \nabla v \cdot \nabla u \, dV \approx \sum_e \int_{\partial \Omega_e} v(x) \cdot \mathbf{1} \, dS .
+   \int_\Omega \nabla v \cdot \nabla u \, dV \approx \sum_e \int_{\partial \Omega_e} v(x) \cdot \bm{1} \, dS .
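Not part of the commit, but to make the idea concrete: applying the mass operator to a vector of 1s and summing the result yields the measure of :math:`\Omega`, since the finite element basis functions sum to one. A hedged C sketch, assuming a fully constructed ``op_mass`` and its active vectors as in the example source (``ComputeVolume`` and its parameters are hypothetical names):

.. code-block:: c

   #include <ceed.h>

   // Sketch: |Omega| = sum_i (M u)_i with u = 1, because the basis functions
   // form a partition of unity. Assumes op_mass, u, v are set up as in Ex1.
   static CeedScalar ComputeVolume(CeedOperator op_mass, CeedVector u,
                                   CeedVector v, CeedInt num_dofs) {
     const CeedScalar *v_array;
     CeedScalar volume = 0.0;

     CeedVectorSetValue(u, 1.0); // u(x) == 1 at every node
     CeedOperatorApply(op_mass, u, v, CEED_REQUEST_IMMEDIATE); // v = M u

     CeedVectorGetArrayRead(v, CEED_MEM_HOST, &v_array);
     for (CeedInt i = 0; i < num_dofs; i++) volume += v_array[i];
     CeedVectorRestoreArrayRead(v, &v_array);
     return volume;
   }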
