
Terry Tao on some desirable properties of mathematical notation

source link: https://mathoverflow.net/questions/366070/what-are-the-benefits-of-writing-vector-inner-products-as-langle-u-v-rangle/366118
9 Answers



Mathematical notation in a given mathematical field $X$ is basically a correspondence $$ \mathrm{Notation}: \{ \hbox{well-formed expressions}\} \to \{ \hbox{abstract objects in } X \}$$ between mathematical expressions (or statements) on the written page (or blackboard, electronic document, etc.) and the mathematical objects (or concepts and ideas) in the heads of ourselves, our collaborators, and our audience. A good notation should make this map $\mathrm{Notation}$ (and its inverse) as close to a (natural) isomorphism as possible. Thus, for instance, the following properties are desirable (though not mandatory):

  1. (Unambiguity) Every well-formed expression in the notation should have a unique mathematical interpretation in $X$ . (Related to this, one should strive to minimize the possible confusion between an interpretation of an expression using the given notation $\mathrm{Notation}$ , and the interpretation using a popular competing notation $\widetilde{\mathrm{Notation}}$ .)
  2. (Expressiveness) Conversely, every mathematical concept or object in $X$ should be describable in at least one way using the notation.
  3. (Preservation of quality, I) Every "natural" concept in $X$ should be easily expressible using the notation.
  4. (Preservation of quality, II) Every "unnatural" concept in $X$ should be difficult to express using the notation. [In particular, it is possible for a notational system to be too expressive to be suitable for a given application domain.] Contrapositively, expressions that look clean and natural in the notation system ought to correspond to natural objects or concepts in $X$ .
  5. (Error correction/detection) Typos in a well-formed expression should create an expression that is easily corrected (or at least detected) to recover the original intended meaning (or a small perturbation thereof).
  6. (Suggestiveness, I) Concepts that are "similar" in $X$ should have similar expressions in the notation, and conversely.
  7. (Suggestiveness, II) The calculus of formal manipulation in $\mathrm{Notation}$ should resemble the calculus of formal manipulation in other notational systems $\widetilde{\mathrm{Notation}}$ that mathematicians in $X$ are already familiar with.
  8. (Transformation) "Natural" transformation of mathematical concepts in $X$ (e.g., change of coordinates, or associativity of multiplication) should correspond to "natural" manipulation of their symbolic counterparts in the notation; similarly, application of standard results in $X$ should correspond to a clean and powerful calculus in the notational system. [In particularly good notation, the converse is also true: formal manipulation in the notation in a "natural" fashion can lead to discovering new ways to "naturally" transform the mathematical objects themselves.]
  9. etc.

To evaluate these sorts of qualities, one has to look at the entire field $X$ as a whole; the quality of notation cannot be evaluated in a purely pointwise fashion by inspecting the notation $\mathrm{Notation}^{-1}(C)$ used for a single mathematical concept $C$ in $X$ . In particular, it is perfectly permissible to have many different notations $\mathrm{Notation}_1^{-1}(C), \mathrm{Notation}_2^{-1}(C), \dots$ for a single concept $C$ , each designed for use in a different field $X_1, X_2, \dots$ of mathematics. (In some cases, such as with the metrics of quality in desiderata 1 and 7, it is not even enough to look at the entire notational system $\mathrm{Notation}$ , but also its relationship with the other notational systems $\widetilde{\mathrm{Notation}}$ that are currently in popular use in the mathematical community, in order to assess the suitability of use of that notational system.)

Returning to the specific example of expressing the concept $C$ of a scalar quantity $c$ being equal to the inner product of two vectors $u, v$ in a standard vector space ${\bf R}^n$ , there are not just two notations commonly used to capture $C$ , but in fact over a dozen (including several mentioned in other answers):

  1. Pedestrian notation : $c = \sum_{i=1}^n u_i v_i$ (or $c = u_1 v_1 + \dots + u_n v_n$ ).
  2. Euclidean notation : $c = u \cdot v$ (or $c = \vec{u} \cdot \vec{v}$ or $c = \mathbf{u} \cdot \mathbf{v}$ ).
  3. Hilbert space notation : $c = \langle u, v \rangle$ (or $c = (u,v)$ ).
  4. Riemannian geometry notation : $c = \eta(u,v)$ , where $\eta$ is the Euclidean metric form (also $c = u \lrcorner (\eta \cdot v)$ , or $c = \iota_u (\eta \cdot v)$ ; one can also use $\eta(-,v)$ in place of $\eta \cdot v$ ).
  5. Musical notation : $c = u_\flat(v)$ .
  6. Matrix notation : $c = u^T v$ (or $c = \mathrm{tr}(vu^T)$ or $c = u^* v$ or $c = u^\dagger v$ ).
  7. Bra-ket notation : $c = \langle u| v\rangle$ .
  8. Einstein notation, I (without matching superscript/subscript requirement): $c = u_i v_i$ (or $c=u^iv^i$ , if vector components are denoted using superscripts).
  9. Einstein notation, II (with matching superscript/subscript requirement): $c = \eta_{ij} u^i v^j$ .
  10. Einstein notation, III (with matching superscript/subscript requirement and also implicit raising and lowering operators): $c = u^i v_i$ (or $c = u_i v^i$ or $c = \eta_{ij} u^i v^j$ ).
  11. Penrose abstract index notation : $c = u^\alpha v_\alpha$ (or $c = u_\alpha v^\alpha$ or $c = \eta_{\alpha \beta} u^\alpha v^\beta$ ). [In the absence of derivatives this is nearly identical to Einstein notation III, but distinctions between the two notational systems become more apparent in the presence of covariant derivatives ( $\nabla_\alpha$ in Penrose notation, or a combination of $\partial_i$ and Christoffel symbols in Einstein notation).]
  12. Hodge notation : $c = \mathrm{det}(u \wedge *v)$ (or $u \wedge *v = c \omega$ , with $\omega$ the volume form). [Here we are implicitly interpreting $u,v$ as covectors rather than vectors.]
  13. Geometric algebra notation : $c = \frac{1}{2} \{u,v\}$ , where $\{u,v\} := uv+vu$ is the anticommutator.
  14. Clifford algebra notation : $uv + vu = 2c1$ .
  15. Graphical notations such as Penrose graphical notation .
  16. etc.
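Several of these notations translate directly into code. Here is a small numpy sketch (the variable names are mine) in which the pedestrian, matrix, Einstein, and metric forms all produce the same scalar; `np.einsum` plays the role of Einstein summation:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# Pedestrian notation: c = sum_i u_i v_i
c_pedestrian = sum(u[i] * v[i] for i in range(len(u)))

# Matrix notation: c = u^T v, and also c = tr(v u^T)
c_matrix = u.T @ v
c_trace = np.trace(np.outer(v, u))

# Einstein notation: a repeated index is implicitly summed
c_einstein = np.einsum('i,i->', u, v)

# Riemannian notation: c = eta(u, v), with eta the Euclidean metric
eta = np.eye(3)
c_metric = np.einsum('ij,i,j->', eta, u, v)

print(c_pedestrian, c_matrix, c_trace, c_einstein, c_metric)  # all 32.0
```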

Each of these notations is tailored to a different mathematical domain of application. For instance:

  • Matrix notation would be suitable for situations in which many other matrix operations and expressions are in use (e.g., the rank one operators $vu^T$ ).
  • Riemannian or abstract index notation would be suitable in situations in which linear or nonlinear changes of variable are frequently made.
  • Hilbert space notation would be suitable if one intends to eventually generalize one's calculations to other Hilbert spaces, including infinite dimensional ones.
  • Euclidean notation would be suitable in contexts in which other Euclidean operations (e.g., cross product) are also in frequent use.
  • Einstein and Penrose abstract index notations are suitable in contexts in which higher rank tensors are heavily involved. Einstein I is more suited for Euclidean applications or other situations in which one does not need to make heavy use of covariant operations, otherwise Einstein III or Penrose is preferable (and the latter particularly desirable if covariant derivatives are involved). Einstein II is suitable for situations in which one wishes to make the dependence on the metric explicit.
  • Clifford algebra notation is suitable when working over fields of arbitrary characteristic, in particular if one wishes to allow characteristic 2.

And so on and so forth. There is no unique "best" choice of notation to use for this concept; it depends on the intended context and application domain. For instance, matrix notation would be unsuitable if one does not want the reader to accidentally confuse the scalar product $u^T v$ with the rank one operator $vu^T$ , Hilbert space notation would be unsuitable if one frequently wished to perform coordinatewise operations (e.g., Hadamard product) on the vectors and matrices/linear transformations used in the analysis, and so forth.

— Terry Tao
  • Terry Tao: Both variants of Einstein summation are in use, with the formulation you state preferred if one wants to take full advantage of covariance, but the more relaxed formulation suitable for Euclidean contexts in which one will not need to rely much on covariant transformations. See en.wikipedia.org/wiki/Einstein_notation#Application . (Personally, if one is going to make heavy use of covariant operations, and particularly covariant derivatives, I would recommend using Penrose abstract index notation instead, unless one really likes Christoffel symbols for some reason.)



Inner product is defined axiomatically, as a function $V\times V\to k$ , where $k$ is a field and $V$ is a $k$ -vector space, satisfying the three well-known axioms. The usual notation is $(x,y)$ . So when you want to say anything about an arbitrary inner product, you use this notation (or some similar one). $(x,y)=x^*y$ is just one example of an inner product on the space $\mathbb C^n$ . There are other examples on the same space, $(x,y)=x^*Ay$ where $A$ is an arbitrary Hermitian positive definite matrix, and there are inner products on other vector spaces.
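To see that the axioms really hold for $(x,y)=x^*Ay$, here is a quick numerical check (a sketch in numpy; the construction $A = B^*B + I$ is just one easy way to manufacture a Hermitian positive definite matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

# A Hermitian positive definite matrix: A = B^* B + I is one easy construction.
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = B.conj().T @ B + np.eye(3)

def inner(x, y):
    # (x, y) = x^* A y, conjugate-linear in the first argument
    return x.conj().T @ A @ y

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)
lam = 2 - 3j

# Conjugate symmetry, conjugate-linearity in the first slot, and positivity:
assert np.isclose(inner(x, y), np.conj(inner(y, x)))
assert np.isclose(inner(lam * x, y), np.conj(lam) * inner(x, y))
assert inner(x, x).real > 0 and np.isclose(inner(x, x).imag, 0)
```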

— Alexandre Eremenko
  • Nik Weaver: YES. This is absolutely the correct answer.


One advantage of $\langle \cdot, \cdot \rangle$ is that you don't have to worry about changes in basis.

Suppose we have a coordinate system $\alpha$ in which our (real) inner product space is explicitly Euclidean, and an alternative coordinate system $\beta$ . A vector $v$ is expressed in the coordinate systems as, respectively, the column vectors $[v]_\alpha$ and $[v]_\beta$ . Let $P$ denote the change of basis matrix

$$ [v]_\beta = P [v]_\alpha $$

The inner product, which in coordinate system $\alpha$ is $\langle v, v\rangle = [v]_{\alpha}^T [v]_{\alpha}$ , is certainly not in general $[v]_\beta^T[v]_\beta$ in the second coordinate system. (It is only so if $P$ is orthogonal.)

That said: given any Hilbert space $V$ , by the Riesz representation theorem there exists an (anti-)isomorphism from $V$ to its dual space $V^*$ . You can certainly choose to call this mapping $v \mapsto v^*$ (in Riemannian geometry contexts this is more usually denoted using the musical isomorphism notation $\flat$ and $\sharp$ ) and I don't think in this case there are reasons to prefer one to the other. But a major caveat if you do things this way is that unless you are working in an orthonormal basis, you cannot associate $v \mapsto v^*$ with the "conjugate transpose" operation on matrices.
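A quick numerical illustration of the point (a numpy sketch; the non-orthogonal matrix $P$ is made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
v_alpha = np.array([1.0, 2.0])

# An arbitrary invertible but non-orthogonal change of basis P
P = np.array([[2.0, 1.0], [0.0, 1.0]])
v_beta = P @ v_alpha

# <v, v> computed naively in each coordinate system:
print(v_alpha @ v_alpha)  # 5.0
print(v_beta @ v_beta)    # 20.0 -- not the same scalar

# With an orthogonal Q, the naive formula survives the change of basis:
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))
w = Q @ v_alpha
print(np.isclose(w @ w, v_alpha @ v_alpha))  # True
```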

— Willie Wong
  • LSpice: Real Hilbert spaces, I guess?


  • Willie Wong: @LSpice: bah, I forgot to include the "(anti-)". Fixing it now.


This is to expand on my comment in response to Federico Poloni:

$\langle u,v\rangle $ is explicitly a number, whereas $u^Tv$ is a 1 by 1 matrix :).

While it is true that there is a canonical isomorphism between the two, how do you write the expansion of $u$ in an orthonormal basis $\{v_i\}$ ? Something like $$ u=\sum_i u^Tv_i v_i $$ feels uncomfortable, since if you view everything as matrices, the dimensions do not allow for multiplication. So, I would at least feel a need to insert parentheses, $$ u=\sum_i (u^Tv_i) v_i, $$ to indicate that the canonical isomorphism is applied. But that is still somewhat vague, and it already cancels any typographical advantage of $u^Tv$ .
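For what it's worth, the expansion itself is unproblematic once the parentheses are inserted; a short numpy check, with an orthonormal basis manufactured via a QR factorization:

```python
import numpy as np

rng = np.random.default_rng(2)

# An orthonormal basis {v_i} of R^4: the rows of Q^T from a QR factorization
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
basis = Q.T

u = rng.standard_normal(4)

# u = sum_i (u^T v_i) v_i, with the scalar (u^T v_i) computed explicitly
u_reconstructed = sum((u @ v_i) * v_i for v_i in basis)

assert np.allclose(u, u_reconstructed)
```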

(I do also share the sentiment that the basis-dependent language is inferior and should be avoided when possible.)

— Kostya_I


One huge advantage, to my mind, of the bracket notation is that it admits 'blanks'. So one can specify the notation for an inner product as $\langle \ , \ \rangle$ , and given $\langle \ , \rangle : V \times V \rightarrow K$ , one can define elements of the dual space $V^\star$ by $\langle u , - \rangle$ and $\langle -, v \rangle$ . (In the complex case one of these is only conjugate linear.)

More subjective I know, but on notational grounds I far prefer to write $\langle Au, v \rangle = \langle u, A^\dagger v \rangle$ for the adjoint map than $(Au)^t v = u^t (A^tv)$ . The former also emphasises that the construction is basis independent. It generalises far better to Hilbert spaces and other spaces with a non-degenerate bilinear form (not necessarily an inner product).

I'll also note that physicists, and more recently anyone working in quantum computing, have taken the 'bra-ket' formulation to the extreme, and use it to present quite intricate eigenvector calculations in a succinct way. For example, here is the Hadamard transform in bra-ket notation:

$$ \frac{| 0 \rangle + |1 \rangle}{\sqrt{2}} \langle 0 | + \frac{| 0 \rangle - |1\rangle}{\sqrt{2}} \langle 1 |. $$

To get the general Hadamard transform on $n$ qubits, just take the $n$ th tensor power: this is compatible with the various implicit identifications of vectors and elements of the dual space.
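As a sketch of how directly this translates into code (numpy, with the standard identification of kets with column vectors and bras with row vectors):

```python
import numpy as np

ket0 = np.array([[1.0], [0.0]])  # |0>
ket1 = np.array([[0.0], [1.0]])  # |1>
bra0, bra1 = ket0.T, ket1.T      # <0|, <1|

# H = (|0> + |1>)/sqrt(2) <0|  +  (|0> - |1>)/sqrt(2) <1|
H = (ket0 + ket1) / np.sqrt(2) @ bra0 + (ket0 - ket1) / np.sqrt(2) @ bra1

# n-qubit Hadamard as the n-th tensor (Kronecker) power
n = 3
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)

print(H)  # the familiar (1/sqrt(2)) [[1, 1], [1, -1]]
```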

Finally, may I issue a plea for everyone to use $\langle u ,v \rangle$ , with the LaTeX \langle and \rangle rather than the barbaric $<u,v>$ .

— Mark Wildon
  • Qfwfq: The physicists' bra-ket notation is very "type confusing". It's not just making use of a reasonable symbol of scalar product $\langle \;.\; |\; . \;\rangle$ to denote the "metric" dual $\langle v |\; .\; \rangle$ of a vector $v$, or to denote an operator of the form $v\otimes u^{\vee}$ as $v \langle u |$, which would be totally standard for mathematicians. No, they use $|\quad\rangle$ as a sort of blank in which to insert some symbol: substitute any symbol in place of the box in $|\square\rangle$ as in $v_{\square}$, no matter if such a symbol (inserted inside the ket) denotes a vector itself, as in $| v \rangle$, an eigenvalue, as in $| \lambda_i \rangle$, or an index, as in $|\spadesuit\rangle$ or $| 0 \rangle$ or $| \uparrow \rangle$.


  • Terry Tao: I think it may be best to think of $|\ \rangle$ (resp. $\langle\ |$) as a type conversion (or casting) operator from almost any type to a vector type (resp. covector type). en.wikipedia.org/wiki/Type_conversion . Type conversion operators are commonplace in C-type programming languages and I think some form of them could be safely adopted in mathematical notation more often than is currently done in my opinion.


The family $F$ of (real) quadratic polynomials is a vector space isomorphic to the vector space $\mathbb{R}^3.$ One way to make $F$ an inner product space is to define $\langle f, g \rangle =\int_a^bf(t)g(t)\,dt$ for some fixed interval $[a,b].$ Instead of quadratic polynomials one might consider all polynomials or all bounded integrable functions. One could also define the inner product as $\langle f, g \rangle =\int_a^bf(t)g(t)\mu(t)dt$ for some weight function $\mu.$ There isn’t a natural role for transposes here.
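A short numpy sketch of this inner product on polynomials (the interval and the test polynomials are chosen arbitrarily for illustration):

```python
import numpy as np

# <f, g> = integral of f(t) g(t) over [a, b]
def poly_inner(f_coeffs, g_coeffs, a=0.0, b=1.0):
    f = np.polynomial.Polynomial(f_coeffs)
    g = np.polynomial.Polynomial(g_coeffs)
    h = (f * g).integ()  # an antiderivative of f*g
    return h(b) - h(a)

# f(t) = 1 + t, g(t) = t^2 (coefficients in increasing degree order)
c = poly_inner([1, 1], [0, 0, 1])
print(c)  # integral of t^2 + t^3 over [0, 1] = 1/3 + 1/4 = 7/12
```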

— Aaron Meyerowitz


I do not see a compelling argument for $\langle \cdot, \cdot \rangle$ over $(\cdot)^T(\cdot)$ , or, better $(\cdot)^*(\cdot)$ , so that the star operator can be generalized to other more complicated settings (complex vectors, Hilbert spaces with a dual operation).

Let me summarize the arguments in the comments:

  • emphasizes vectors as geometric objects: not clear why $u^*v$ is less geometric.
  • free space for a superscript: I agree, that is an argument in favor of $\langle \cdot, \cdot \rangle$ . In a setting where I need many superscripts, I would probably favor that notation.
  • emphasizes bilinearity: disagree. In the complex case, it makes a lot less clear why one of these two arguments is not like the other and implies a conjugation, and it does not make clear which one it is: is $\langle \lambda u,v \rangle$ equal to $\lambda\langle u,v \rangle$ or to $\overline{\lambda}\langle u,v \rangle$ ? Is there a way to recall it other than remembering it?
  • Leaves room for an operator and gives a clear interpretation of adjointness: I find $(Au)^*v=u^*A^*v = u^*(A^*v)$ equally clear, and it relies only on manipulations that are well ingrained in the mind of mathematicians.
  • Gives an interpretation for the linear functional $\langle u, \cdot \rangle$ : but what is $u^*$ or $u^T$ if not a representation for that same linear functional?

An advantage of the $u^*v$ notation, in my view, is that it makes clear that some properties are just a consequence of associativity. Consider for instance the orthogonal projection onto the orthogonal complement of a unit vector $u$

$$Pv = (I-uu^*)v = v - u(u^*v).$$

If one writes it as $v - \langle v,u \rangle u$ (especially by putting the scalar on the left as is customary), it is less clear that it is equivalent to applying the linear operator $I-uu^*$ to the vector $v$ . Also, the notation generalizes nicely to repeated projections $$ (I-u_1u_1^* - u_2u_2^*)v = (I - \begin{bmatrix}u_1 & u_2\end{bmatrix}\begin{bmatrix}u_1^* \\ u_2^*\end{bmatrix})v = (I - UU^*)v. $$
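A quick numpy check that the three ways of writing this projection agree (for a unit vector $u$):

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.standard_normal(4)
u /= np.linalg.norm(u)  # normalize so I - u u^* is a projection
v = rng.standard_normal(4)

# The same projection three ways: operator form, associated form, bracket form
p1 = (np.eye(4) - np.outer(u, u)) @ v   # (I - u u^*) v
p2 = v - u * (u @ v)                    # v - u (u^* v)
p3 = v - (v @ u) * u                    # v - <v, u> u

assert np.allclose(p1, p2) and np.allclose(p2, p3)
assert np.isclose(u @ p1, 0.0)  # the result is orthogonal to u
```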

A disadvantage, of course, is working with spaces of matrices, where transposes already have another meaning; for instance, working with the trace scalar product $\langle A,B \rangle := \operatorname{Tr}(A^TB)$ one really needs the $\langle A,B \rangle$ notation.

— Federico Poloni


Maybe it's worth mentioning that the computer language APL has a "generalized" inner product where you can use any two functions of two arguments (i.e., "dyadic functions" in APL terms) to form an inner product. Thus, for example, ordinary inner product is written as "A+.×B", which can apply to two arrays A, B of any dimension whatsoever (vectors, matrices, three-dimensional arrays, etc.), provided that the last dimension of A matches the first dimension of B.

Thus, for example, A^.=B represents string matching of A against B, A×.*B evaluates a number given its prime divisors A and prime factorization exponents B, etc.
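For vectors, APL's generalized inner product A f.g B can be sketched in Python with `reduce` and `map` (the helper name `inner` is mine, and this ignores APL's higher-rank behavior):

```python
import operator
from functools import reduce

# APL's A f.g B on vectors: reduce with f over the elementwise map of g
def inner(f, g, A, B):
    return reduce(f, map(g, A, B))

# A +.x B : the ordinary dot product
print(inner(operator.add, operator.mul, [1, 2, 3], [4, 5, 6]))  # 32

# A ^.= B : exact string match
print(inner(operator.and_, operator.eq, "cat", "cat"))  # True

# A x.* B : number from prime divisors A and exponents B
print(inner(operator.mul, operator.pow, [2, 3, 5], [3, 1, 0]))  # 2^3 * 3^1 * 5^0 = 24
```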

The authors of APL, Iverson and Falkoff, cared intensely about notation and tried to find the most general interpretation of every new item they added to the language.

— Jeffrey Shallit


I consider the distinction quite important. There are two separate operations that look superficially like each other but are in fact different.

First, the abstract description. If $V$ is an abstract vector space and $V^*$ is its dual, then there is the natural evaluation operation of $v \in V$ and $\theta \in V^*$ , which is commonly written as $$ \langle\theta,v\rangle = \langle v,\theta\rangle. $$ No inner product is needed here. If you choose a basis $(e_1, \dots, e_n)$ of $V$ and use the corresponding dual basis $(\eta^1, \dots, \eta^n)$ of $V^*$ and write $v = v^ie_i$ and $\theta = \theta_i\eta^i$ , then $$ \langle\theta,v\rangle = \theta_iv^i. $$ The distinction between up and down indices indicates whether the object is a vector or a dual vector (a $1$ -form).

If $V$ has an inner product and $(e_1, \dots, e_n)$ is an orthonormal basis, then given two vectors $v = v^ie_i, w = w^ie_i \in V$ , we have $$ v\cdot w = v^iw^i. $$ Notice that here both indices are up. There is a similar formula for the dot product of two dual vectors. Here, the formula only works if the basis is orthonormal.

How does this look in terms of row and column vectors? My personal convention, a common one, is the following:

  1. When writing the components of a matrix as $A^i_j$ , I view the superscript as the row index and the subscript as the column index.
  2. I view a vector $v \in V$ as a column vector, which is why its coefficients are superscripts (and the basis elements are labeled using subscripts).
  3. This means that a dual vector $\theta$ is a row vector, which is why its coefficients are subscripts.
  4. With these conventions $$ \langle \theta,v\rangle = \theta v, $$ where the right side is matrix multiplication. The catch here is that the dual vector has to be the left factor and the vector the right factor. To avoid this inconsistency, I always write either $\langle \theta,v\rangle$ or $\theta_iv^i = v^i\theta_i$ . Again, note that these formulas hold for any basis of $V$ .
  5. If $V$ has an inner product and $v, w$ are written with respect to an orthonormal basis, then indeed $$ v\cdot w = v^Tw = v^iw^i $$ You can, in fact, lower (or raise) all of the indices and have an implicit sum for any pair of repeated indices. This is, in fact, what Chern would do.
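A numerical illustration of these conventions (a numpy sketch; `P` is an arbitrary invertible change of basis, and the transformation rules below follow the row-vector/column-vector conventions above):

```python
import numpy as np

rng = np.random.default_rng(4)

# theta is a dual (row) vector, v a (column) vector, in some basis
theta = rng.standard_normal(3)
v = rng.standard_normal(3)

# Under a change of basis with matrix P, components transform oppositely:
P = rng.standard_normal((3, 3))   # almost surely invertible and non-orthogonal
v_new = np.linalg.solve(P, v)     # v' = P^{-1} v   (contravariant)
theta_new = theta @ P             # theta' = theta P (covariant)

# The pairing <theta, v> = theta_i v^i needs no metric and is basis independent:
assert np.isclose(theta @ v, theta_new @ v_new)

# The dot product v^i w^i is NOT invariant unless the change of basis is orthogonal:
w = rng.standard_normal(3)
w_new = np.linalg.solve(P, w)
print(np.isclose(v @ w, v_new @ w_new))  # generally False
```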

ASIDE: I gotta say that having such precisely defined conventions is crucial to my ability to do nontrivial calculations with vectors and tensors. When I was a graduate student, my PhD advisor, Phillip Griffiths, once asked me, "Have you developed your own notation yet?" I also have to acknowledge that my notation is either exactly or based closely on Robert Bryant's notation.

— Deane Yang
