Libres pensées d'un mathématicien ordinaire

 2 years ago
source link: https://djalil.chafai.net/blog/

Nicolas Léonard Sadi Carnot (1796 – 1832), an Évariste Galois of Physics.

Relative entropy. Let λ be a reference measure on some measurable space E. The relative entropy with respect to λ is defined for every measure μ on E with density $\frac{d\mu}{d\lambda}$ by $H(\mu\mid\lambda):=\int\frac{d\mu}{d\lambda}\log\frac{d\mu}{d\lambda}\,d\lambda$. If the integral is not well defined, we could simply set $H(\mu\mid\lambda):=+\infty$.

  • An important case is when λ is a probability measure. In this case H becomes the Kullback-Leibler divergence, and the Jensen inequality for the strictly convex function u↦ulog(u) indicates then that H(μ∣λ)≥0 with equality if and only if μ=λ.
  • Another important case is when λ is the Lebesgue measure on $\mathbb{R}^n$ or the counting measure on a discrete set, then $-H(\mu\mid\lambda)$ is the Boltzmann-Shannon entropy of μ. Beware that when $E=\mathbb{R}^n$, this entropy takes its values in the whole $(-\infty,+\infty)$ since for every scale factor $\sigma>0$, denoting by $\mu_\sigma$ the push forward of μ by the dilation $x\mapsto\sigma x$, we have $H(\mu_\sigma\mid\lambda)=H(\mu\mid\lambda)-n\log\sigma$.
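
The two cases above can be illustrated on a finite set, where the integral is a finite sum. The following is a minimal sketch, assuming a four-point space and hypothetical weights `mu`; the function name `relative_entropy` is not from the original post.

```python
import math

def relative_entropy(mu, lam):
    """H(mu | lam) = sum_i mu_i * log(mu_i / lam_i) on a finite set.

    Uses the convention 0 * log(0) = 0 and returns +inf when mu charges
    a point that lam does not charge (no density d mu / d lam).
    """
    total = 0.0
    for m, l in zip(mu, lam):
        if m == 0.0:
            continue
        if l == 0.0:
            return math.inf
        total += m * math.log(m / l)
    return total

uniform = [0.25, 0.25, 0.25, 0.25]   # reference probability measure
counting = [1.0, 1.0, 1.0, 1.0]      # counting measure on the same set
mu = [0.4, 0.3, 0.2, 0.1]            # hypothetical probability measure

kl = relative_entropy(mu, uniform)          # Kullback-Leibler divergence
shannon = -relative_entropy(mu, counting)   # Boltzmann-Shannon entropy
```

Here `kl` is nonnegative and vanishes only when `mu` equals `uniform`, while `shannon` is the usual entropy of `mu`; on this four-point set the two are related by `kl = log(4) - shannon`.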

Boltzmann-Gibbs probability measures. Such a probability measure $\mu_{V,\beta}$ takes the form $d\mu_{V,\beta}:=\frac{e^{-\beta V}}{Z_{V,\beta}}\,d\lambda$ where $V:E\to(-\infty,+\infty]$, $\beta\in[0,+\infty)$, and $Z_{V,\beta}:=\int e^{-\beta V}\,d\lambda<\infty$ is the normalizing factor. The larger β is, the more $\mu_{V,\beta}$ puts its probability mass on the regions where V is low. The corresponding asymptotic analysis, known as the Laplace method, states that as $\beta\to\infty$ the probability measure $\mu_{V,\beta}$ concentrates on the minimizers of V.

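The mean of V or V-moment of $\mu_{V,\beta}$ reads
$$\int V\,d\mu_{V,\beta}=-\frac{1}{\beta}H(\mu_{V,\beta}\mid\lambda)-\frac{1}{\beta}\log Z_{V,\beta}.$$
In thermodynamics $-\frac{1}{\beta}\log Z_{V,\beta}$ appears as a Helmholtz free energy since it is equal to $\int V\,d\mu_{V,\beta}$ (mean energy) minus $\frac{1}{\beta}\times(-H(\mu_{V,\beta}\mid\lambda))$ (temperature times entropy).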

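When β ranges from $-\infty$ to $\infty$, the V-moment of $\mu_{V,\beta}$ ranges from $\sup V$ down to $\inf V$, and $\partial_\beta\int V\,d\mu_{V,\beta}=\bigl(\int V\,d\mu_{V,\beta}\bigr)^2-\int V^2\,d\mu_{V,\beta}\le0$. If $\lambda(E)<\infty$ then $\mu_{V,0}=\frac{1}{\lambda(E)}\lambda$ and its V-moment is $\frac{1}{\lambda(E)}\int V\,d\lambda$.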
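
These formulas are easy to check numerically on a finite set with the counting measure as reference. The sketch below assumes hypothetical energy levels `V` and a few arbitrary values of β; the helper names are illustrative only.

```python
import math

def gibbs(V, beta):
    """Boltzmann-Gibbs weights exp(-beta * V) / Z for the counting measure.

    Returns the probability vector and the normalizing factor Z.
    """
    weights = [math.exp(-beta * v) for v in V]
    Z = sum(weights)
    return [w / Z for w in weights], Z

def v_moment(V, mu):
    """Mean of V under mu."""
    return sum(v * m for v, m in zip(V, mu))

def rel_entropy_counting(mu):
    """H(mu | counting measure) = sum_i mu_i log mu_i."""
    return sum(m * math.log(m) for m in mu if m > 0.0)

V = [0.0, 1.0, 2.5, 4.0]   # hypothetical energy levels
beta = 1.3
mu, Z = gibbs(V, beta)

# Mean energy versus -(1/beta) H - (1/beta) log Z.
lhs = v_moment(V, mu)
rhs = -rel_entropy_counting(mu) / beta - math.log(Z) / beta

# The V-moment decreases with beta, from the uniform mean towards min V.
moments = [v_moment(V, gibbs(V, b)[0]) for b in (0.0, 0.5, 1.0, 2.0, 8.0)]
```

The values `lhs` and `rhs` agree up to rounding, and `moments` starts at the plain average of `V` (the case β=0) and decreases towards the minimum of `V`.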

Variational principle. Let $\beta\ge0$ be such that $Z_{V,\beta}<\infty$ and $c:=\int V\,d\mu_{V,\beta}<\infty$. Then, among all the probability measures μ on E with the same V-moment as $\mu_{V,\beta}$, the relative entropy $H(\mu\mid\lambda)$ is minimized by the Boltzmann-Gibbs measure $\mu_{V,\beta}$. In other words, $\min_{\int V\,d\mu=c}H(\mu\mid\lambda)=H(\mu_{V,\beta}\mid\lambda)$.

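Indeed we have
$$\begin{aligned}
H(\mu\mid\lambda)-H(\mu_{V,\beta}\mid\lambda)
&=\int\log\frac{d\mu}{d\lambda}\,d\mu-\int\log\frac{d\mu_{V,\beta}}{d\lambda}\,d\mu_{V,\beta}\\
&=\int\log\frac{d\mu}{d\lambda}\,d\mu+\int(\log(Z_{V,\beta})+\beta V)\,d\mu_{V,\beta}\\
&=\int\log\frac{d\mu}{d\lambda}\,d\mu+\int(\log(Z_{V,\beta})+\beta V)\,d\mu\\
&=\int\log\frac{d\mu}{d\lambda}\,d\mu-\int\log\frac{d\mu_{V,\beta}}{d\lambda}\,d\mu\\
&=H(\mu\mid\mu_{V,\beta})\ge0
\end{aligned}$$
with equality if and only if $\mu=\mu_{V,\beta}$. The crucial point is that μ and $\mu_{V,\beta}$ are equal on test functions of the form $a+bV$ where a,b are arbitrary real constants, by assumption.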

  • When λ is the Lebesgue measure on $\mathbb{R}^n$ or the counting measure on a discrete set, we recover the usual maximum Boltzmann-Shannon entropy principle $\max_{\int V\,d\mu=c}-H(\mu\mid\lambda)=-H(\mu_{V,\beta}\mid\lambda)$. In particular, Gaussians maximize the Boltzmann-Shannon entropy under variance constraint (take for V a quadratic form), while the uniform measures maximize the Boltzmann-Shannon entropy under support constraint (take V constant on a set of finite measure for λ, and infinity elsewhere). Maximum entropy is minimum relative entropy with respect to Lebesgue or counting measure, a way to find, among the probability measures with a moment constraint, the closest to the Lebesgue or counting measure.
  • When λ is a probability measure, then we recover the fact that the Boltzmann-Gibbs measures realize the projection or least Kullback-Leibler divergence of λ on the set of probability measures with a given V-moment. This is the Csiszár I-projection.
  • There are other interesting applications, for instance when λ is a Poisson point process.
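
The variational principle can be tested numerically on a finite set with the counting measure as reference. The sketch below is only an illustration under stated assumptions: the levels `V`, the value of β, and the perturbation `direction` are hypothetical, and `direction` is chosen to sum to zero and to be orthogonal to `V`, so that the perturbed measures keep total mass one and the same V-moment.

```python
import math

def rel_entropy(mu, ref):
    """H(mu | ref) = sum_i mu_i log(mu_i / ref_i) on a finite set."""
    return sum(m * math.log(m / r) for m, r in zip(mu, ref) if m > 0.0)

V = [0.0, 1.0, 2.0, 3.0]              # hypothetical energy levels
beta = 0.7
w = [math.exp(-beta * v) for v in V]
Z = sum(w)
gibbs = [x / Z for x in w]            # Boltzmann-Gibbs measure
counting = [1.0] * len(V)             # reference measure

# Perturbation preserving mass and V-moment: sum(d) = 0 and sum(V * d) = 0.
direction = [1.0, -2.0, 1.0, 0.0]

def perturbed(t):
    return [g + t * d for g, d in zip(gibbs, direction)]

gaps = []
for t in (-0.02, -0.01, 0.01, 0.02):
    mu = perturbed(t)
    gap = rel_entropy(mu, counting) - rel_entropy(gibbs, counting)
    gaps.append((gap, rel_entropy(mu, gibbs)))
```

Each pair in `gaps` contains the entropy gap $H(\mu\mid\lambda)-H(\mu_{V,\beta}\mid\lambda)$ and the relative entropy $H(\mu\mid\mu_{V,\beta})$: both are positive and they coincide up to rounding, as in the computation above.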

Note. The concept of maximum entropy was studied notably by

and by Edwin Thompson Jaynes (1922 – 1998) in relation with thermodynamics, statistical physics, statistical mechanics, information theory, and Bayesian statistics. The concept of I-projection or minimum relative entropy was studied notably by Imre Csiszár (1938 – ).


This short information and decision-support note, intended for mathematicians, was prepared by and for the French national network of mathematics libraries (Réseau national des bibliothèques de mathématiques, RNBM).

  1. Why try to publish virtuously when Sci-Hub exists? On the one hand Sci-Hub is illegal, and on the other hand Sci-Hub relies by construction on the subscriptions of academic institutions around the world. Sci-Hub frees the science of yesterday and today through a pirate pooling of resources, which may have a good systemic effect in the long run. In the meantime, a good way to free one's own scientific output immediately, durably, and legally is to deposit it in dedicated repositories such as arXiv, whose French mirror is integrated into HAL.
  2. Is it enough to deposit systematically on arXiv? Depositing on arXiv is always welcome for the open dissemination of science. But since nothing guarantees that the final version, which has benefited from the journal's editorial process, is on arXiv, readers will often prefer the version published by the journal when it is accessible. From this point of view, the ideal situation is that of open access journals that deposit on arXiv themselves, or that rely on arXiv, such as the overlay journals (épirevues) of www.episciences.org for example. Moreover, a number of journals practice open access for both authors and readers ("diamond" open access) without going through arXiv.
  3. Why not do everything on ResearchGate? ResearchGate is a semi-closed platform of the same nature as Facebook, which is not run by academic institutions, and which is bound one day to monetize its access, its services, and its database. It does not really help open science, quite the contrary.
  4. Which journals are the most virtuous? Journals that are open access for both authors and readers, supported by an academic institution, and backed by arXiv are generally among the most virtuous, although some rely on the volunteer work of researchers for editorial management and typesetting. At the other end, subscription journals should not all be lumped together: some charge reasonable prices, whether they are for-profit or not. In general, the editorial operation of a journal has a cost, and the differences lie in the model and policy for funding, access, and dissemination. Concretely, a junior researcher who wishes to publish an article can draw up a list of possible journals based on scientific criteria, and then sort this list by taking into account each journal's model with respect to open science. A senior researcher can afford to aim straight away at the most virtuous journals in terms of open science, at the expense of their scientific prestige, since this has less impact on his or her career. And all of them can deposit their final version on arXiv if the journal does not do so.
  5. Why is it not virtuous to pay the publisher to free the article upon publication? This is the APC system, for article processing charges. Since everything has a cost, the idea of having the author pay the publisher to disseminate the article freely may be appealing. But this payment is only affordable for rich authors or members of rich institutions, and the articles published by the less wealthy will remain less widely disseminated and, above all, accessible only by subscription, which in the end makes academic institutions pay twice. The subscribe to open (S2O) model, which is currently developing, is more virtuous from this point of view, since it makes institutions pay only once for the freeing of all articles.

Note. To answer a frequently asked question, the RNBM, as an entity of the Institut des sciences mathématiques of the CNRS, cannot openly advertise an illegal service such as Sci-Hub or libgen. On the other hand, given the massive use of Sci-Hub and libgen around the world and in France in particular, it is normal for the RNBM to acknowledge it and to explain its mechanisms and stakes. Each mathematician may wish to use a service such as Sci-Hub or libgen, because it is efficient, because knowledge must be disseminated, and because this kind of anarchist subversion could eventually force the multinationals of commercial publishing to change their system.

Statue of Gaspard Monge (1746 – 1818), place Monge, Beaune, Côte d’Or, France.

This post is about some aspects of transportation of measure. It is mostly inspired by the lecture notes of an advanced master course prepared a few years ago in collaboration with my colleague Joseph Lehec at Université Paris-Dauphine – PSL. The objective is to reach the Caffarelli contraction theorem, one of my favourite theorems.

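Pushforward or image measure. Let $T:\mathbb{R}^n\to\mathbb{R}^n$ be a measurable map and μ be a probability measure on $\mathbb{R}^n$. The pushforward of μ by T is the measure ν given, for every Borel set $A\subset\mathbb{R}^n$, by

$$\nu(A)=\mu(T^{-1}(A)).$$

In other words $T(X)\sim\nu$ when $X\sim\mu$, and thus for every test function h,

$$\int_{\mathbb{R}^n}h\,d\nu=\int_{\mathbb{R}^n}h\circ T\,d\mu.$$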
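
Both characterizations of the pushforward can be checked by simulation. The following is a minimal sketch, assuming the hypothetical choices μ uniform on [0,1], $T(x)=x^2$, $A=[0,1/4]$, and $h(y)=y$, none of which come from the original post.

```python
import random

def T(x):
    return x * x            # hypothetical transport map on [0, 1]

rng = random.Random(0)
xs = [rng.random() for _ in range(200_000)]   # X ~ mu = Uniform[0, 1]
ys = [T(x) for x in xs]                       # T(X) ~ nu

# nu(A) = mu(T^{-1}(A)) with A = [0, 1/4]: here T^{-1}(A) = [0, 1/2],
# so the exact value is mu([0, 1/2]) = 1/2.
nu_A = sum(1 for y in ys if y <= 0.25) / len(ys)

# Integral of h against nu with h(y) = y: it equals the integral of
# h(T(x)) = x^2 against mu, which is exactly 1/3.
mean_h = sum(ys) / len(ys)
```

With this sample size, `nu_A` and `mean_h` are Monte Carlo estimates of 1/2 and 1/3 respectively, up to statistical error.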

The Brenier theorem. It states that if μ and ν are two probability measures on $\mathbb{R}^n$ with μ absolutely continuous with respect to the Lebesgue measure, then there exists a unique map $T:\mathbb{R}^n\to\mathbb{R}^n$ of the form $T=\nabla\phi$ with ϕ convex that pushes forward μ to ν.

The uniqueness of the map T must be understood almost everywhere.

The convex function ϕ is obviously not unique but its gradient is unique.

When n=1 then $T=G^{-1}\circ F$ where $F=\mu((-\infty,\bullet])$ and $G=\nu((-\infty,\bullet])$ are the cumulative distribution functions of μ and ν, and $G^{-1}$ is the generalized inverse (quantile function) of G. The Brenier theorem states that in arbitrary dimension, it is still possible to push forward using a multivariate analogue of the notion of non-decreasing function: the gradient of a convex function.
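
In dimension one, the monotone map sends each point x to the quantile of ν at the level that x has under μ. Here is a minimal sketch, assuming the hypothetical pair μ = Exponential(1) and ν = Uniform[0,2], chosen only because both distribution functions are explicit.

```python
import math

def F(x):
    """Cumulative distribution function of mu = Exponential(1)."""
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0

def G(y):
    """Cumulative distribution function of nu = Uniform[0, 2]."""
    return min(max(y / 2.0, 0.0), 1.0)

def G_inv(u):
    """Quantile function of nu on [0, 1]."""
    return 2.0 * u

def T(x):
    """Monotone map pushing mu forward to nu: nu-quantile at the mu-level of x."""
    return G_inv(F(x))

grid = [0.1 * k for k in range(1, 60)]
values = [T(x) for x in grid]
```

On this example `T` is non-decreasing and satisfies `G(T(x)) == F(x)`, which says that ν gives to $(-\infty,T(x)]$ the mass that μ gives to $(-\infty,x]$.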

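Relation to Wasserstein-Kantorovich coupling distance. If μ and ν have finite second moment and if $T=\nabla\phi$ is the Brenier map pushing forward μ to ν then

$$W_2(\mu,\nu)^2=\min_{\pi\in\Pi(\mu,\nu)}\int\frac{|x-y|^2}{2}\,\pi(dx,dy)=\int\frac{|x-T(x)|^2}{2}\,\mu(dx),$$

where $\Pi(\mu,\nu)$ is the set of couplings of μ and ν, namely the probability measures on $\mathbb{R}^n\times\mathbb{R}^n$ with marginals μ and ν. In other words the optimal coupling is deterministic: $\pi(dx,dy)=\mu(dx)\delta_{T(x)}(dy)$.
The transport map $T=\nabla\phi$ realizes an optimal transport of μ to ν.
A key here is the Kantorovich-Rubinstein dual formulation of $W_2$:

$$W_2(\mu,\nu)^2=\sup_{f,g}\Bigl(\int f\,d\mu-\int g\,d\nu\Bigr)$$

where the supremum runs over the set of bounded and Lipschitz $f,g:\mathbb{R}^n\to\mathbb{R}$ such that $f(x)\le g(y)+\frac{|x-y|^2}{2}$ for all x,y. We can also take the inf-convolution $f(x)=\inf_{y\in\mathbb{R}^n}\bigl(g(y)+\frac{|x-y|^2}{2}\bigr)$.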
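
For two Gaussians on the real line the Brenier map is affine and the transport cost has a closed form. The sketch below assumes the normalization of the cost by $|x-y|^2/2$, hypothetical parameters `m1, s1, m2, s2`, and a plain trapezoidal rule for the integral against μ; it also compares with the cost of the independent coupling, which is a coupling but not the optimal one.

```python
import math

def gaussian_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def transport_cost(m1, s1, m2, s2, n=20001, width=12.0):
    """Trapezoidal estimate of the integral of |x - T(x)|^2 / 2 against
    mu = N(m1, s1^2), where T is the affine increasing map onto N(m2, s2^2)."""
    def T(x):
        return m2 + (s2 / s1) * (x - m1)
    a, b = m1 - width * s1, m1 + width * s1
    step = (b - a) / (n - 1)
    total = 0.0
    for i in range(n):
        x = a + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * 0.5 * (x - T(x)) ** 2 * gaussian_pdf(x, m1, s1)
    return total * step

m1, s1, m2, s2 = 0.0, 1.0, 2.0, 3.0    # hypothetical parameters
cost = transport_cost(m1, s1, m2, s2)
closed_form = 0.5 * ((m1 - m2) ** 2 + (s1 - s2) ** 2)       # equals 4.0
independent = 0.5 * ((m1 - m2) ** 2 + s1 ** 2 + s2 ** 2)    # equals 7.0
```

The quadrature value `cost` matches `closed_form`, and both are smaller than `independent`, the cost obtained when the two marginals are coupled independently.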

Reverse Brenier map, Legendre transform, convex duality. If ν is absolutely continuous with respect to the Lebesgue measure then $\nabla\phi$ is invertible and $(\nabla\phi)^{-1}=\nabla\phi^*$ is the Brenier map between ν and μ, where

$$\phi^*(y)=\sup_x\{\langle x,y\rangle-\phi(x)\}$$

is the Legendre transform of ϕ (the reverse map is thus also the gradient of a convex function).

Regularity of Brenier map. The Brenier map is not always continuous. For example if μ is uniform on [0,1] and ν is uniform on [0,1/2]∪[3/2,2] then the Brenier map must be the identity on [0,1/2[ and identity plus 1 on ]1/2,1].

A suitable hypothesis for the regularity of the Brenier map is convexity of the support of the target measure. Indeed, Luis Caffarelli has proved that if μ and ν are absolutely continuous, and if their supports K and L are convex, and if their densities f,g are bounded away from 0 and +∞ on K and L respectively, then the Brenier map $\nabla\phi$ is a homeomorphism between the interior of K and that of L. Moreover if f and g are continuous then $\nabla\phi$ is a $C^1$ diffeomorphism.

The regularity theory of transportation of measure is a delicate subject that has been explored in recent years by several mathematicians, including Alessio Figalli.

Monge-Ampère equation. When $\nabla\phi$ is a $C^1$ diffeomorphism, the change of variable $y=\nabla\phi(x)$ gives, for every test function h, since $\mathrm{Jac}\,\nabla\phi=\nabla^2\phi$ (Hessian),

$$\int_L h(y)g(y)\,dy=\int_K h(\nabla\phi(x))\,g(\nabla\phi(x))\det(\nabla^2\phi(x))\,dx.$$

On the other hand, by definition of the Brenier map

$$\int_L h(y)g(y)\,dy=\int_{\mathbb{R}^n}h\,d\nu=\int_{\mathbb{R}^n}h\circ\nabla\phi\,d\mu=\int_K h(\nabla\phi(x))f(x)\,dx.$$

Since this is valid for every test function h we obtain the following equality

$$g(\nabla\phi(x))\det(\nabla^2\phi(x))=f(x),\qquad(1)$$

for every x in the interior of K. This is called the Monge-Ampère equation. It is an important basic nonlinear equation in mathematics and physics.
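
In dimension one the Monge-Ampère equation reads $g(\phi'(x))\phi''(x)=f(x)$ and can be checked directly on an explicit pair. The sketch below assumes two hypothetical Gaussians, for which $\phi'$ is affine, and approximates $\phi''$ by a central difference.

```python
import math

def gaussian_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

m1, s1 = 0.0, 1.0      # source mu = N(m1, s1^2), with density f
m2, s2 = 2.0, 3.0      # target nu = N(m2, s2^2), with density g

def T(x):
    """Increasing map phi' pushing mu forward to nu."""
    return m2 + (s2 / s1) * (x - m1)

def second_derivative(x, eps=1e-5):
    """phi''(x), approximated by a central difference of T = phi'."""
    return (T(x + eps) - T(x - eps)) / (2.0 * eps)

# Residual of g(phi'(x)) * phi''(x) - f(x) at a few points.
residuals = [
    abs(gaussian_pdf(T(x), m2, s2) * second_derivative(x) - gaussian_pdf(x, m1, s1))
    for x in (-2.0, -0.5, 0.0, 1.0, 2.5)
]
```

All entries of `residuals` vanish up to rounding, which is the one-dimensional instance of equation (1) for this pair.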

From Monge-Ampère to Poisson-Langevin. When $\phi(x)=\frac12|x|^2$ the Monge-Ampère equation simply reads $g(x)=f(x)$. Let us consider a perturbation or linearization around this case by taking $\phi(x)=\frac12|x|^2+\varepsilon\psi(x)+O(\varepsilon^2)$ and $g(x)=(1+\varepsilon h(x)+O(\varepsilon^2))f(x)$, then, as $\varepsilon\to0$, we find the Poisson equation for the Langevin operator:

$$\Bigl(-\Delta-\frac{\nabla f}{f}\cdot\nabla\Bigr)\psi=h.$$

In other words, this reads $-(\Delta-\nabla V\cdot\nabla)\psi=h$ if we write $f=e^{-V}$. In the same spirit, the Wasserstein-Kantorovich distance can be interpreted as an inverse Sobolev norm.
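
The linearization can be spelled out term by term. This is a formal sketch, assuming that f is positive and that all functions are smooth enough for first-order Taylor expansions in ε.

```latex
\begin{align*}
\nabla\phi(x) &= x+\varepsilon\nabla\psi(x)+O(\varepsilon^2),\\
\det(\nabla^2\phi(x)) &= \det\bigl(I_n+\varepsilon\nabla^2\psi(x)+O(\varepsilon^2)\bigr)
  = 1+\varepsilon\Delta\psi(x)+O(\varepsilon^2),\\
g(\nabla\phi(x)) &= \bigl(1+\varepsilon h(x)\bigr)\bigl(f(x)+\varepsilon\nabla f(x)\cdot\nabla\psi(x)\bigr)+O(\varepsilon^2).
\end{align*}
Inserting these expansions into $g(\nabla\phi)\det(\nabla^2\phi)=f$ and keeping
the terms of order $\varepsilon$ gives
\[
f\,h+\nabla f\cdot\nabla\psi+f\,\Delta\psi=0,
\qquad\text{that is}\qquad
\Bigl(-\Delta-\frac{\nabla f}{f}\cdot\nabla\Bigr)\psi=h.
\]
```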

The Caffarelli contraction theorem. If $\mu=e^{-V}dx$ and $\nu=e^{-W}dx$ are two probability measures on $\mathbb{R}^n$ such that $\frac{\alpha}{2}|\cdot|^2-V$ and $W-\frac{\beta}{2}|\cdot|^2$ are convex for some constants $\alpha,\beta>0$, then the Brenier map $T=\nabla\phi$ pushing forward μ to ν satisfies $\|T\|_{\mathrm{Lip}}\le\sqrt{\alpha/\beta}$.

By taking $V=\frac{\alpha}{2}|\cdot|^2$ (up to an additive normalizing constant) we obtain that a probability measure which is log-concave with respect to a non trivial Gaussian is a Lipschitz deformation of this Gaussian!
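
A simple worked example, not taken from the original post, shows that the constant cannot be improved: take both measures to be centered isotropic Gaussians.

```latex
Let $\mu=\mathcal{N}(0,\alpha^{-1}I_n)$ and $\nu=\mathcal{N}(0,\beta^{-1}I_n)$, so that
\[
V(x)=\frac{\alpha}{2}|x|^2+\frac{n}{2}\log\frac{2\pi}{\alpha},
\qquad
W(y)=\frac{\beta}{2}|y|^2+\frac{n}{2}\log\frac{2\pi}{\beta},
\]
and both $\frac{\alpha}{2}|\cdot|^2-V$ and $W-\frac{\beta}{2}|\cdot|^2$ are constant, hence convex.
The linear map
\[
T(x)=\sqrt{\alpha/\beta}\,x=\nabla\Bigl(\sqrt{\alpha/\beta}\,\frac{|x|^2}{2}\Bigr)
\]
is the gradient of a convex function and pushes $\mu$ forward to $\nu$, since
$X\sim\mathcal{N}(0,\alpha^{-1}I_n)$ implies
$\sqrt{\alpha/\beta}\,X\sim\mathcal{N}(0,\beta^{-1}I_n)$.
It is therefore the Brenier map, and $\|T\|_{\mathrm{Lip}}=\sqrt{\alpha/\beta}$ exactly.
```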

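Idea of proof. We begin with n=1. Taking the logarithm in the Monge-Ampère equation gives $\frac12\log((\varphi'')^2)=\log|\varphi''|=-V+W(\varphi')$, and taking the derivative twice gives

$$\frac{\varphi''''\varphi''-(\varphi''')^2}{(\varphi'')^2}=-V''+W''(\varphi')(\varphi'')^2+W'(\varphi')\varphi'''.$$

Now if $\varphi''$ has a maximum at $x=x_*$ then $\varphi'''(x_*)=0$ and $\varphi''''(x_*)\le0$, and thus

$$0\ge-V''(x_*)+W''(\varphi'(x_*))(\varphi''(x_*))^2\quad\text{hence}\quad(\varphi''(x_*))^2\le\alpha/\beta.$$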

This maximum principle argument is attractive but a maximum at the boundary may produce difficulties. Let us now follow the same idea in the case n≥1. Observe first that the Lipschitz constant of $\nabla\phi$ is the supremum of the operator norm of $\nabla^2\phi$. So it is enough to prove $\|\nabla^2\phi(x)\|_{\mathrm{op}}\le\sqrt{\alpha/\beta}$ for every x. Besides, since ϕ is convex, $\nabla^2\phi$ is a positive semidefinite matrix, so this amounts to proving that $\langle\nabla^2\phi(x)u,u\rangle\le\sqrt{\alpha/\beta}$ for every unit vector u and every $x\in\mathbb{R}^n$. Now we fix a direction u and we assume that the map

$$\ell:x\mapsto\langle\nabla^2\phi(x)u,u\rangle$$

attains its maximum for $x=x_*$. The logarithm of the Monge-Ampère equation gives

$$\log\det(\nabla^2\phi(x))=-V(x)+W(\nabla\phi(x)).$$

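Now we differentiate this equation twice in the direction u. To differentiate the left hand side, observe that if A is an invertible matrix

$$\log\det(A+H)=\log\det(A)+\mathrm{tr}(A^{-1}H)+o(H),\qquad(A+H)^{-1}=A^{-1}-A^{-1}HA^{-1}+o(H).$$

We obtain (omitting variables)

$$-\mathrm{tr}\bigl((\nabla^2\phi)^{-1}(\partial_u\nabla^2\phi)(\nabla^2\phi)^{-1}(\partial_u\nabla^2\phi)\bigr)+\mathrm{tr}\bigl((\nabla^2\phi)^{-1}\partial_{uu}\nabla^2\phi\bigr)=-\partial_{uu}V+\sum_i\partial_iW\,\partial_{iuu}\phi+\sum_{ij}\partial_{ij}W\,(\partial_{iu}\phi)(\partial_{ju}\phi).$$

We shall use this equation at $x_*$. We claim that

$$\mathrm{tr}\bigl((\nabla^2\phi)^{-1}(\partial_u\nabla^2\phi)(\nabla^2\phi)^{-1}(\partial_u\nabla^2\phi)\bigr)\ge0.$$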

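Indeed, $\nabla^2\phi\ge0$ so $(\nabla^2\phi)^{-1}\ge0$ and since $\partial_u\nabla^2\phi$ is symmetric, we get

$$(\partial_u\nabla^2\phi)(\nabla^2\phi)^{-1}(\partial_u\nabla^2\phi)\ge0.$$

Now it remains to recall that the product of two positive semidefinite matrices has nonnegative trace, namely if A and B are n×n real symmetric positive semidefinite then

$$\mathrm{Tr}(AB)=\mathrm{Tr}\bigl(\sqrt{A}\sqrt{B}(\sqrt{A}\sqrt{B})^\top\bigr)\ge0.$$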
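
The trace inequality is easy to probe numerically. The sketch below uses plain Python lists and random 3×3 matrices of the form $MM^\top$, which are symmetric positive semidefinite; the helper names and the sample size are arbitrary choices made for the illustration.

```python
import random

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def random_psd(n, rng):
    """Random symmetric positive semidefinite matrix of the form M M^T."""
    M = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    return matmul(M, transpose(M))

rng = random.Random(42)
traces = [trace(matmul(random_psd(3, rng), random_psd(3, rng)))
          for _ in range(1000)]
min_trace = min(traces)
```

The smallest value `min_trace` stays nonnegative. Note that the product AB itself is in general neither symmetric nor positive semidefinite: only its trace is controlled.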

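Since the function ℓ attains its maximum at $x_*$ we have $\nabla^2\ell(x_*)\le0$. Therefore

$$\mathrm{tr}\bigl((\nabla^2\phi)^{-1}\partial_{uu}\nabla^2\phi\bigr)=\mathrm{tr}\bigl((\nabla^2\phi)^{-1}\nabla^2\ell\bigr)\le0.$$

In the same way

$$\sum_i\partial_iW\,\partial_{iuu}\phi=\langle\nabla W,\nabla\ell\rangle=0.$$

So at point $x_*$ the main identity above gives

$$\sum_{ij}\partial_{ij}W\,(\partial_{iu}\phi)(\partial_{ju}\phi)\le\partial_{uu}V.$$

Now the hypotheses made on V and W give $\partial_{uu}V\le\alpha$ and

$$\sum_{ij}\partial_{ij}W\,(\partial_{iu}\phi)(\partial_{ju}\phi)\ge\beta\sum_i(\partial_{iu}\phi)^2=\beta|\nabla^2\phi\,u|^2.$$

Since u has norm 1, we get

$$\ell(x_*)=\langle\nabla^2\phi(x_*)u,u\rangle\le|\nabla^2\phi(x_*)u|\le\sqrt{\frac{\alpha}{\beta}}.$$

Therefore $\ell(x)\le\sqrt{\alpha/\beta}$ for every x, which is the desired inequality.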

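Application to functional inequalities. The Poincaré inequality for the standard Gaussian measure $\gamma_n=\mathcal{N}(0,I_n)=(2\pi)^{-\frac n2}e^{-\frac{|x|^2}{2}}dx$ on $\mathbb{R}^n$ states that for an arbitrary, say $C^1$ and compactly supported, test function $f:\mathbb{R}^n\to\mathbb{R}$,

$$\int f^2\,d\gamma_n-\Bigl(\int f\,d\gamma_n\Bigr)^2\le\int|\nabla f|^2\,d\gamma_n.$$

Let μ be a probability measure on $\mathbb{R}^n$, image of $\gamma_n$ by a $C^1$ map $T:\mathbb{R}^n\to\mathbb{R}^n$. The Poincaré inequality above with $f=g\circ T$ for an arbitrary $g:\mathbb{R}^n\to\mathbb{R}$ gives

$$\int g^2\,d\mu-\Bigl(\int g\,d\mu\Bigr)^2\le\|T\|_{\mathrm{Lip}}^2\int|\nabla g|^2\,d\mu.$$

This is a Poincaré inequality for μ, provided that T is Lipschitz.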
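
The intermediate steps of this transfer argument can be written out as follows. This is a sketch, assuming that g is $C^1$ and that $g\circ T$ is an admissible test function for the Gaussian Poincaré inequality; $DT$ denotes the Jacobian matrix of T.

```latex
Since $\mu$ is the image of $\gamma_n$ by $T$,
\[
\int g\,d\mu=\int g\circ T\,d\gamma_n
\quad\text{and}\quad
\int g^2\,d\mu=\int (g\circ T)^2\,d\gamma_n .
\]
By the chain rule, $\nabla(g\circ T)(x)=DT(x)^\top\nabla g(T(x))$, hence
\[
|\nabla(g\circ T)(x)|\le\|DT(x)\|_{\mathrm{op}}\,|\nabla g(T(x))|
\le\|T\|_{\mathrm{Lip}}\,|\nabla g|(T(x)).
\]
Therefore
\[
\int g^2\,d\mu-\Bigl(\int g\,d\mu\Bigr)^2
\le\int|\nabla(g\circ T)|^2\,d\gamma_n
\le\|T\|_{\mathrm{Lip}}^2\int|\nabla g|^2\circ T\,d\gamma_n
=\|T\|_{\mathrm{Lip}}^2\int|\nabla g|^2\,d\mu .
\]
```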

The Caffarelli contraction theorem states that if $\mu=e^{-V}dx$ with $V-\frac{\rho}{2}|\cdot|^2$ convex for some constant $\rho>0$ then the map T pushing forward $\gamma_n$ to μ satisfies $\|T\|_{\mathrm{Lip}}^2\le1/\rho$, which implies by the argument above that μ satisfies a Poincaré inequality of constant $1/\rho$. The same argument works for other Sobolev type functional inequalities satisfied by the Gaussian measure, such as the logarithmic Sobolev inequality and the Bobkov isoperimetric functional inequalities. This transportation argument is a striking alternative to the Bakry-Émery curvature criterion in order to establish functional inequalities, but it does not prove the Gaussian case and does not have the extensibility of the latter to manifolds and abstract Markovian settings.

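From Monge-Ampère to Gaussian log-Sobolev. Let us give a proof of the optimal logarithmic Sobolev inequality for the standard Gaussian measure $\gamma_n$ by using directly the Monge-Ampère equation. Let $f:\mathbb{R}^n\to\mathbb{R}_+$ be such that $\int f\,d\gamma_n=1$. Let $T=\nabla\phi$ be the Brenier map pushing forward $f\,d\gamma_n$ to $\gamma_n$. We set $\theta(x):=\phi(x)-\frac12|x|^2$ so that $\nabla\phi(x)=x+\nabla\theta(x)$. We have $\mathrm{Hess}(\theta)(x)+I_n\ge0$, and Monge-Ampère gives

$$f(x)e^{-\frac{|x|^2}{2}}=\det(I_n+\mathrm{Hess}(\theta)(x))\,e^{-\frac{|x+\nabla\theta(x)|^2}{2}}.$$

Taking the logarithm gives

$$\begin{aligned}
\log f(x)&=-\frac{|x+\nabla\theta(x)|^2}{2}+\frac{|x|^2}{2}+\log\det(I_n+\mathrm{Hess}(\theta)(x))\\
&=-x\cdot\nabla\theta(x)-\frac{|\nabla\theta(x)|^2}{2}+\log\det(I_n+\mathrm{Hess}(\theta)(x))\\
&\le-x\cdot\nabla\theta(x)-\frac{|\nabla\theta(x)|^2}{2}+\Delta\theta(x),
\end{aligned}$$

where we have used $\log(1+t)\le t$ for $1+t>0$, applied to each eigenvalue t of $\mathrm{Hess}(\theta)(x)$, the matrix $I_n+\mathrm{Hess}(\theta)(x)$ being symmetric and positive. Now integration with respect to $f\,d\gamma_n$ gives

$$\int f\log f\,d\gamma_n\le\int f(\Delta\theta-x\cdot\nabla\theta)\,d\gamma_n-\int\frac{|\nabla\theta|^2}{2}f\,d\gamma_n.$$

Finally, using integration by parts ($\Delta\theta-x\cdot\nabla\theta$ is the Ornstein-Uhlenbeck operator applied to θ), we get

$$\int f\log f\,d\gamma_n\le-\int\frac12\Bigl|\sqrt f\,\nabla\theta+\frac{\nabla f}{\sqrt f}\Bigr|^2d\gamma_n+\frac12\int\frac{|\nabla f|^2}{f}\,d\gamma_n\le\frac12\int\frac{|\nabla f|^2}{f}\,d\gamma_n.$$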
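
The integration by parts and the completion of the square used in the last display can be detailed as follows, assuming enough smoothness and decay for the boundary terms to vanish.

```latex
Since $\operatorname{div}\bigl(e^{-|x|^2/2}\nabla\theta\bigr)
=(\Delta\theta-x\cdot\nabla\theta)\,e^{-|x|^2/2}$, integration by parts gives
\[
\int f(\Delta\theta-x\cdot\nabla\theta)\,d\gamma_n=-\int\nabla f\cdot\nabla\theta\,d\gamma_n .
\]
On the other hand, expanding the square,
\[
\frac12\Bigl|\sqrt f\,\nabla\theta+\frac{\nabla f}{\sqrt f}\Bigr|^2
=\frac{|\nabla\theta|^2}{2}f+\nabla f\cdot\nabla\theta+\frac12\frac{|\nabla f|^2}{f},
\]
so that
\[
-\int\nabla f\cdot\nabla\theta\,d\gamma_n-\int\frac{|\nabla\theta|^2}{2}f\,d\gamma_n
=-\int\frac12\Bigl|\sqrt f\,\nabla\theta+\frac{\nabla f}{\sqrt f}\Bigr|^2d\gamma_n
+\frac12\int\frac{|\nabla f|^2}{f}\,d\gamma_n .
\]
```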

Recall that $T=\nabla\phi$, given by $T(x)=x+\nabla\theta(x)$, pushes forward ν to $\gamma_n$, where $d\nu=f\,d\gamma_n$. Therefore

$$\int\frac{|\nabla\theta|^2}{2}f\,d\gamma_n=\int\frac{|x-T(x)|^2}{2}\,d\nu=W_2^2(\nu,\gamma_n).$$

Beyond the log-Sobolev inequality for the Gaussian measure, it is possible to obtain in this way, from the Monge-Ampère equation, HWI (H,W,I for entropy, Wasserstein, and Fisher information) functional inequalities for strongly log-concave measures. From this point of view, optimal transportation provides a partial alternative to the Bakry-Émery criterion on $\mathbb{R}^n$.


Do you know ar5iv? By replacing the X of arXiv with 5 in the address of an article on arXiv, you get an HTML5 version of the article! Try it for instance with the famous

https://arxiv.org/abs/math/0211159

which gives

https://ar5iv.org/abs/math/0211159

The conversion is done with LaTeXML, an engine written in Perl that transforms LaTeX into XML and its by-products such as HTML5. This provides an alternative to the usual DVI/PS/PDF output. It opens up considerable prospects, since search engines will eventually be able to dig into articles more easily.
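
The address rewriting is simple enough to script. Here is a minimal sketch, in which the helper name `to_ar5iv` is hypothetical and only the host name is changed, as described above.

```python
from urllib.parse import urlsplit, urlunsplit

def to_ar5iv(url):
    """Map an arxiv.org article URL to the corresponding ar5iv.org URL.

    Only the host name is replaced; path, query and fragment are kept.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arXiv URL: {url}")
    return urlunsplit((parts.scheme, "ar5iv.org", parts.path,
                       parts.query, parts.fragment))

html_url = to_ar5iv("https://arxiv.org/abs/math/0211159")
```

On the example above, `html_url` is `https://ar5iv.org/abs/math/0211159`.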
