Valhalla -- finding the primitives

source link: https://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2020-February/001232.html

Brian Goetz brian.goetz at oracle.com
Tue Feb 18 19:01:50 UTC 2020

I think it's worth reflecting on how far we've come in Valhalla, both for 
the specific designs in the VM and language, and the clarity of the 
basic concepts.

In the early model (Q World), the idea was that we would declare a class 
as either a value class or a "regular" class, and we would derive 
various properties based on that:

     regular classes have identity, value classes do not
     regular classes are nullable, value classes are not
     regular classes are reference types, value classes are not

This model was derived from the current relationship between `int` and 
`Integer`.  But, to interoperate with dynamically typed code (such as 
reflection) and erased generics, we needed reference types, so each 
value class got a "box" type (denotable as LV at the VM level) which was 
a reference type.  Any interfaces declared on the value class were 
superinterfaces of the box.  There was one runtime class, and 
`getClass()` returned that, regardless of whether invoked on a value or 
a boxed reference.
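
The `int`/`Integer` relationship this model generalized can be observed in 
today's Java.  The following is a minimal sketch in plain Java, not 
Valhalla code; the class name `BoxModelDemo` and its method names are 
made up for illustration.  It shows the three things the box was needed 
for: erased generics, dynamically typed code, and null.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: plain Java showing the int/Integer split.
final class BoxModelDemo {
    // Erased generics accept only reference types, so each int is boxed.
    static List<Integer> boxAll(int[] values) {
        List<Integer> boxes = new ArrayList<>();
        for (int v : values) {
            boxes.add(v); // autoboxing: int -> Integer
        }
        return boxes;
    }

    // Dynamically typed code receives an Object and sees one runtime class.
    static Class<?> runtimeClassOf(Object o) {
        return o.getClass();
    }

    // Only the reference type admits null; an int parameter could not.
    static boolean isAbsent(Integer maybe) {
        return maybe == null;
    }

    public static void main(String[] args) {
        System.out.println(boxAll(new int[] {1, 2, 3})); // prints [1, 2, 3]
        System.out.println(runtimeClassOf(42));          // prints class java.lang.Integer
        System.out.println(isAbsent(null));              // prints true
    }
}
```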

This worked, but there were many aspects which were either confusing or 
unsatisfying.  Value classes were neither reference types nor 
primitives, so we had gone from a type system split cleanly in two 
(which many people dislike) to one split uncleanly in three (for 
example, values had the Object methods, but didn't derive from Object, 
complicating code that was supposed to be generic across values and 
references alike).  Some chafed at the notion that value types could 
never be nullable; others didn't like that the all-zero value was always 
a member of the value set, whether or not it had semantic meaning.  Some 
reference types had significant identity, and others (boxes) didn't.  
And we didn't have a clean story for migration.

In the second iteration (L World), we addressed (at the VM level) the 
need to box in order to access reference-related functionality, by 
making `QV` a subtype of `LObject`, rationalizing the subtyping 
relationships between arrays, and replacing the box with a 
null-adjunction type (LV).  This reduced the pressure on migration 
substantially, but we still hadn't addressed most of the user model 
issues, including the tripartite nature of the type system, and we 
created quite a few problems as a result (such as the relationship 
between the two class mirrors for QV/LV).

For example, to address initialization safety (where the zero value is 
outside the domain), we explored the notion of zero-default vs 
null-default inline classes, which involved treating the all-zero value 
as a null for some value classes but as a zero for others.  But we kept 
finding that we were having too many "flavors" of everything, because, 
in hindsight, the various aspects were not yet cleanly factored down to 
their primitives.  In the end, it turned out we were conflating a number 
of distinctions, and kept trying to use one as a proxy for another:

  - nullable vs non-nullable
  - pass-by-reference vs pass-by-value / flattened
  - reference type vs value type
  - identity-ful vs identity-free

For example, we wanted to call classes like String "reference types" and 
classes like Point "value types", but when we got to types like Object 
and interfaces, they had one foot in each camp.  It turns out that, in 
the "find the primitive" game, "reference type" wasn't the primitive.


Classes.  The user declares _classes_ ("public class Foo { }"); we 
derive _types_ from class declarations (Foo, Foo[], etc.).  The primitive 
that Valhalla introduces into class declaration is whether the instances 
of the class _have identity or not_. Traditional classes are now 
revealed to be "identity classes"; the new kind (identity-free) are 
called "inline classes".  (This might not be the final word on the 
subject.)
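
What "having identity" means can be seen in today's Java, where every 
class is an identity class and only the built-in primitives are 
identity-free.  This is a sketch under that analogy, not Valhalla 
syntax; `IdentityDemo` and `Counter` are hypothetical names.

```java
// Illustrative only: identity vs. identity-free, using today's Java.
final class IdentityDemo {
    // An ordinary class: in Valhalla terms, an "identity class".
    static final class Counter {
        int n;
    }

    // Two instances with the same state are still distinguishable by ==.
    static boolean sameStateDifferentIdentity() {
        Counter a = new Counter();
        Counter b = new Counter();
        return a.n == b.n && a != b;
    }

    // Identity-free values: nothing but state distinguishes two 7s.
    static boolean valuesCompareByState() {
        int x = 7;
        int y = 3 + 4;
        return x == y;
    }
}
```

An inline class would behave like the `int` case here: two instances 
with the same state could not be told apart.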

Types and values.  In the type system we have now, some types contain 
primitive values, and other types contain _references to objects_.  What 
messed us up for a while is that the top types -- Object and interfaces 
-- can contain both.  A big AHA of the recent iterations is that it 
makes sense to talk about both _values of_ inline classes and 
_references to_ those values. Reference type has (almost) nothing to do 
with inline vs identity -- it has to do with whether the value set of 
the type contains values, or references.

For an identity class C, we derive one type: C, which consists of 
references to instances of C.  For an inline class V, we derive two 
types: `V.ref`, which is a reference type (and therefore nullable), and 
contains references to the instances of V, and `V.val`, which is not a 
reference type, and whose values are "raw" instances of V.

With this understanding, the nullity problem becomes a simpler one: 
nullity is a property of _reference types_.  So `V.ref` is nullable, and 
`V.val` is not; we don't need a way to say "nullable value" or different 
ways to interpret the default value.  We derive flattening and calling 
conventions in the same way; for reference types, we always store / pass 
as-if-by reference, but for "val" types, we store / pass as-if-by value.
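
The closest analog in today's Java is `int` (playing the role of a "val" 
type) and `Integer` (playing the role of its "ref" type).  The sketch 
below relies on that analogy only; `V.ref` and `V.val` themselves are 
not expressible in current Java, and the class name `NullityDemo` is 
hypothetical.

```java
// Illustrative only: nullity follows from being a reference type.
final class NullityDemo {
    // Elements of a non-reference type start at the all-zero value; null is not possible.
    static int defaultPrimitiveElement() {
        int[] flat = new int[4];
        return flat[0];
    }

    // Elements of a reference type start as null.
    static Integer defaultReferenceElement() {
        Integer[] refs = new Integer[4];
        return refs[0];
    }

    public static void main(String[] args) {
        System.out.println(defaultPrimitiveElement()); // prints 0
        System.out.println(defaultReferenceElement()); // prints null
    }
}
```

Under this reading, wanting a "nullable value" simply means choosing the 
reference type, just as one chooses `Integer` over `int` today.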


It is this refined understanding that has brought me back to the ref/val 
notation _for the types_.  "Inline" is a way of saying "identity free" 
when declaring classes, but it doesn't say anything (yet) about the 
semantics of how we represent variables on the heap or pass them on the 
stack.  For this, we need an additional property of the type, and ref vs 
val seems to ideally describe what we mean -- that the value set of the 
type consists of either references or values, and the 
representation/calling conventions behave as if we are storing/passing 
references or values.  (Having come to this clarity about the types, we 
are free to pick a word other than "inline" if we think there is a 
better way to say "identity-free", though I don't think going back to 
"value" is necessarily right.)

With this distinction in place, some previously nasty problems (such as 
nullity) become trivial (if you want "nullable values", use references), 
and some previously impossible problems (such as unifying primitives 
with values) become tractable.

