Types will win in the end
source link: https://changelog.com/podcast/548
Transcript
Changelog
Alright, we are here with Jake Zimmerman. What's up, Jake?
Not much. How are you guys?
Doing good, doing good. Happy to have you here. This is a requested episode… Always happy when we get to do a show that's -
We know it's 100% on point, for at least one person in our listening audience. This one was requested by Max VelDink, who says "Type checking has been a white whale in Ruby for a long time, and very divisive. There's even a built-in attempt in Ruby 3 called RBS that hasn't gained much traction." Sorbet, on the other hand, has been adopted by many organizations, including Stripe, Shopify, Instacart, and his company, JustWorks. And he says, "I think it's telling that many large Ruby shops are switching to some sort of type safety on larger Ruby codebases. It would be cool to hear from Jake, who works on Sorbet at Stripe, about the origin story, what problems it solves there, and what it was like trying to convince pretty curmudgeonly Ruby devs to add type checking into their codebases." So Max, thanks for writing in. I agree. I thought that would be cool, so I reached out, and you're here now, Jake.
That's true. Yeah, I think all of those things he commented on - it's kind of what I've lived and breathed for the past five years of working on Sorbet and type checking Ruby. It's been a wild ride.
Yeah, that was the first thing that I noticed, was - I had heard of Sorbet, but only somewhat recently… And I went back to check a little bit on the history, and like, you guys were doing presentations in 2018.
And I think it goes back to 2017. Is that right? You've been working on it for a long time.
The project itself started, yeah, in fall of 2017, at Stripe. And that's kind of one of the things that's set this project apart from a lot of other kind of like larger attempts to type JavaScript or type Python. We've kind of just focused on just doing what we need to do, and not really going out and trying to sell other people on this vision of what typing in Ruby could be. It's more just been kind of like "Here's what we have. If you want it, that's great. If you don't, you can still keep using Ruby."
What about inside Stripe, though? Is it more evangelical inside of Stripe in terms of like -
Well, that's the other weird thing… Max in his comment had said "I'm curious to hear how - it must have been hard convincing these curmudgeonly Ruby developers", but it was the complete opposite inside of Stripe. It was the sort of thing where for years prior to starting this type checker project people were like "I love working at Stripe, our product is great, but every day I come in here and I have to use Ruby, and our codebase is too big, I don't understand how anything works… I really just wish there was a type checker." And so we didn't have to convince most of the company; we just kind of had to build the product.
[08:02] That's interesting.
Do you think that TypeScript and that move paved the way, to some degree, that it can be done successfully?
Oh, yeah. Absolutely. I think a huge part of it was people would switch back and forth between writing TypeScript in the frontend, or Flow in the frontend, and then Ruby in the backend, and they knew what could have been the case, what they were missing out on, basically… So they just asked for it, and they kept asking for it.
Yeah. The lack of types in Ruby is really key to prototyping, and I think Stripe is kind of baked… I mean, obviously, you're innovating. Do you think that that's maybe less needed now, once it's sort of - you know where Stripe is going. It's a big codebase, lots of Ruby… Do you think that's why types are welcomed in that environment, versus "Hey, we're breaking new ground here. We need to be flexible at compile time. We need that flexibility."
Yeah, I think people will sometimes say that in the prototyping phase you care less about types, and then in the iteration long-term maybe you need more types… I mean, there's a class of people that will break that mold and say "I actually prefer the type checking, even when I'm in my prototyping phase, just because if you do want to completely switch out one half of your system, you know that you've switched it out correctly, because the type checker will catch you."
But I think that the biggest motivating factor for us was just at the time we were getting up to the place where we had hundreds of developers, and even if we were building new code, it was hard to make sense of it all. We really just wanted jump to definition, to be able to follow paths of control flow through the codebase, and connect things together. So it was more about understanding the code, I would say.
It's interesting hearing that, because I guess I've been around long enough that I remember when people would be so excited to be able to work in Ruby at their day job… Because it was just hobbies for so long, and it was slowly becoming adopted. Obviously, Rails really helped that adoption come in, when you could actually make money doing Ruby. But we're so far past that point, plus we're at a point where people switch jobs and orgs so much. I've talked to multiple people on the frontend side through JS Party who have come to a Ruby shop like Shopify, or Stripe, from something else; maybe they grew up in JavaScript land, doing Node apps and stuff, and they're like "Yeah, the job's cool, but I have to use Ruby, and I don't know Ruby, and it's weird. I don't like it." And I'm like "That's it? That's the drawback, is the programming language?" And I understand it; it's just a weird place to be when it's like, that's the part of the job they're not excited about. Because it used to be that that was so exciting for people, to be able to use the programming language they love, and make money.
Yeah. It's probably just a relative popularity thing. I think that people's primary programming language tends to be just such a large fraction of how they think and how they approach problems, and if you're used to something, you want to switch - if that thing is so different from the thing that you're currently using, it's kind of a culture shock a little bit.
So what is some of the origin story? How did you get to working on this? You mentioned people already wanted it, so apparently there was a desire inside of Stripe for something like this, but how come you - and you've been working on it for a long time before the show. You said you eat, drink and dream - I don't know if that's what you said, but something along the lines of Sorbet is all you think about. So here you are, five years later, still just thinking about that all day. Why you? And tell us the origin.
Yeah, yeah. So I mentioned it started in kind of the fall of 2017. It started with two people who had been working at Stripe for a number of years, and one person that we had hired who was just finishing his PhD, working on the Scala compiler. So it was a very small team, a very experienced set of people. They spent about a year building it from scratch. So by the end of that year, they'd gotten it to the point where it was able to type-check most of the code at Stripe. It was still kind of - maybe only 75% of the codebase was opted into the type checker, and the other 25% still hadn't gotten around to enabling it. But I started at Stripe at basically the same time that this project started. So I got to kind of follow the project from my team, just outside, looking in, for that whole year…
[12:08] When I was in school, I was always just super-interested in types and programming languages. I didn't really realize this until basically my last year of being in university. If I had realized it maybe a year or so sooner, it's possible I wouldn't have even joined Stripe, and I would have tried to do some sort of research and maybe go into higher education. But it didn't work out that way, and so I was just kind of like - I knew that I had this passion for types and programming languages, but I didn't quite understand whether there was a way to go from just being excited about it to being able to actually do it professionally. But I knew for this whole first year that I was working full-time at Stripe that we did have this team. And so I eventually got to the point where I was just like "I'm going to regret it for the rest of my life if I never even ask to join the team." So one day I just asked them, I said "Hey, do you guys have an opening? Can I come help out?" And it just so happens that because this team had been staffed by these super-experienced people, they actually really wanted somebody who had zero experience, so that the people on the team could have the chance to flex their mentorship muscles, and kind of learn what it takes to teach younger developers.
And so I was one year out of school, I was working with three really experienced people, who had basically this mandate, like "Your whole job now is to train this other person." So it was a great environment, and again - yeah, because I already knew that I was kind of interested in it, I just kind of dove right in. And that's kind of been it. We've worked on a handful of different things over those five years, whether it's been making the type system better, or making the experience of using the type system in your editor better… We even spent a couple years working on an ahead-of-time compiler, using Sorbet to actually compile Ruby code to native code. And now we're kind of back to focusing on how we can basically just improve the type system, improve the editor, improve the type-checking experience.
Was that ahead of time compiler? Was that work - it ended up not being super-fruitful, so you went back to it? Or what was the story when you went down that path?
Yeah, the compiler project - it was kind of interesting. It was at a point in time when latency was the primary concern for pretty much every team at Stripe. This was during the height of the pandemic, when suddenly everyone across the internet who was running a software company was seeing increased volumes, and increased loads on the system… So we had basically every team working on different ways to reduce latency, and we were just going to take whichever long-term bet panned out the quickest. So some of those people working on latency were just profiling Ruby code, seeing where they could get latency wins, some people were focusing on making the database faster, and some people were taking really longer-term sorts of bets, like "Should we rewrite the core architecture to use a different language?" All sorts of different bets across the company. So one of these was the ahead-of-time compiler for Ruby. And we actually got to the point where it was completely working in production, and it really was just a matter of whether we wanted to continue working on it. And because of all the great work of other teams at Stripe making the Stripe API faster, we got to the point where we didn't quite need the latency wins from the Sorbet compiler; and it would have come with its own set of trade-offs, so given all that, we wanted to focus again on the developer productivity side of having a type checker, where we can actually make people writing Ruby code more productive.
That's cool. So inside of Stripe then, if you could come up with a percentage of how much code is "sorbeted" across the codebase - do you have those numbers? Do you know how much is -
Yeah, yeah. Less than 1% is not using Sorbet.
[15:49] Yeah. There are kind of various strictness levels to what it means to have Sorbet turned on. At the very bottom level is what we call typed: false. And even though it says "typed: false", it's still doing some kind of sanity checking: it'll make sure that all of the classes and modules and constant references in the codebase resolve, and it will obviously check that your syntax is valid. Up from that, there's "typed: true", and that's the point where Sorbet will start doing actual type inference on method bodies, and tell you if you have any classical type errors, like "expected Integer, found String" sorts of type errors. And then one level up from that is "typed: strict". At typed: strict, not only will it do the type inference, but it'll require that you put explicit type annotations on every method in your file.
I think we even have numbers for that typed: strict level - so it's like 99% typed: true or higher, but at typed: strict I think we're somewhere close to like 80%, or something like that. It's the sort of thing where over time people encounter the files that don't have type annotations, and encounter the files that do have type annotations, and they find that it's a lot easier to edit, and understand, and refactor the code that has the type annotations. And so they've self-selected to opt their files into these stricter checking levels.
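To make the levels concrete, here is a minimal sketch of a file opted in at typed: true, assuming the "# typed:" magic comment on the first line works the way Sorbet's strictness sigils do. The method names broken_sum and runtime_outcome are hypothetical. Plain Ruby ignores the sigil and only fails once the buggy line actually runs, whereas at typed: true Sorbet should flag the Integer#+ call with a String argument without running anything.

```ruby
# typed: true
# Sorbet reads the sigil on the first line; plain Ruby treats it as a comment.

# Hypothetical method containing a classic "expected Integer, found String" bug.
# At typed: false, Sorbet would only check syntax and constant resolution here.
# At typed: true, it infers types inside the body and should flag the + call.
def broken_sum
  1 + "2"
end

# Without a type checker, the mistake only surfaces when the method runs.
def runtime_outcome
  broken_sum
  "no error"
rescue TypeError => e
  "TypeError: #{e.message}"
end

puts runtime_outcome
```

The point of the sketch is the timing: the same bug is a static report under Sorbet but a runtime exception, on whichever code path happens to hit it, in plain Ruby.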
That's interesting. Network effects, in a sense, right?
Like "Hey, this file doesn't have this. I want to bring it in there." It's good stuff. It's crazy to have such a project take over, too. In one of the posts - I think it's a 2018 post, saying "Where Sorbet is at now." This is "State of Sorbet, Spring 2018". It actually says "100% of our production Ruby files are sorbeted", according to this. "Every CI build in the main repository is checked by Sorbet", and you kind of lay that out there. But to put such a percentage there - this is a big deal. You're making developers productive. How does type checking really equate to being more productive? What are some of the ways that this comes into play?
Yeah, so there's all sorts of different things. I think that the quote that I like bringing up here is the first time that we - so we'd built this type checker, and it was really just kind of like this policeman enforcing that you're not doing the wrong thing. In the beginning, that's all there was; either your CI check would fail with a big, red, scary message, or it would pass. And that's fine. You can get a lot of value out of that. But the first time that we took this type checker and we started building editor functionality, kind of typical IDE sorts of features, and exposing that to users, that was when people's eyes really started to light up.
So the first time that we sent an email to the company saying "You can now use Sorbet to get accurate jump to definition", people were telling us "This is the best Christmas present you could have ever given me. It's July, I don't even care." I think that people really identify with being able to understand their code, and use the information that the type checker has to just dive into an unfamiliar part of the codebase and have confidence that they're going to be able to figure out what it's doing.
It sounds a lot like what Sourcegraph markets, too. They call it code spelunking. I've heard Beyang Liu talk about that… Like, being able to jump to definition and explore a codebase - especially if you're moving teams, like Jerod mentioned before… If you're moving from one shop to another, you've got to relearn not just domain knowledge, but also this built-up code knowledge of how the codebase works. In an untyped world, it's gotta be challenging if you can't do that.
Yeah. And it might be a problem that you only realize is a problem at a certain codebase size. For example, the Sorbet codebase itself has only ever been worked on by two or three people full-time. The codebase itself is only maybe 100,000 lines of code. But when you get into these codebases with hundreds of people working over millions of lines of code, and the ownership of which parts of the codebase belong to which teams is fluid over time… you're very rarely working with the same lines of code for an extended period of time, and so you're kind of always doing that code spelunking, where you're jumping from one place to another… Yeah, that's the part in my mind where type checking gets to be super, super-valuable.
[19:57] When I think about programming languages that lend themselves to type checking enforcement, Ruby is like at the bottom of that list, isn't it? I mean, this had to be a monumental task, because it's so malleable, it's so self-referential, it has reflection, it has metaprogramming… You can just monkey-patch and redefine and change stuff all the time… And despite the warnings of "Use with care", we tend to do that when it's convenient. And sometimes we do it just because we can. I know I used to be a young Rubyist who liked to show off the different things that he could do, even if it was just to myself… You know, "Oh, look what I can do." Was this very difficult to build? Are there still ways you could poke a hole through it? What's the situation with all the weirdness of Ruby as a language?
Yeah, I will definitely agree with you that the dynamism of Ruby is both a huge strength, since it's been what's let communities like the Rails community succeed, and also a big challenge, just because those sorts of - like, when you can only understand what the code is doing at runtime, obviously that stands in the way of static analysis. So that's definitely a big problem. And I wouldn't say it's a fully solved problem in Sorbet, by any means. That's probably still one of the biggest reasons why you might evaluate whether your company or your codebase should switch to using Sorbet and decide against it: your team really gets a ton of value out of the super-dynamic metaprogramming sorts of features of Ruby, and Sorbet would, in many cases, ask you to give that up.
It's interesting, because Stripe actually started - Stripe has never used Rails, but it has used a lot of metaprogramming, especially in its early history… And as people have started to adopt Sorbet at Stripe, it's kind of been this incremental rejection of the metaprogramming parts of Ruby. Part of this is because people see the value, again, that they get: all these features, all these safety guards that they get when people are using type checking in their files. So people will say, "Here's my trade-off. I'm willing to put down the metaprogramming and pick up the static analysis."
To dive into some specifics: you could basically read a network request that the static analysis tool is never gonna be able to see, and use the contents of that network request to define methods in Ruby. You could ask the user for the name of a method to define, and define it. And there's nothing that the type system is ever going to be able to do to know that that method name is going to be available to be called.
So stuff like that has its place, and Sorbet basically just gives you escape hatches to be able to use that stuff. So again, we were talking about the typed: false level; if you have a certain file that's super-metaprogramming-heavy, you can just opt to turn checking off in that file, and turn it on in the other files. You can also silence the type errors at a specific call site and say like "Okay, even if I do have typed: true enabled in a given file, at this one call site where we're doing a lot of metaprogramming, I'm just going to ask the type checker to ignore that line." So you can kind of weave it into your system where you want the type checking to happen, and where you want to be able to use the metaprogramming. And yeah, each codebase or team or individual will kind of make those trade-offs, knowing what they're giving up and what they're gaining.
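A minimal plain-Ruby sketch of the problem described above: a method whose name only exists because of runtime data. The payload hash stands in for a network response and, like Widget and greet, is hypothetical. The closing comment maps the two escape hatches Jake mentions onto Sorbet's file sigil and its T.unsafe helper, which comes from the sorbet-runtime gem and is therefore only mentioned, not called, here.

```ruby
# Hypothetical stand-in for data parsed from a network request at runtime;
# a static checker only ever sees the code below, never these strings.
payload = { "method_name" => "greet", "greeting" => "hello" }

class Widget; end

# Module#define_method accepts a String name, so the method called "greet"
# comes into existence only when this line runs.
Widget.define_method(payload.fetch("method_name")) { payload.fetch("greeting") }

widget = Widget.new
puts widget.respond_to?(:greet)                         # prints true
puts widget.public_send(payload.fetch("method_name"))   # prints hello

# A literal widget.greet also works at runtime, but nothing in the source
# declares greet, so in a typed: true file Sorbet should report it as an
# unknown method. The escape hatches: mark the whole file "# typed: false",
# or silence one call site with T.unsafe(widget).greet (T.unsafe is part of
# the sorbet-runtime gem, which this sketch does not load).
```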
Are there any facilities in there to outlaw things? You know, like "Hey, no method_missing", for example. "We're not going to have method_missing." Or maybe that's not really a Sorbet thing. Maybe that's more like a linter - I don't know. I guess Sorbet is kind of a linter on steroids, isn't it? I mean, how do you picture these tools fitting together?
Yeah, I think linters and type checkers are very complementary. The thing about linters is they're way more heuristic-based, and so you kind of want the ability to say "I know better than the heuristic in this particular case." In Sorbet the rules kind of apply universally. So we are kind of more conservative with what we reject in Sorbet. Sorbet will not reject method_missing, because if Sorbet rejected method_missing, anybody who ever wanted to use it would not be able to use Sorbet. So in our codebase, we do have a bunch of linters. I don't know if we banned method_missing or not, but…
[24:15] There's probably some method_missing in there somewhere. We should explain method_missing, for those who aren't regular Ruby programmers. So briefly, in Ruby, if you call a method on a module or on a class, or on an object of that class, and it doesn't have the method that you just called, there's a method called method_missing that you can define, which will then run whatever other code you've decided it should run, in order to do whatever you like. So you can use it to dynamically define a new method, you can use it to run a switch statement and do a bunch of different stuff, you could raise an error… It's just basically a hook for you to write some code in case the method that you called doesn't exist. And people have used that to do all kinds of things. One of the nice things is to write really nice DSLs, and provide top-level keywords that are kind of arbitrary, or quasi-arbitrary, and use method_missing in order to call them. But as you can imagine, you can also do some gnarly stuff in there, and it's difficult to analyze, because it's not defined until runtime.
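A minimal sketch of the DSL pattern just described, in plain Ruby. Settings, build, port and host are hypothetical names chosen for illustration. Nothing in the source declares port or host; they exist only as a runtime effect of method_missing, which is exactly what makes this style hard for a static type checker to reason about.

```ruby
# Hypothetical configuration DSL built on method_missing: a bare name called
# with one argument stores a setting, and calling it with no arguments reads it.
class Settings
  # Evaluates the block with self set to a new Settings, so bare names in the
  # block become method calls on that object.
  def self.build(&block)
    settings = new
    settings.instance_eval(&block)
    settings
  end

  def initialize
    @values = {}
  end

  private

  # Ruby invokes this hook when the receiver has no method called `name`.
  def method_missing(name, *args)
    if args.length == 1
      @values[name] = args.first   # e.g. `port 8080` stores a setting
    elsif args.empty? && @values.key?(name)
      @values[name]                # e.g. `settings.port` reads it back
    else
      super                        # anything else raises NoMethodError as usual
    end
  end

  # Keeps respond_to? consistent with what method_missing handles.
  def respond_to_missing?(name, include_private = false)
    @values.key?(name) || super
  end
end

settings = Settings.build do
  port 8080
  host "example.com"
end

puts settings.port               # prints 8080
puts settings.respond_to?(:host) # prints true
```

Defining respond_to_missing? alongside method_missing is the usual Ruby convention, so reflection such as respond_to? stays truthful about the dynamic names.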
Yup. So method_missing is definitely one of those tricky parts for Sorbet to analyze, but it's far from the only one. We do have plenty of linter rules that we turn on to basically say "This is okay, this is not okay", and yeah, kind of guide people into having the most success when using Sorbet.
Break: [25:32]
So one thing that Max brought up is first-party, I guess, official Ruby types, which he says are in the works - I don't know much about this. I'm sure you probably know a lot about this, Jake, just from being in the community… Tell us about that: how does it relate to Sorbet? Are they wildly different? Are they similar? Could they adopt Sorbet if they wanted to? He says it's divisive, so I'm sure there are lots of opinions about this topic as well.
Yeah, I think it's divisive mostly just because typing in general is divisive in the Ruby community.
Right. Not like a specific implementation. That's going on in the Elixir community right now as well; they're talking about types for Elixir… And that's divisive, because - same exact reasoning.
Yeah. And so obviously, as someone who works on a type checker and who has been interested in types for a very long time, I'm super-biased in favor of the typing side of this. And I hold the view that types will always win in the long term, but… So you're gonna get the biased view here on the state of typing in Ruby. I don't even think Sorbet was the first attempt at building a type checker for Ruby. There were a number of research projects; specifically, I remember a couple out of the University of Maryland… I think there was also one other type checker that was built by a person by the name of Soutaro Matsumoto, out of Japan, and it was called Steep. Then there was one other that was kind of like a very hobby project, built by someone at GitHub, in their personal time.
So Sorbet kind of started as just one more type checker. It's always been the case that people have noticed that Ruby didn't have types built into it, and kind of decided on various ways to add their own. Eventually, I think that the popularity of Sorbet and the backing of large companies like Stripe and Shopify meant that the Ruby core team was more willing to consider what first-party typing support would look like.
We actually have met multiple times with the Ruby core team. For a period of time, we were meeting with them monthly to kind of talk about what the state of typing in Ruby would look like… And over time, it became apparent that the design constraint we were going to be working under would be no syntax changes to Ruby itself. Partly this is because Ruby is already syntactically very complex, and parsing Ruby is already hard; adding more syntax in service of type annotations would have been challenging on its own. But also, Matz, the person who created Ruby and who still has a very significant influence on what features get added and what don't, was pretty partial to keeping type annotations out of the core syntax. So that meant that we were kind of focusing on having annotation files that lived alongside the Ruby source code; so you kind of have this split between header files and source files, like you might have in C and C++. And that comes with its own trade-offs.
Some people will say that's already a non-starter for them: no matter what syntax you choose for these definition header files, it's already not going to work for them, and it'll cause a division in the community. That's, I think, a valid concern, but let's just press forward and say that we're fine with having these annotation files. The next thing that you're going to run up against is: do you use the same syntax as Ruby in these annotation files, or do you invent completely new syntax? Sorbet - one of its design goals was to be backwards-compatible with the syntax of Ruby. And so all of the Sorbet type annotations are actually just a Ruby DSL. So there's no transpilation step that you need to be able to use Sorbet in your codebase. It's just kind of the magic of, again, Ruby metaprogramming; one of the benefits that you get is that you can define these ad hoc syntaxes, and they're backwards-compatible.
So Sorbet already had this type annotation syntax that was valid Ruby code, and to make these kind of header files, these definition files, it repurposed that existing syntax. So you only had to learn one way to declare the type of an array, you only had to learn one way to declare a signature for a method, to declare an interface, to declare abstract methods, all these sorts of things. Whether they lived in the source code of a Ruby file or in some file alongside it was just a preference for where you want types to live in your codebase. But I think that the problem with that is that by defining types in this DSL syntax that we had invented ourselves, it was kind of clunky. We had to go to kind of great lengths to choose syntax that was backwards-compatible with what we could build a DSL out of.
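To make the "just a Ruby DSL" point concrete, here is a deliberately tiny imitation in plain Ruby. Everything in it (ToySig, its Builder, the Invoice class) is hypothetical; only the call shape sig { params(...).returns(...) } is borrowed from Sorbet, and this is not how sorbet-runtime is implemented. It checks types at call time, whereas Sorbet's main value is checking the same annotations statically.

```ruby
# Hypothetical toy, NOT sorbet-runtime: it only mimics the shape of Sorbet's
# sig { params(...).returns(...) } to show that such annotations parse as
# ordinary Ruby, with no transpilation step.
module ToySig
  # Receives the params/returns calls made inside a sig { ... } block.
  class Builder
    attr_reader :param_types, :return_type

    def initialize
      @param_types = {}
      @return_type = BasicObject
    end

    def params(**types)
      @param_types = types
      self
    end

    def returns(type)
      @return_type = type
      self
    end
  end

  # Records a signature for the next method defined in the class.
  def sig(&block)
    builder = Builder.new
    builder.instance_eval(&block)
    @pending_sig = builder
  end

  # Ruby calls this hook after each def in a class that extends ToySig.
  def method_added(name)
    signature = @pending_sig
    return unless signature

    @pending_sig = nil # cleared first, so the define_method below doesn't recurse
    original = instance_method(name)
    define_method(name) do |*args|
      signature.param_types.values.zip(args).each do |type, arg|
        raise TypeError, "#{name}: expected #{type}, got #{arg.class}" unless arg.is_a?(type)
      end
      result = original.bind(self).call(*args)
      unless result.is_a?(signature.return_type)
        raise TypeError, "#{name}: expected to return #{signature.return_type}"
      end
      result
    end
  end
end

class Invoice
  extend ToySig

  sig { params(cents: Integer).returns(String) }
  def format_amount(cents)
    "$#{cents / 100}.#{(cents % 100).to_s.rjust(2, '0')}"
  end
end

puts Invoice.new.format_amount(1234) # prints $12.34
begin
  Invoice.new.format_amount("12.34")
rescue TypeError => e
  puts e.message # prints format_amount: expected Integer, got String
end
```

Because sig runs before def, the toy has to stash the signature until Ruby reports the next method definition; method_added is the standard Ruby hook for that, which is one example of the metaprogramming that lets an annotation syntax stay valid Ruby.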
[32:16] So at the same time that we were working on defining these separate files, we came to the realization that we don't have to be backwards-compatible with Ruby in these new files. We could just throw everything out the window and design a type annotation syntax that would be a little bit more elegant, but not necessarily fully compatible with existing Ruby code. So that was the approach that we ended up taking, and it eventually got standardized as what they call RBS files, or Ruby signature files. And yeah, they just have a completely different syntax, but they're a lot less verbose than Sorbet annotations.
At the end of the day though, they are just annotations, and Sorbet could one day just parse them, and understand the annotations that are in them. I think that that's mostly just been - we haven't quite gotten the feedback that people would really absolutely love to use Sorbet, and that the one thing holding them back is whether it parses these RBS files, versus the annotation files that Sorbet supports. We've been focusing on building features for the people who are using Sorbet, and those people are asking, again, for things like better editor tools, or better type system features, so that's where we end up spending our time. So it's not a fundamental separation; it's just work that we'd have to do, and we haven't yet found that it bumps up to the top of the list.
Okay, good explainer of the state of things, at least specifically from the Sorbet side. What about on the Ruby lang side, with this RBS? Is it going to happen?
Oh, it's already happened.
It's already out there, and -
Yeah, they shipped these annotations, this format, in Ruby 3.0.
Okay, so it's shipped, and public, and you can just use Ruby 3.0+ and annotate your Ruby with RBS files. And is it just a type checker built into the language, or…?
So you still have to pick and choose your third-party type checker. The annotation format is just -
Okay, so it's not built in then. It's like a spec.
Probably the most popular type checker that uses these annotations is the Steep type checker, which I mentioned earlier.
There's also - yeah, there's a handful of other tools that consume them… It's just that Sorbet kind of doesn't, and maybe that's the biggest point of division: we haven't put in the work to parse these files.
Is that just the nature of the - that it's open source, and you've got other things that are more important, obviously? It's not that you don't want to; it's like eventually you might?
Yup. Yeah, exactly. For example, the sorts of things that we would have to stop working on right now are - we've made a number of improvements to just the core type system, to what you can actually express in the type system. We've made improvements to how fast Sorbet is, all sorts of things like this… And so we regularly go and ask people, whether that's in the open source community or people using Sorbet at Stripe, "Hey, what's the thing that you wish existed the most?" and it's always something else.
I guess, why wouldn't it just get built in? That's what I don't understand. And I guess maybe you could say, "Well, RubyGems wasn't built in either for a really long time, and eventually, gem started shipping with Ruby." And so this would be a similar circumstance, maybe; like, they want a bunch of tools to be able to do this, and… It just seems like if they - "they" being the Ruby core team - were super-committed to types, that maybe this is just step one, and they're going to do it eventually. They would do this, and they'd say, "Download Ruby 3, and it's type-checked."
Yeah, I guess one of the benefits of having it be this third-party gem is you can iterate on it and release new versions independently of Ruby versions. So Ruby kind of famously releases a new version only once a year, on Christmas. But if you wanted to add a new revision to the RBS spec, or standard, or parsing libraries for it, having that be in this extra gem that you'd have to opt into makes the release process a lot easier.
[36:17] Good point. You've obviously thought about this more than I have, Jake… And of course, there are lots of different parties involved in these kinds of decisions.
It's got the wrong name though, Jerod.
What's that?
That's why it's not being adopted by the core team. It needs to be called TypeRuby, or something like that.
Yeah. What do you think?
Sorbet is a cool name, man.
It is a cool name, but I just wonder if it needs to be TypeScript-like. Like, take a page from the TypeScript book, and it's gotta be TypeRuby, or something. I don't know. I'm not saying it's the wrong name, I'm just making a joke.
Again, I think one of the other things that sets Sorbet and TypeScript apart is just the amount of evangelism that has been put into each project. I think that Microsoft in general is just really good at building products for developers and evangelizing them… And Stripe as a company does that as well. Obviously, Stripe is an API company, and it evangelizes its API, but it's never been the case that Stripe really evangelized Sorbet. And that's - yeah, just having popularity and community enthusiasm behind the project would be the sort of tipping point, I think, for maybe more first-party integration with Sorbet.
But weāre kind of fine with the way things work now. We build the thing, and we ship the thing, and people who want to use what weāve built are completely able to do so. And people who weād prefer to ignore, we can.
So let's talk about Sorbet itself, like the implementation, the design… I was reading some of the docs and some of the guides, just trying to see what it was like to use. I did notice a pretty decent pure Ruby DSL; you're writing Ruby inside of your Ruby in order to specify a method signature, and that kind of stuff… There were a few phrases on the website that I was like "This sounds fancy, because I don't know what it is." Now, I'm not a type guy, so maybe people who are all about type checkers know these kinds of things. But I read "Gradual type checking", I read "Control flow-sensitive typing"… Some stuff that sounds like Sorbet features, that I'm sure you had a large part in, that maybe you could - that might be interesting to our listener to learn about Sorbet.
Yeah, absolutely. So gradual type checking is just this idea that you don't have to type-check 100% of your codebase from day one; that you can -
Like opt in, yeah.
…opt in at various levels of granularity. That's basically table stakes if you're trying to add a type system to a language that didn't start with a type system. I don't necessarily - there will definitely be people out there who tell you that this is actually a completely desirable property, even if you're designing a language from scratch today. Again, you're getting the biased type system nerd's view, and I think that it's more just like a trade-off that you have to accept if you're adding types to a language that didn't start with them. Because it means that you have these gaps; you'll always have these gaps in the type system, where it won't be able to tell you when you've messed up. And so the biggest problem then is actually figuring out and identifying where the gaps are, if that's the state of your codebase.
Control flow-sensitive typing is really interesting, and I actually think that even more traditional languages that don't have backwards compatibility with untyped programming languages could benefit from it. And that's just this idea that if you have something that is either nil, or some real type, like maybe an integer, or some struct, some class that you've written, that the type of the variable will be aware of all of the conditional branches that you've taken through the codebase.
So if you start out with something that's either nil or integer, and then you say, "Is this thing nil?" Well, if you use that variable inside of that branch, Sorbet will be able to say "You've already checked that this thing is not nil. Here it's an integer." TypeScript does this; most languages that are gradual type systems for existing untyped languages end up building this feature, just because there's so much Ruby code out there that's written this way, or so much existing untyped code out there written this way, that you get a lot of ease of adoption by building this feature. You don't force people to go change their codebase to be - I don't know - maybe a little bit easier to type-check. So it's this advanced type system feature, for sure, but it's one that models Ruby code as it exists in the real world, and makes it easier to start using the type checker.
[40:29] Okay. What's an example of when that might be useful, or some code that might typically hit up against this? Just not knowing necessarily the value being returned?
Yeah. So for example, let's say that you are interacting with the database, and you try and load some object with a specific ID; you're going to either get back nil, if that object doesn't exist, or your ORM is going to give you the model class back. And if you are writing kind of good, defensive code, the first thing that you do when you try and load this thing is you're going to ask whether it existed or not, and then you're going to handle that exception case. Maybe you report an error to the user, maybe you try looking for the object in a different place, maybe you do something else. But in the case where you definitely know that you have it, now you can start calling the methods on your model; you can ask for the user's name, and the user's email, or whatever fields are on this model class that you got back.
So if Sorbet thought throughout the entire method body that this variable that you got back from your ORM was either nil or a user model, then it's going to say "I don't know whether this - I can't claim for sure that the methods you're calling on this model exist."
Gotcha. Yeah, I can definitely see a lot of Ruby code out there like that, because there's so many - like, that nil case is just always the edge…
…that just causes us to want types in the first place… [laughs]
For example, this was kind of famously - Java's billion-dollar mistake was conflating that every type could be null. I think that it's obviously very hard to make changes to a language as widely used as Java is now, but it's the sort of thing where if you could solve this problem, and build control flow-sensitive type checking for specifically whether a value is null or not, I think it would go a long way to making it easier to reason about - yeah, in Java even, like whether a value is null or not.
Mm-hm. So you bring up an ORM, which makes me think about Active Record, which makes me think about Active Record Base, as it used to be called; base classes, or… It makes me think about existing Ruby libraries. One of the huge advantages of TypeScript being so wildly evangelized and adopted is that darn near every library is shipped with type definitions for TypeScript to just work out of the box. And I'm wondering if Sorbet has that kind of momentum, or is there a place where you can go out and say "I'm going to use this Ruby gem", and most of the gems are already typed by somebody?
Yup, I definitely noticed that in TypeScript. Most libraries that you pull off of npm are already going to work with TypeScript just out of the box. There's kind of nowhere near that level of support for typing in the Ruby gems that you'll encounter most commonly. And there's a number of reasons for this. Part of it is as a project, we've almost always focused on making it easier for application developers than library developers; we've always taken fewer steps to make the process easier for them. That's definitely something that would need to change. Partly, it's just kind of, we've never gotten around to it.
I think that despite the low investment, people have still done it, and still published gems that have type annotations for them. The biggest ones though, like Rails, don't. And so if you want to be able to use Sorbet in a project that's using like Active Record Base or something like that, you're going to need a different approach to be able to type these sorts of things. The way that that is typically handled in Sorbet is with third-party gems that will analyze the way that you are using these gems, and generate those annotation files that we were talking about earlier.
[44:14] So instead of annotating the source of Active Record itself, you would look at how Active Record is being used in your codebase, and generate some annotation files, and rely on those annotation files to figure out what the gem is doing.
This seems somewhat fraught. Is that pretty reliable at the end of the day, or is my spidey sense accurate?
It's somewhat fraught, for sure. It's kind of a question of like how much you're going to push it. If you're using the very common cases, it'll be fine. But if you're trying to do something more complicated, especially if you combine this with heavy use of metaprogramming, then it's going to be a little bit trickier.
I think that recently one person in the community - it's actually someone who has been on this podcast before, Justin Searls…
Searls, yeah.
He's actually maintaining this Mocktail library for - kind of a testing library for Ruby. And he has been posting quite a lot in the Sorbet Slack, just about what it takes to get typing added to a gem. And it's been really interesting, just because it's exposed all of these places that we could make the experience better. Just about like decisions for if you want to have type annotations in this gem, should you start with having annotations that live inside the source code, and then strip those out before you publish? Or should you put them inside your source code, and also have files that live alongside? Should you make it easier for people to just generate the RBIs on their own? Anyways, it's like, his experience has been neat, because every time he ran into a challenge, he posted about it, and asked questions, and it's been kind of eye-opening to just have that experience. Justin, thank you for all of the comments that you've given us.
That's one of Justin's skills, is communicating. He's always willing to post those comments, whether they're more or less salty, depending on his mood…
Yeah, he's been quite polite, so…
Awesome. [laughs]
Maybe that's gone through your filter, but it's been great seeing what he's been working on.
He's usually pretty unfiltered, but he's also a kind person. When you say RBI now, is that the same thing as RBS on the other side, but it's like -
Yeah, sorry - that's the name that Sorbet uses for these annotation files. It uses a different syntax, but for the same goal. RBI just stands for Ruby Interface.
Okay. So if I was going to provide type annotations for something, I would produce an RBI. Or I guess this is what Justin is trying to figure out with Mocktail, is what do we actually - what's our output as a library author?
Yeah. So as a library author, you would either have to have Sorbet read the sources of your gem files, and use that to understand what's defined… But typically, people will not have Sorbet type-check the source of the gem, because obviously, then it's also going to do things like actually read the method bodies, and make sure that all the method bodies type-check, and that's going to be particularly slow. So having just the interface files will speed things up a bit.
Why go RBI and not RBS? Why would you create a whole new world, in a way?
Well, so it kind of harkens back to the conversation we were having earlier about "What syntax do you want in these files?" Do you want the contents of these annotation files to be the same syntax as Ruby code? Or do you want them to start from this blank slate, where you can design the syntax that you want? So the syntax of RBI files is literally just Ruby files, with no method bodies. So if you wanted to annotate a method, it's the same syntax in a Ruby source file as it would be in a Ruby interface file.
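To make "Ruby files with no method bodies" concrete, here is a hypothetical RBI file for a small gem; the Greeter class, its methods, and the file path are made up for this sketch. The sig lines use the same DSL as ordinary Ruby source, and the empty bodies are what keep the file cheap for Sorbet to read. RBI files are consumed by the type checker, not executed.

```ruby
# typed: true
# sorbet/rbi/greeter.rbi (hypothetical path and gem)

class Greeter
  extend T::Sig

  # Same sig syntax as in a .rb file; the body is intentionally empty.
  sig { params(name: String).returns(String) }
  def greet(name); end

  sig { params(names: T::Array[String]).returns(T::Array[String]) }
  def greet_all(names); end

  sig { returns(T.nilable(String)) }
  def last_greeted; end
end
```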
Whereas RBS is this streamlined syntax that you and the Ruby team kind of collaborated on? Is that correct?
[47:47] Yeah, exactly. In defense of the RBI syntax, I think that one of the things that's a lot easier about it is you don't have to kind of switch between two type systems in the docs. So if you see a type annotation anywhere in Sorbet's docs, it's completely valid to put that both in the Ruby file, or in the RBI file, versus having to learn two type syntaxes if you're trying to use Sorbet with RBS files.
Gotcha. Now, does Sorbet run faster with RBIs than it would just in the source code, or does it not matter?
It's really just a function of like how many bytes Sorbet is reading.
If your source files are really long, then it might slow down a little bit, just to parse through and get the actual annotations out.
Yeah. The crazy thing though is just how fast Sorbet actually is. I have gone on record many times and claimed that Stripe's Ruby codebase is the largest. Obviously, I haven't seen every Ruby codebase in the world, and no one has contested me on this point… So I'm going to go forward and continue saying this until someone corrects me, that Stripe's Ruby codebase is the largest Ruby codebase in the world, and -
Bigger than GitHub?
Oh, by a long shot. And the nice thing about this if you are a user of Sorbet is Sorbet will - the amount of time that it takes to type-check your codebase will never be longer than the slowest codebase type-check. So you kind of like benefit from - someone will always encounter performance problems before you will. And that someone will be Stripe.
That someone will be Jake Zimmerman. [laughs]
Yeah. So that's kind of why a large part of the work that we end up doing is just optimizing and optimizing. One of the fun projects that I got to work on last year was making Sorbet more incremental. For the entire history of Sorbet, if you needed to run Sorbet in your editor, it would basically just re-type-check the entire codebase. And it was fast enough. Like, it would be a little bit slow. You'd be able to see when it's doing this re-type-check operation, but it would only maybe last a couple seconds, and that's fine. That's actually like most of the time fast enough.
Eventually, the codebase got to the point where that wasn't fast enough, which meant that we had to do some work to make it faster. And the way that we did this was just being smarter about not doing work. Basically, we would figure out the contents of any given edit, and say like "Okay, well, we can actually tell that in this edit only these definitions have changed", and then do some really clever things to not have to re-type-check the entire codebase. So it's those sorts of optimizations that personally I find really fun, and also people benefit from; the codebase will never get to the point where it's super-slow to type-check, because we'll have found the problem and fixed it before it ever becomes a problem for anyone else.
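The "only these definitions have changed" idea can be modeled in a few lines. This is a deliberately simplified, hypothetical sketch in plain Ruby, not Sorbet's implementation (Sorbet itself is written in C++): it compares the signatures a file defined before and after an edit, re-checks only the edited file when no signature changed, and otherwise also re-checks the files that reference a changed definition. The data shapes and names are assumptions of the sketch.

```ruby
require "set"

# Hypothetical, simplified model of incremental re-checking.
#   definitions: { file => { symbol => signature string } }
#   references:  { file => Set of symbols the file uses }
class IncrementalChecker
  def initialize(definitions, references)
    @definitions = definitions
    @references = references
  end

  # Returns the Set of files to re-check after `file` is edited so that
  # it now defines `new_defs`.
  def files_to_recheck(file, new_defs)
    old_defs = @definitions.fetch(file, {})
    changed = (old_defs.keys | new_defs.keys)
              .reject { |sym| old_defs[sym] == new_defs[sym] }
              .to_set
    @definitions[file] = new_defs

    to_check = Set[file]              # the edited file is always re-checked
    return to_check if changed.empty? # body-only edit: nothing else to do

    @references.each do |other, used|
      to_check << other if used.intersect?(changed)
    end
    to_check
  end
end

checker = IncrementalChecker.new(
  { "user.rb" => { "User#name" => "() -> String" } },
  { "user.rb" => Set[], "app.rb" => Set["User#name"], "jobs.rb" => Set["Job#run"] }
)
p checker.files_to_recheck("user.rb", { "User#name" => "() -> String" })  # only user.rb
p checker.files_to_recheck("user.rb", { "User#name" => "() -> Integer" }) # user.rb and app.rb
```

The first call models editing a method body, the common case while typing; the second models changing a signature, where callers must be re-checked too.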
Stripe is bigger than Shopify?
Yup. Shopify, I think -
He's saying this so unequivocally. [laughter] He's like "Yup."
I know - Shopify's codebase is one of the codebases where I actually have very exact numbers on how large that codebase is.
Because they're using Sorbet…
Yup. They're also one of our closest partners that we collaborate with on improving Sorbet. They've made a number of contributions themselves, and we meet regularly with them to figure out how we could be making Sorbet better. So that's kind of like one of the things that I'm always worried about, is "Well, what if the performance is getting out of hand on other people's codebases, and I'm not able to even see what the problems are?" Because I can go profile our codebase and see what the problems are.
Is Twitter still Ruby, or…? Are they still a Rails shop, or…?
I don't think Twitter is Ruby anymore. I think they use -
Scala. A lot of Scala.
Scala, and maybe some other languages at this point.
[unintelligible 00:51:08.14] too much, I guess…
Is Stripe bigger than Basecamp? It probably is.
That's one of the ones I don't know of. But again, no one has reached out and told me otherwise, so…
Alright, listener out there… If you have a codebase larger than Stripe's, or you think it's larger, then you need to let us know, so we can prove Jake wrong. How many lines of code roughly?
Yeah, so I wrote a blog post on the Stripe Engineering Blog in May of 2022, I believe… And the codebase size at that time was 15 million lines.
And that was a year ago, roughly.
That was a year ago. If you think you can beat 15 million lines, I'd be very, very curious to hear. Now, I also want to express my condolences for having to work in a 15-million-line codebase…
Is lines of code the best way to quantify it though? Wouldn't bytes be better?
Yeah, bytes would be better, for sure.
I mean, you can have a long or a short line, right?
If you have like millions of short lines, and I have half a million really long lines, maybe I win.
Yup. No, absolutely. Bytes - like, if I've sniped you enough, you, dear listener, into "Let's compare our codebase sizes", I will try and ask if you can find the number of bytes. Usually the tools that report codebase sizes find it easier to measure lines of code, for whatever reason. So that's usually kind of like - that also makes for nicer headlines and blog posts.
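For anyone tempted to enter the codebase-size contest, here is a small stdlib-only Ruby helper that reports both numbers for the .rb files under a directory. It is a hypothetical illustration, not a tool mentioned in the episode; it only shows that lines and bytes are easy to collect side by side, and that they can rank codebases differently.

```ruby
require "find"

# Hypothetical helper: total lines and bytes across all .rb files under root.
def codebase_size(root)
  lines = 0
  bytes = 0
  Find.find(root) do |path|
    next unless File.file?(path) && path.end_with?(".rb")

    bytes += File.size(path)          # size on disk, in bytes
    File.foreach(path) { lines += 1 } # one iteration per line
  end
  { lines: lines, bytes: bytes }
end
```

For example, codebase_size("app") returns a hash with :lines and :bytes keys for everything under app/.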
"Stripe's codebase has this many bytes" doesn't quite have the same ring to it as -
Right. LOCs are better in that case, yeah.
But if it's gonna come down to it, we'll go byte for byte, that's what you're saying. We'll definitely do that.
If you're comparing bytes of code, let's go byte for byte.
Yeah, exactly.
Cool, man. So I guess the only thing that I'm left thinking is what is the user experience like? Let's just say I have a 12-million-line Rails app out there… Or maybe even a 16-million-line Rails app out there, and I'm thinking Sorbet might be for me. How do we opt in or incrementally adopt? What does it look like day to day to adopt it and use it?
Yup. Yeah, so the steps to adopt it - you can just go to Sorbet.org, and there'll be instructions there. The instructions will basically ask you to install a gem; there's actually two gems. One of them is going to be the static type checker that will report all the type errors in your code, and one of them is going to be that runtime library that lets you use the DSL for annotating type syntax.
So you add these two gems to your codebase… You don't even need to write any annotations out of the box if you don't want to, and you can start type-checking. It'll probably report a bunch of errors on your codebase; you can either fix those errors, or you can turn off the type checker in those files, and that's that.
The thing that you're going to want to do is as quickly as possible get it to the point where every file is at least typed: false. So if you have any files that don't have valid syntax, or that have constant names that Sorbet doesn't know about, there's various ways to fix those errors… But that's kind of the baseline, is getting every file to be able to type-check at typed: false. And from there, you can now start using Sorbet in CI, and making sure that it continues to type-check. You can start using Sorbet in your editor, and take advantage of all these jump-to-definition features, and then gradually, again, opt individual files into stricter levels, start adding type syntax to the methods that you care the most about, and that's kind of it.
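A minimal sketch of the setup described above, assuming a Bundler-managed project. The two gem names (sorbet for the static checker, sorbet-runtime for the sig DSL) and the srb tc command are Sorbet's own; the surrounding tooling changes over time, so sorbet.org remains the place for current steps. The comment block lists Sorbet's per-file strictness sigils from least to most strict. This is a configuration fragment, not a runnable script.

```ruby
# Gemfile (fragment)
source "https://rubygems.org"

gem "sorbet", group: :development # static type checker, run locally and in CI
gem "sorbet-runtime"               # runtime library that provides sig and T.*

# Then, roughly:
#   bundle install
#   bundle exec srb tc
#
# Each file opts in with a sigil comment at the top:
#   # typed: ignore   Sorbet does not read the file at all
#   # typed: false    mainly syntax and constant-resolution errors (the baseline)
#   # typed: true     ordinary type errors are reported too
#   # typed: strict   every method needs a sig
#   # typed: strong   calls on T.untyped values are not allowed
```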
Yeah. What does the editor support look like?
So there's a VS Code extension that you can install, and it'll automatically figure out where Sorbet is installed in your codebase, and how to run it. And it'll show you the errors, and all of the fancy VS Code features will be wired up. If you don't use VS Code, the editor support is powered by a language server protocol server, and it'll work with any editor client that supports the language server protocol, which is most of them at this point.
I thought that might be coming, because I read that you're a Vim guy, and I thought "There's no way Jake's not gonna have support for his favorite editor in some sort of fashion."
Yup. No, it works completely fine in Vim over LSP.
What about tracking adoption? I see there's two documents here in your docs… Adopting Sorbet, which is outlined, as you mentioned, and then you also have tracking it. How important is it to track adoption when you begin to incrementally bring it in? Who's tracking the adoption?
I'd say that the tracking adoption, the metrics one, is more focused for larger companies that are going to be staffing the effort to add types as like a proper project. The nice thing is, you want to give other stakeholders at your company visibility into the progress that you're making… And there's various ways to ask Sorbet to report how much coverage there is in the codebase, so that you can keep people involved and in the loop.
The first thing you asked me was how many files does Sorbet have type-checked in Stripe's codebase, and - yeah, it'll print those out, so that you never have to be in the dark about how much progress you're making.
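Sorbet can report coverage numbers itself, as described above. Purely to illustrate the idea, here is a hypothetical stdlib-only Ruby script that counts files per typed: level; the function name and the regular expression are assumptions of this sketch, not part of Sorbet.

```ruby
require "find"

# Matches a Sorbet strictness sigil such as "# typed: true" at line start.
SIGIL = /\A#\s*typed:\s*(ignore|false|true|strict|strong)\b/

# Hypothetical adoption report: number of .rb files at each typed: level.
# Files without a sigil are counted under "none".
def sigil_histogram(root)
  counts = Hash.new(0)
  Find.find(root) do |path|
    next unless File.file?(path) && path.end_with?(".rb")

    level = nil
    File.foreach(path) do |line|
      level = line[SIGIL, 1]
      break if level # stop at the first sigil found
    end
    counts[level || "none"] += 1
  end
  counts
end
```

Run periodically (say, in CI) and plotted over time, a histogram like this gives stakeholders the "how far along are we" view that the metrics doc is aimed at.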
[55:59] I also see TypeScript versus - I guess versus, or comparative to Sorbet as a document. I'd imagine, since you all use TypeScript on the frontend, and then on the backend Ruby, obviously… This Sorbet type checker - you're wanting to keep the mental gymnastics to a low. So what is this document outlining? If you're familiar with TypeScript, you should be somewhat familiar with the way Sorbet is doing things? Are you trying to mirror a lot of what they've done well?
I think that doc is one big table that kind of says "If you know this type system feature, and it has this name in TypeScript, here's the corresponding name in Sorbet's type system." Because again, people are way more familiar - for a lot of people, TypeScript is people's only experience with a typed language, especially these days. So kind of anything that we can do to make it easier for people to onboard to Sorbet and understand what names we've chosen for various pieces of the type system - that's what that doc is trying to provide.
Earlier in the show you said - I'm gonna paraphrase - something to the effect that types will win in the end, or it's a type world… Restate that, and give us the synopsis of why you think that's true.
I just - yeah, part of it is just a fanatical belief, and part of it is I just live and breathe the benefits of type checkers every day. And especially once you get to the point where you can no longer hold the codebase in one person's head, where you have to start collaborating on a codebase with more than one person, which is almost all codebases that do anything interesting these days, having a type checker to offload the burden of understanding the code and keeping track of relationships between various files and data structures and all these sorts of things is super-valuable.
So just - again, we've kind of talked about this at the beginning, where the language that you use changes the way that you think, and changes the way that you approach problems, and languages with type systems I think give you such strong vocabulary for how you can structure your thoughts.
Did you say "Types will win in the end?" What was the exact phrasing? Lay it down hard.
[laughs] We're trying to name this episode; we're trying to get you to nail it down so we can name it that.
Yeah, let's - I mean, now that I know I'm on the hook for figuring out what it is… I will say like - yeah, types will win in the end, just because they're so much more… Yup. This kind of harkens back to my schooling days, where I had professors who were super-fanatical about types, and they kind of instilled this in me - kind of like going to church and hearing your preacher preach about whatever gospel, just kind of preaching about the values of types… So types will win in the end, sure.
Alright, last question for me… You are a type fanatic, working in a dynamic language, which you seem to have much respect for, at least on display, and you have a cool job, so surely you want to keep it… But if you were to not have to use Ruby… Like, if you were just like Jake Zimmerman start from scratch, surely there's a programming language you like better, because of the type side of things. What would you be working in? Would it be something -
So when I was in school, almost all of our classes were either in C, which is just - everyone should learn C at some point - or they were in this language called Standard ML. Standard ML is not a very widely known language, but it was kind of one of the first languages to really pioneer algebraic data types, and pattern-matching, and type inference, and all these other type system features that have started to gain rapid popularity in other languages. So I would probably - I think that using Standard ML as a language to actually write code in is almost impossible. There's no libraries for it. There's no build system for it. There's no way to really collaborate with other people. But a lot of languages have gone to great lengths to copy its features. So I think that the most popular language that has copied the most from Standard ML is probably Rust. So I would probably try and use Rust if it were possible.
Very cool. I'm looking at the Wikipedia, "Influenced: Elm, F#, Haskell, OCaml, Python, Rust and Scala." So a lot of influence, like you said, on other languages… I guess at the end of the day Rust will win… [laughter] Rust will win in the end, because Jake says so.
Cool. Adam, any other questions on your end before we let him go?
[01:00:22.15] I'm clear. I almost brought back in "Cold ice cream and hot kisses", because… Sorbet. But whatever.
Ah. Don't do it. Jake probably hasn't heard that episode yet. [laughs]
He's like, "What are you talking about?"
The funny part about the naming of Sorbet is I'm not even a huge fan of sorbet. [laughs]
I really like ice cream better.
What exactly is Sorbet?
I think it's more of like a dairy-free alternative to frozen desserts…
Yeah, like strawberry usually, or…
Well, it's like a snow cone of sorts, right? Similar to that.
That's the other funny part, is I don't think it's typically served in a cone, but our logo definitely has it with a cone.
Now we know what's holding back adoption; it's just a cognitive overload… [laughter] "What is this Sorbet thing?!" I think it's a cool name, just because it's different, and memorable.
It is. Well, that's half the battle. I mean, Go, for example, is a challenging language to operate around when it comes to finding information, because it's just a good name, but poorly named in reference to the fact that everything goes somewhere.
It's overloaded.
You have to say "Golang", which is basically frowned upon by anybody who writes Go daily. Like, Golang is not part of their lexicon at all.
They have like weird rules around this.
Just social norms. Like, you can type Golang, but you shouldn't say Golang. I'm like "I don't know all these rules, people…"
There's definitely a similar problem with Sorbet, where if you try and search like Sorbet, a thing that I need to search for, half the time it'll just show you like recipe sites.
You're gonna get some frozen sherbet, or whatever it is. Well, you landed Sorbet.org, which is a sweet website, considering most websites were already taken by 2017… But just one single word, got the .org, so I mean, that's good.
Yeah, that's good stuff.
Yeah, it was definitely - I was excited to get that one. There's actually quite a few good domain names out there. It's just kind of a question of how much you have to pay for them. But luckily, it wasn't a personal project, it was a Stripe project… So what looks expensive for me looks a lot cheaper for Stripe.
Good point. Adam, we should start some Stripe projects. [laughs] Get some good domain names… Alright, we're bike-shedding the name; I think that means we're officially done here… Don't you think, Adam?
Let's do it. We're done.
Jake, thanks so much for coming on, man.
Yeah, thank you for having me.
Our transcripts are open source on GitHub. Improvements are welcome.