
LinkedIn Open Sources Interactive Debugger for K8s AI Pipelines



Based on Lyft's Flyte Kubernetes scheduler, FlyteInteractive connects with VSCode Server inside Kubernetes pods to access resources and large-scale data on the clusters.

Feb 15th, 2024 8:00am by Mary Branscombe


Image by Diane Kim from Pixabay.


Kubernetes is increasingly popular as a platform for building machine learning projects, but the developer experience on Kubernetes remains challenging, often requiring more infrastructure expertise than many coders are interested in acquiring.

And despite the promise that containers, by encapsulating applications and their dependencies, deliver a portable and consistent environment throughout the development cycle, that’s just not practical for the largest models, like those used in generative AI, where neither the dataset nor the GPU hardware is available to developers working locally.

To improve the developer experience, LinkedIn created FlyteInteractive, which provides the user with “an interactive development environment that allows them to directly run the code, connect with Microsoft VSCode Server inside Kubernetes pods with access to resources and large scale data on the grid. It’s like doing SSH to the GPU port and developing directly from there. Everything’s exactly the same,” explained Jason Zhu, a LinkedIn machine learning engineer who helped create the software.

Instead of writing a mock dataset to use with their model, developers get access, through VSCode’s remote development support, to the real dataset on the cluster, which avoids wasting time on a model that can’t cope with the full-size dataset. “As we’re pushing towards larger and more complex architectures, it’s almost impossible to develop the code locally and get it tested,” he explained.

“The resources available for local development don’t include the same high-end, high-priced GPUs that are used in production, the same amount of memory — or the complexities of a distributed system. You can compromise the model size and complexity [to run it locally] but that will also compromise the chance that once you will upload the model to real production data, it will succeed.”

Early Flyte

As the name suggests, FlyteInteractive is a plug-in for adding more features to another open source project already in use at LinkedIn, Flyte.

Originally developed and open sourced by Lyft, Flyte is a workflow orchestrator for Kubernetes, written in Go and designed specifically for data and machine learning pipelines. Its interface lets developers build workflows in the most popular language for machine learning: Python, with strong type checking so more bugs are caught before execution (which can save money as well as time, given the expensive infrastructure machine learning requires).
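For developers who haven’t used it, a Flyte pipeline is ordinary decorated Python. Here’s a minimal, hypothetical sketch (the task names and logic are illustrative, not LinkedIn’s code):

```python
from typing import List

from flytekit import task, workflow


@task
def preprocess(raw: List[float]) -> List[float]:
    # Normalize values so the training step sees a consistent scale
    peak = max(raw)
    return [value / peak for value in raw]


@task
def train(features: List[float]) -> float:
    # Stand-in for a real training step; returns a toy "score"
    return sum(features) / len(features)


@workflow
def pipeline(raw: List[float]) -> float:
    # Flyte validates these input/output types when the workflow is
    # registered, so a mismatched signature fails before any cluster
    # time is spent
    return train(features=preprocess(raw=raw))
```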

Flyte graduated from the LF AI & Data Foundation in early 2022 and is already in use at HBO, Intel, Spotify — and LinkedIn, which uses AI extensively and has already migrated all its LLM workloads, as well as some traditional machine learning workloads, to Flyte. Flyte covers more scenarios than Kubeflow and doesn’t demand as much Kubernetes expertise from developers (but it also has Kubeflow integrations for popular packages like PyTorch and TensorFlow).

A big part of the appeal for large organizations is the scalability it offers, according to Byron Hsu, a committer to the Flyte project who works on scaling machine learning infrastructure at LinkedIn. “We have over a thousand workflows every day and we need to make sure every workflow can be brought up quickly.”

Flyte also helps with the kind of rapid experimentation that’s so important for machine learning, where datasets change and new algorithms come out frequently. “The scheduling time is super, super fast so users can do experiments quickly,” Hsu told the New Stack.

The Python interface also makes Flyte easy for machine learning developers to pick up: “If you want to add a custom Python task to your workflows, it’s intuitive and easy in Flyte. It definitely makes machine learning developers much faster.”

Flyte also brings some familiar DevOps features that speed up machine learning development, explained Zhu, who works with extremely large models like the one that drives the LinkedIn homepage feed. “Previously, every time we built our pipeline we had to pull in the dependencies locally and we had to wait for that [to happen]. But because Flyte is image-based, we can bake in all those dependencies in the image ahead of time, so it just takes several seconds for a user to upload their job and the process of putting in all those dependencies happens at runtime.” That saves a significant amount of time, including every time you update a workflow and run your machine learning job again.
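Concretely, a Flyte task can point at a prebuilt image that already contains its dependencies. A short sketch, with a hypothetical image name:

```python
from flytekit import task

# Hypothetical image; in practice it would be built in CI with the heavy
# dependencies (CUDA, PyTorch, internal libraries) already baked in
TRAINING_IMAGE = "ghcr.io/example-org/ml-train:2024.02"


@task(container_image=TRAINING_IMAGE)
def train_step(epochs: int) -> str:
    # Runs inside the prebuilt container, so nothing needs to be
    # installed when the job is submitted
    return f"trained for {epochs} epochs"
```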

To encourage code reuse and avoid every team rebuilding the same components and processes for each new project, LinkedIn has created a Component Hub on top of Flyte, which already has more than 20 reusable components that save a lot of repetitive work. “It’s for common tooling like data pre-processing or training or inference,” Hsu explained. “The training team can build a training component like a TensorFlow trainer and all the ML engineers at LinkedIn can use that without reimplementing it.”

This also makes more powerful and complex techniques, like the model quantization Zhu has been working on recently, much more widely available by turning them into a function or API call. There are multiple algorithms for converting a model’s representation from high to low precision, so you can compress models and serve them using the fewest possible resources; usually, machine learning developers would need to research the latest developments, pick an algorithm and then implement it for their own project.

“We built it as a component and because Flyte has the concept of reusable components, for every other user’s pipeline, they can choose to call that as an interface or an external API. So they can quantize their model after the model has been trained, no matter whether it’s a model for summarization, or it’s a model for reasoning or is a model for entity extraction,” Zhu said.
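As a rough illustration of what such a component might look like, here is a hypothetical sketch that uses PyTorch’s dynamic int8 quantization as the algorithm; the task name and file-path interface are assumptions, not LinkedIn’s actual hub component:

```python
import torch
from flytekit import task


@task
def quantize_model(model_path: str, output_path: str) -> str:
    # Load the fully trained model (weights_only=False deserializes
    # the whole module, not just the state dict)
    model = torch.load(model_path, weights_only=False)
    # Swap Linear layers to dynamic int8 to cut memory use and latency
    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    torch.save(quantized, output_path)
    return output_path
```

Because the component is exposed as an ordinary task, any downstream pipeline can call it after training, regardless of what the model does.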

Developers can explore multiple algorithms quickly, because they can just plug them into their workflow to test their effect on both resource usage and the accuracy of the model.

“If you’re not sure whether quantization will work on your specific use case, we can have a centralized hub with all the different quantization algorithms, so you can test them all and look at the matrix of results and the latency to understand the trade-offs and figure out the right approach. As the field evolves, more quantization algorithms will come up so we have to have a very flexible platform that we can test all these algorithms and add them to the centralized hub that all the downstream pipelines can benefit from,” Zhu said.

Remote Interactive Debugging

Being able to write pipelines more quickly and reuse components speeds up machine learning development enough that the software engineers at LinkedIn started to notice the other things slowing down their workflow: everything from having to work with smaller mock datasets that turned out not to match production datasets well enough, to a local development and testing environment that lacked the hardware and resources of production (imposing artificial limits on model size), to the long cycle of debugging and waiting for code to deploy before finding out whether it actually fixed the bug.

Thanks to the differences between the local and production environments, only about one in five bugs was fixed the first time around, with each code push taking at least 15 minutes to get into production. Tracking down even a minor bug could take dozens of attempts; in one case, it took nearly a week to find and fix an issue.

These problems aren’t unique to machine learning development, but they’re exacerbated not only by the sheer size of modern machine learning models, the datasets they work on and the expensive infrastructure required to run models in production, but also by an ecosystem that doesn’t always offer tools developers in other areas take for granted, like code inspection and remote debugging.

Even the smallest generative AI model with reasonable performance can’t run on a CPU, Zhu pointed out. “When you get to that stage, it’s really natural for us to move the coding and debugging process into the Kubernetes pod or GPU clusters with the real data and the same resources as you would run in production.”

FlyteInteractive can load data from HDFS or S3 storage and it supports both single-node jobs and more complex multinode and multi-GPU setups.
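In flytekit terms, remote data typically arrives as a FlyteFile (or FlyteDirectory) that the platform downloads on demand; a minimal sketch:

```python
from flytekit import task
from flytekit.types.file import FlyteFile


@task
def inspect_dataset(dataset: FlyteFile) -> int:
    # Flyte materializes the remote object (for example an s3:// or
    # hdfs:// URI) onto the pod's local disk before the body runs
    local_path = dataset.download()
    with open(local_path, "rb") as handle:
        return len(handle.read())
```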

Developers can just add the VSCode decorator to their code, connect to the VSCode server and use the Run and Debug command as usual to get an interactive debugging session that runs their Flyte task in VSCode. Flyte caches workflow output to avoid rerunning expensive tasks, so VSCode can load the output of a previous task instead of recomputing it.
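Based on the plugin’s documented pattern, usage looks roughly like this (a sketch, assuming the flytekitplugins-flyteinteractive package):

```python
from flytekit import task
from flytekitplugins.flyteinteractive import vscode


@task
@vscode
def train_model(x: int) -> int:
    # Instead of running straight through, the pod starts a VSCode
    # server; connect to it and use Run and Debug against the real
    # cluster resources and data
    return x * 2
```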

You get all the usual options, like setting breakpoints (even on a distributed training process) or running local scripts, as well as the code navigation and inspection tools that make it easier to understand the complex code structure of a large model with multiple modules and see how the data flows into the model.

You can also set the plug-in to automatically run if a Flyte task fails, which stops the task from terminating and gives you a chance to inspect and debug from the point of failure. When you’ve worked out what the problem is and rewritten your code, you can shut down the VSCode server and have Flyte carry on running the workflow. “You can resume the workflow with the changed code: you can just click a button and then the task will run with the new changed code and the whole workflow will continue,” Hsu explained.
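The failure-triggered mode is a flag on the same decorator; a sketch, assuming the plugin’s run_task_first option:

```python
from flytekit import task
from flytekitplugins.flyteinteractive import vscode


@task
@vscode(run_task_first=True)
def flaky_step(x: int) -> int:
    # Runs normally; only if the task raises does the pod stay alive
    # with a VSCode server attached at the point of failure
    if x < 0:
        raise ValueError("negative input")
    return x * 2
```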

The Jupyter notebook support in FlyteInteractive will also be helpful, he suggested: “It’s a quick orchestrator with the capability of Jupyter notebooks and interactive debugging, so you can use it to both quickly experiment and also for a scheduled job or batch job.”

Although it’s currently a plugin, now that it’s open source he hopes that, with community input, it will become a built-in feature of Flyte.

Exploring Resources and Code

FlyteInteractive has already saved thousands of hours of coding and debugging time at LinkedIn; it might also help with cost control through its resource monitoring option. “If a pod is idle for a certain period of time, we’ll just clean it up and send an email to notify the user ‘Hey, your pod has been idle for a while. Think about releasing the resource or doing something to take some action on it’.” In the future, Hsu told us, that will become finer-grained. “For example, we want to detect GPU utilization. If they occupied a GPU, but they don’t actually use it, we might want to kill it after say ten minutes, so we have better budget control for our GPU system.”

That relies on the checkpointing support in Flyte, because taking checkpoints is expensive and not usually a good fit for the iterative training loops used in machine learning. “We have to provide good checkpointing so that when a user job gets pre-empted, it also has the model saved.”

But for developers, the most appealing feature isn’t even the fast debugging, Zhu suggested. “I like the code inspection feature because it allows me to understand the inner working mechanism of algorithms quickly and also helps me to come up with some new approaches.”

That’s not just useful for your own code, he pointed out. “Not only can engineers apply this to their internal repos but they can also apply that to open source repos. As a field, ML is super fast: new algorithms come up every week that engineers like us have to test out. We can just point this tool at an open source repo and quickly understand whether it’s a technique that we want to go with.”

