How Technical Operations Can Build on the Success of Data Science Notebooks

This article discusses data science notebooks, a popular document format for publishing code, results, and more. Find out more below!

Nov. 24, 21 · Big Data Zone · Analysis

Data science notebooks, a popular document format used for publishing code, results, and explanations in readable and executable form, broke new ground by combining an ongoing narrative with interactive elements and displays. The result was a new way to capture and transfer knowledge about the process of discovering insights. By studying why data science notebooks have worked so well, we can understand more about related areas with similar characteristics, such as technical operations (TechOps).

At first glance, many of the attributes of data science notebooks also apply to TechOps. However, data scientists and TechOps teams have different objectives. A data scientist is interested in how results vary as elements within queries change. A TechOps team responsible for complex operational systems looks for variables and patterns, seeks to understand the root cause, and takes corrective action. Data science notebooks are conducive to instruction and are easy to change. In a production operations setting, however, things need to be repeatable rather than variable. To align with the different user needs in TechOps, the notebook concept evolved into runbooks.

How Notebooks and Runbooks Are Similar

Notebooks allow users to create and share documents that combine live code, equations, narrative text, visualizations, images, videos, widgets, and graphical user interfaces into a single document. Since the first notebook interfaces appeared over 30 years ago, they have grown considerably, enabling use cases from data cleaning and transformation to numerical simulation, statistical modeling, and data visualization.

The way notebooks have been used can be adapted to fit the complex operations environments TechOps teams work within. The blending of a narrative with live code and the results of queries and analytics creates a living document that reflects what is happening in real time and also serves as an archive. Many of the same elements of the data science notebook pattern can be used to record ad hoc activity and preserve knowledge.

The beauty of notebooks lies in the integrated platform of live code, narrative, and everything else needed to demonstrate results or findings. This is an especially powerful pattern that also applies to modern TechOps practices in which a runbook or incident response can span many different systems.

In TechOps, incorporating a rich, interactive narrative increases the likelihood that the process can be duplicated rather than created from scratch. It also allows operations teams to search for possible root causes, sifting through a rich set of information, using combinations of “trigger symptoms” that precipitated an event.

Ways Notebook Capabilities Come to Life in TechOps

The notebooks used in data science and the runbooks used in TechOps share the same set of principles that are expressed differently in the implementations.

Centralization and access to different resources: Hosted in a browser (rather than on a local computer), notebooks enable data scientists to access data from anywhere and do more complex processing. For TechOps teams, codifying processes through runbooks centralizes data and makes it accessible to people across teams and organizations. This ensures all stakeholders understand what is happening, especially what data has been analyzed.

Easier data integration and automation: Organizations often use various analytical systems, operational databases, and data storage formats, such as object storage that requires credentials and permissions. Notebooks can connect directly to different data sources without requiring additional user access or approvals. In TechOps, automating data integration produces visual displays and analytics that help teams understand what is happening in various systems. In both cases, one must have connectors to the data in all the underlying systems.

Support for rapid iterations: Notebooks allow fast feedback in the test-code-refactor loop. An added appeal is that cells can be executed in any order. The iterative model is helpful in capturing what happens during an incident response, where there may be many false starts and tested hypotheses before the right answer is found. The runbook created from that process can then be edited and turned into a standard document.

Sharable and collaborative: Notebooks are an interactive editing platform that enables users to find and use code that other data science community members have shared publicly. This collaboration reinforces reproducibility, because the final report can be rerun with different assumptions by anyone who has access to the notebook. In TechOps, using runbooks enables the creation of a living knowledge base that emerges out of the work done rather than being assembled from notes and other artifacts afterward. When TechOps teams use collaborative tools, they can amend and annotate documented processes in real time, capturing contextual insight that will benefit a teammate who faces a similar event.

The role of automation: Introducing a level of automation into notebooks eases the burden of documenting and publishing results from computational experiments, freeing up researchers to focus on more substantive tasks. Automation also provides user interfaces for annotating the captured provenance with notes and then querying it. In TechOps, automation and machine learning can help identify patterns and look for potential problems before they happen by exploring the relationships between seemingly unconnected things.

How Runbooks Have Been Adapted for TechOps

The use of notebooks in the data science domain provides a rich pool of experience from which TechOps can benefit. In both environments, the primary cohorts, whether researchers and data scientists or engineers and technicians, lean into knowledge-sharing, collaboration, and interactive documentation principles. Where notebooks and runbooks differ is that runbooks can be productized, which makes them repeatable, understandable, auditable, and transparent.

Repeatable

Notebooks and runbooks both combine documentation with code in a seamless way. However, notebooks are rooted in academia: they blend documentation with instructions for people to follow and explore. The fundamental difference is that notebooks are not repeatable.

With a notebook, there is no visibility into the state behind it. Someone can add an ad hoc line that sets a variable, making the notebook behave differently from then on. While this attribute is essential for rapid iteration in a classroom setting, TechOps teams need repeatable tasks that enable consistency and accuracy, whether in a daily task or on call after hours.
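
To make the hidden-state problem concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular notebook or runbook product works: the "cells" are plain strings executed against a shared dictionary, and names such as `run_cells`, `check_cpu`, and `threshold` are hypothetical.

```python
# Hypothetical notebook "cells": snippets of source that share one namespace.
cell_set_threshold = "threshold = 90"
cell_override = "threshold = 50  # ad hoc line added during exploration"
cell_check = "alert = cpu_percent > threshold"


def run_cells(cells, cpu_percent):
    """Execute cell sources in the given order against one shared state."""
    state = {"cpu_percent": cpu_percent}
    for source in cells:
        exec(source, state)  # each cell can silently change what later cells see
    return state["alert"]


# The same check cell gives different answers depending on what ran before it.
original = run_cells([cell_set_threshold, cell_check], cpu_percent=70)
after_override = run_cells(
    [cell_set_threshold, cell_override, cell_check], cpu_percent=70
)


def check_cpu(cpu_percent: float, threshold: float = 90.0) -> bool:
    """Runbook-style step: every input is an explicit parameter, no hidden state."""
    return cpu_percent > threshold


print(original, after_override, check_cpu(70))
```

The repeatable version gives the same answer for the same inputs regardless of what was executed earlier, which is the property the runbook approach is after.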

Understandable

Unlike notebooks, runbooks allow versioning, which gives users a fixed, understandable set of steps that are ready to go. As a result, users can choose which version to deploy and use in production. Once that runbook is deployed, it cannot be changed underneath its users; the state is understood.
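
One way to picture versioned, immutable runbooks is the following Python sketch. It assumes a simple in-memory registry; `RunbookVersion`, `RunbookRegistry`, and the `restart-web` runbook are hypothetical names chosen for illustration and do not come from any specific tool.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class RunbookVersion:
    """One published version; frozen so it cannot be edited after publishing."""
    name: str
    version: int
    steps: tuple  # a tuple keeps the step list itself immutable


class RunbookRegistry:
    """Hypothetical in-memory store of published and deployed versions."""

    def __init__(self):
        self._versions = {}  # name -> list of RunbookVersion
        self._deployed = {}  # name -> RunbookVersion currently in production

    def publish(self, name, steps):
        version = len(self._versions.get(name, [])) + 1
        runbook = RunbookVersion(name, version, tuple(steps))
        self._versions.setdefault(name, []).append(runbook)
        return runbook

    def deploy(self, name, version):
        self._deployed[name] = self._versions[name][version - 1]
        return self._deployed[name]

    def deployed(self, name):
        return self._deployed[name]


registry = RunbookRegistry()
registry.publish("restart-web", ["drain traffic", "restart service"])
registry.publish("restart-web", ["drain traffic", "restart service", "verify health"])

# Users choose which version runs in production; publishing v2 did not change it.
registry.deploy("restart-web", 1)
live = registry.deployed("restart-web")

try:
    live.steps = ("something else",)  # attempt to change the deployed runbook
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Changing production behavior in this sketch requires publishing a new version and deploying it explicitly, so the deployed state is always known.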

Auditable

While notebooks are not designed for auditability, TechOps runbooks provide a full audit trail through automatic documentation of human and machine actions. This audit trail clearly identifies automated actions and human actions tied to a person’s identity. If any of the steps followed has an error, the error condition is detectable and documented.
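
The audit trail described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `AuditEntry` fields, the `run_step` helper, and the example identities are all hypothetical, and a real system would write to durable storage rather than a list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass(frozen=True)
class AuditEntry:
    timestamp: str
    actor: str        # a person's identity, or the name of the automation
    actor_type: str   # "human" or "machine"
    step: str
    status: str       # "ok" or "error"
    detail: Optional[str] = None


audit_log: List[AuditEntry] = []


def run_step(step: str, action: Callable[[], object], actor: str, actor_type: str) -> bool:
    """Run one step and record who ran it and whether it failed."""
    try:
        action()
        status, detail = "ok", None
    except Exception as exc:  # the error condition is detected and documented
        status, detail = "error", f"{type(exc).__name__}: {exc}"
    timestamp = datetime.now(timezone.utc).isoformat()
    audit_log.append(AuditEntry(timestamp, actor, actor_type, step, status, detail))
    return status == "ok"


def failing_restart():
    raise RuntimeError("service did not stop")


run_step("acknowledge alert", lambda: None, actor="alice@example.com", actor_type="human")
run_step("restart service", failing_restart, actor="automation", actor_type="machine")

for entry in audit_log:
    print(entry.actor_type, entry.actor, entry.step, entry.status, entry.detail)
```

Because every step passes through `run_step`, human and automated actions end up in the same log with an identity attached, and failures are recorded rather than lost.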

Transparent

Data science notebooks are designed to be adaptable, serving as a starting point for work that others have done before but that may be done differently each time. Runbooks are intentionally designed to allow the predictable execution of a well-defined process that is repeatable, understandable, and auditable as it unfolds. Runbooks include the notion of action chains, which are menus of automation that can run based on the user's assessment of what must be done. For example, a user can have multiple buttons that each run a different automated action chain. However, the only way those action chains can cross-communicate is by outputting something visible to the user or to a system of record, such as writing an update message to the ticket, so that what happened is always visible, transparent, and audited. Later parts of the runbook can use the information in the ticket to know what happened.
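
The rule that action chains communicate only through a visible system of record can be sketched like this. The Python below is an assumption-laden illustration: `Ticket`, `ACTION_CHAINS`, `press`, and the disk-usage scenario are hypothetical and stand in for whatever ticketing system and automation a team actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Ticket:
    """Hypothetical system of record: an append-only list of update messages."""
    ticket_id: str
    updates: List[str] = field(default_factory=list)

    def add_update(self, message: str) -> None:
        self.updates.append(message)


def check_disk(ticket: Ticket) -> None:
    # A real chain would measure disk usage; the value is hard-coded here.
    ticket.add_update("check_disk: /var at 97% usage")


def clean_logs(ticket: Ticket) -> None:
    # The only input is what earlier chains wrote to the ticket.
    disk_finding = any(u.startswith("check_disk:") for u in ticket.updates)
    if disk_finding:
        ticket.add_update("clean_logs: rotated logs after disk finding")
    else:
        ticket.add_update("clean_logs: skipped, no disk finding on ticket")


# The "menu" of buttons a user can press, one action chain per button.
ACTION_CHAINS: Dict[str, Callable[[Ticket], None]] = {
    "Check disk": check_disk,
    "Clean logs": clean_logs,
}


def press(button: str, ticket: Ticket) -> None:
    ACTION_CHAINS[button](ticket)


ticket = Ticket("INC-1")
press("Check disk", ticket)
press("Clean logs", ticket)

fresh_ticket = Ticket("INC-2")
press("Clean logs", fresh_ticket)
```

Neither chain shares variables with the other, so anything one chain knows about the other is also readable by a person looking at the ticket.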

Runbooks Move to a New Level

Runbooks take the best attributes of notebooks, rapid iteration and in-line coding, and make them "productionizable" by leaning into the capabilities needed in complex operational environments. Although data science notebooks allow users to build automation, that automation is not necessarily repeatable, understandable, auditable, or transparent.

Critical runbook capabilities include versioning, change control, automatic documentation of events, support for repeatable and understandable sets of actions, and action chains that report events and enter data about them into other systems. Combining the best qualities of notebooks with these new attributes produces a runbook that is well suited to complex operational environments.
