
Ask HN: When will LLMs be able to interrupt or interject?

source link: https://news.ycombinator.com/item?id=39659087

24 points by yehosef 1 hour ago | 14 comments
I'm curious what is needed for LLMs to interrupt (take control of the conversation) or interject (add some comment while the other is talking, but not to take control of the conversation).
There’s no reason this couldn’t be implemented now. The main barriers are inference speed and cost, since the LLM would have to run continuously on all newly available text from the user and decide quickly when to interject, plus the difficulty of programming such complex behaviour.
It’s already possible. I can’t find the thread now, but I saw a demo on X recently where they had an LLM hooked up to a text field where every character typed was sent to the LLM immediately so that it could anticipate responses and do some planning ahead of time. You’re basically talking about the same thing except for the fact that one of the possible outputs for the LLM is an interrupt function call.
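A minimal sketch of that setup, assuming a hypothetical `stub_model` in place of a real LLM: every keystroke extends a buffer that is re-sent to the model, and one of the model's possible outputs is an interrupt action.

```python
# Hypothetical sketch: stream each keystroke to a model whose possible
# outputs include an "interrupt" action. stub_model is a placeholder
# standing in for a real LLM call on every update.

from dataclasses import dataclass

@dataclass
class ModelOutput:
    action: str        # "wait" or "interrupt"
    text: str = ""     # the interjection text, if any

def stub_model(partial_input: str) -> ModelOutput:
    # Placeholder decision rule: interject as soon as the user types
    # something the model would want to correct immediately.
    if "the earth is flat" in partial_input.lower():
        return ModelOutput("interrupt", "Quick note: the Earth is not flat.")
    return ModelOutput("wait")

def stream_keystrokes(keystrokes: str) -> list[str]:
    """Feed input one character at a time, collecting any interjections."""
    buffer, interjections = "", []
    for ch in keystrokes:
        buffer += ch
        out = stub_model(buffer)
        if out.action == "interrupt":
            interjections.append(out.text)
            break  # hand control of the conversation to the model
    return interjections
```

In a real system the per-keystroke model call is the bottleneck, which is exactly the speed-and-cost barrier raised in the comment above.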
It's probably easier to ask first how you can design a text interface that allows people to interrupt at all. The fact that I have never seen a serious attempt at this take off suggests it's not really what most people want out of a product. But I suppose if you disable the backspace key, you can get pretty close to it.
I’ve just tried this with ChatGPT on iOS: you can press stop and then immediately respond while the AI is still generating its response.
It’s possible now, no idea why anyone would want this though. The idea is that you want something helpful, and you can do some additional prompting to encourage the model to ask questions but outright derailing the conversation is contrary to what these models are trying to do.
Is it actually possible now? In order to do this it would have to read your typing in real time, create a relevant response, and then correctly decide whether its response should be to interrupt or interject.

You are right that no one would want an existing LLM to do it, because they are not capable of doing it correctly. The ones that are fast enough are far too stupid to do it correctly, and it's not clear to me, even if GPT-4 could be made fast enough, that it would do it correctly 8 times out of 10, which is about the worst it could do before anyone would turn the feature off.

It would be implemented like auto-completion. The model would be called repeatedly with the input extended by the user's uncommitted text, plus a prompt asking it to decide whether it should act.
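A minimal sketch of that auto-completion-style loop, assuming a hypothetical `llm_yes_no` helper standing in for a real model call: on every change to the draft, the committed transcript plus the uncommitted text is sent with a yes/no "should you act?" prompt.

```python
# Hypothetical sketch: decide on each keystroke whether to interject.
# llm_yes_no is a stub for a real LLM call that answers yes/no.

def llm_yes_no(prompt: str) -> bool:
    # Stub decision rule standing in for a real model; a real system
    # would send the prompt to an LLM and parse a "yes"/"no" answer.
    return "help" in prompt.lower()

def should_interject(transcript: str, uncommitted: str) -> bool:
    """Build the act/don't-act prompt from committed + draft input."""
    prompt = (
        f"Conversation so far:\n{transcript}\n"
        f"User is still typing (uncommitted): {uncommitted!r}\n"
        "Should the assistant interject now? Answer yes or no."
    )
    return llm_yes_no(prompt)
```

The expensive part is exactly what the earlier comment notes: this prompt has to be evaluated on every update to the draft, so latency and cost dominate the design.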
Nothing stops them from doing that now, barring UX choices.
Interjecting requires planning ahead.

The way a human interjects is that you have a parallel thought chain running alongside the conversation as it happens in real time. In this parallel chain, you are planning ahead. What point am I going to make once we are past this point of the conversation? What is the implication of what is being discussed here? (You are also thinking about what the other person is thinking; you are developing a mental model of their thought process.)

An LLM does not have any of this architecturally; it just has the text itself. Any planning people claim to do with LLaMA et al. is really just "pseudo" planning, not the fundamental planning we are talking about here. I suspect it will be a while yet before we have "natural" interjection from LLMs.

When it does come, however, it will be extremely exciting. Because it will mean that we have cracked planning and made the AI far more agentic than it is now. I would love to be proven wrong.

Wouldn’t this be bad for marketing reasons? If people saw the LLM output instantly change with each word or character they typed, it would cease to appear as some kind of “intelligence” and feel like nothing more than a glorified autosuggest: tweak a few words here and there to try to modify the output in subtle ways.

It seems that for people to perceive it as true AI, they must send off a prompt, watch it think deeply while a loader spins, and then read a response.

1. Continuously read user input.

2. Constantly predict a few tokens ahead.

3. When the predicted text includes the computer's prompt, respond with it, without waiting for the user to press enter.

Probably also

4. Stop engineering the initial instructions for such obsequious behavior.
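Steps 1-3 above could be sketched like this, with a hypothetical `predict_next_tokens` stub standing in for a real LLM lookahead call, and a turn marker standing in for "the computer's prompt":

```python
# Toy sketch of steps 1-3: keep predicting a few tokens past the user's
# partial input, and answer early once the prediction rolls over into
# the assistant's own turn. predict_next_tokens is a stubbed stand-in
# for a real LLM completion call.

ASSISTANT_MARKER = "\nAssistant:"

def predict_next_tokens(text: str) -> str:
    # Stub predictor: once the user's input ends in a question mark,
    # the most likely continuation is the assistant's turn.
    if text.rstrip().endswith("?"):
        return ASSISTANT_MARKER + " Sure,"
    return " ..."

def maybe_respond_early(partial_input: str) -> bool:
    """Return True when the lookahead predicts the assistant's turn."""
    return ASSISTANT_MARKER in predict_next_tokens(partial_input)
```

The interesting design question is how many lookahead tokens are enough to be confident the user's turn is over without constantly jumping the gun.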

“As a language model, I must tell you that what you’re referring to as Linux is actually called GNU/Linux, or as I call it…”
To interrupt would require an interruptible conversation. Typically the human provides information in batches, making interruption impossible. Otherwise you would need to snoop the user input periodically, treat it as a prompt, flag it specially as incomplete, and add some form of filtering so that an interruption would need to meet a certain level of quality, whatever that might mean.

To be useful, it would need something to interrupt, and instruction on what warrants an interruption.
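The snoop-and-filter idea above might look like this; everything here (the drafting helper and the quality scorer) is a hypothetical stub for real model calls:

```python
# Sketch of quality-gated interjection: treat the in-progress input as
# an incomplete prompt, draft a candidate interjection, and only emit
# it if a (stubbed) quality score clears a threshold.

def draft_interjection(snooped_input: str) -> str:
    # Stub for a model call on the incomplete, snooped prompt.
    return f"[re: {snooped_input[:20]}] one quick point"

def quality_score(candidate: str) -> float:
    # Stub scorer; a real filter might use a second model pass
    # to judge whether the interjection is worth the disruption.
    return 0.9 if "quick point" in candidate else 0.1

def interject_if_worthwhile(snooped_input: str, threshold: float = 0.5):
    """Return an interjection string, or None if it isn't good enough."""
    candidate = draft_interjection(snooped_input)
    return candidate if quality_score(candidate) >= threshold else None
```

As the comment says, the hard part is defining what "a certain level of quality" even means; the threshold here is a placeholder for that open question.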
