Tutorial: A simple O(n log n) polynomial multiplication algorithm

Hi Codeforces!

I have something exciting to tell you guys about today! I have recently come up with a really neat and simple recursive algorithm for multiplying polynomials in time. It is so neat and simple that I think it might possibly revolutionize the way that fast polynomial multiplication is taught and coded. You don't need to know anything about FFT to understand and implement this algorithm.

Big thanks to nor, c1729 and Spheniscine for discussing the contents of the blog with me and comming up with ideas for how to improve the blog =).

I've split this blog up into two parts. The first part is intended for anyone to be able to read and understand. The second part is advanced and goes into a ton of interesting ideas and concepts related to this algorithm.

Prerequisite: Polynomial quotient and remainder, see Wiki article and this Stackexchange example.

Task:

Given two polynomials and , an integer and a non-zero complex number , where degree and degree . Your task is to calculate the polynomial in time. You may assume that is a power of two.

Solution:

We can create a divide and conquer algorithm for based on the difference of squares formula. Assuming is even, then . The idea behind the algorithm is to calculate and using 2 recursive calls, and then use that result to calculate .

So how do we actually calculate using and ?

Well, we can use the following formula:

Proof of the formula

This formula is very useful. If we substitute by , then the formula tells us how to calculate using and in linear time. With this we have the recipie for implementing a divide and conquer algorithm:

Input:

Integer (power of 2),
Non-zero complex number ,
Two polynomials and .

Output:

The polynomial .

Algorithm:

Step 1. (Base case) If , then return . Otherwise:

Step 2. Starting from and , in time calculate

Step 3. Make two recursive calls to calculate and .

Step 4. Using the formula, calculate in time. Return the result.

Here is a Python implementation following this recipie:

Python solution to the task

One final thing that I want to mention before going into the advanced section is that this algorithm can also be used to do fast unmodded polynomial multiplication, i.e. given polynomials and calculate . The trick is simply to pick large enough such that , and then use the exact same algorithm as before. can be arbitrarily picked (any non-zero complex number works).

Python implementation for general Fast polynomial multiplication

If you want to try out implementing this algorithm yourself, then here is a very simple problem to test out your implementation on: SPOJ:POLYMUL.

(Advanced) Speeding up the algorithm

This section will be about tricks that can be used to speed up the algorithm. The first two tricks will speed up the algorithm by a factor of 2 each. The last trick is advanced, and it has the potential to both speed up the algorithm and also make it more numerically stable.

$n$ doesn't actually need to be a power of 2

Imaginary-cyclic convolution

Calculating fast_polymult_mod(P, Q, n, c) using using fast_polymult_mod(P, Q, n, 1) (reweight technique)

(Advanced) -is-this-fft-?

This algorithm is actually FFT in disguise. But it is also different compared to any other FFT algorithm that I've seen in the past (for example the Cooley–Tukey FFT algorithm).

Using this algorithm to calculate FFT

This algorithm is not the same algorithm as Cooley–Tukey

FFT implementation in Python based on this algorithm

FFT implementation in C++ based on this algorithm

(Advanced) Connection between this algorithm and NTT

Just like how there is FFT and NTT, there are two variants of this algorithm too. One using complex floating point numbers, and the other using modulo a prime (or more generally modulo an odd composite number).

Using modulo integers instead of complex numbers

Calculating fast_polymult_mod(P, Q, n, c) using fast_polymult_mod(P, Q, 2*n, 1)

This algorithm works to some degree even for bad NTT primes

NTT implementation in Python based on this algorithm

NTT implementation in C++ based on this algorithm

Blazingly fast NTT C++ implementation

(Advanced) Shorter implementations ("Codegolfed version")

It is possible to make really short but slightly less natural implementations of this algorithm. Originally I was thinking of using this shorter version in the blog, but in the end I didn't do it. So here they are. If you want a short implemention of this algorithm to use in practice, then I would recommend taking one of these implementations and porting it to C++.

Short Python implementation without any speedup tricks

Short Python implementation supporting odd and even $n$ (making it up to 2 times faster)

Short Python implementation supporting odd and even $n$ and imaginary cyclic convolution (making it up to 4 times faster)

Task:

Solution:

(Advanced) Speeding up the algorithm

(Advanced) -is-this-fft-?

(Advanced) Connection between this algorithm and NTT

(Advanced) Shorter implementations ("Codegolfed version")

Recommend

金钱心理学-读书简记

华为Mate 60新外观曝光！巨大星环后摄 “灵动岛”好看吗？

上半年扫地机器人四大特征价值驱动明显

绿色数据中心如何帮助企业实现可持续发展目标

联想凌拓向行业纵深存储突破

Weak cryptographic parameters in LUKS1

Game On: More Netflix Gaming, Return to Monkey Island, Spaceplan, and Jelly Car...

西部数据推出全新SSD，专为创意工作流保驾护航

新国货美妆品牌花西子，抖音年销过亿的直播策略有哪些？

有道需要丁磊“窗口指导”

About Joyk