
# PFFFT: a pretty fast FFT and fast convolution with PFFASTCONV

## TL;DR

PFFFT does 1D Fast Fourier Transforms, of single precision real and
complex vectors. It tries to do it fast, it tries to be correct, and it
tries to be small. Computations do take advantage of SSE1 instructions
on x86 cpus, Altivec on powerpc cpus, and NEON on ARM cpus. The
license is BSD-like.


PFFASTCONV does fast convolution (FIR filtering), of single precision
real vectors, utilizing the PFFFT library. The license is BSD-like.

PFDSP contains a few other signal processing functions.
Currently, it contains mixing and carrier generation functions.
It is a work in progress, and so is its API!
The fast convolution from PFFASTCONV might get merged into PFDSP.


## Why does it exist:

I was in search of a good performing FFT library, preferably very
small and with a very liberal license.

When one says "fft library", FFTW ("Fastest Fourier Transform in the
West") is probably the first name that comes to mind -- I guess that
99% of open-source projects that need an FFT do use FFTW, and are happy
with it. However, it is quite a large library, which does everything
fft related (2d transforms, 3d transforms, other transformations such
as discrete cosine, or fast hartley). And it is licensed under the
GNU GPL, which means that it cannot be used in non open-source
products.

An alternative to FFTW that is really small, is the venerable FFTPACK
v4, which is available on NETLIB. A more recent version (v5) exists,
but it is larger as it deals with multi-dimensional transforms. This
is a library that is written in FORTRAN 77, a language that is now
considered as a bit antiquated by many. FFTPACKv4 was written in 1985,
by Dr Paul Swarztrauber of NCAR, more than 25 years ago! And despite
its age, benchmarks show that it is still a very good performing FFT
library, see for example the 1d single precision benchmarks
[here](http://www.fftw.org/speed/opteron-2.2GHz-32bit/). It is however not
competitive with the fastest ones, such as FFTW, Intel MKL, AMD ACML,
Apple vDSP. The reason for that is that those libraries do take
advantage of the SSE SIMD instructions available on Intel CPUs,
available since the days of the Pentium III. These instructions deal
with small vectors of 4 floats at a time, instead of a single float
for a traditional FPU, so when using these instructions one may expect
a 4-fold performance improvement.
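
To make the "4 floats at a time" idea concrete, here is a minimal sketch that is not taken from PFFFT. The helper name `add_floats` is made up for illustration; the intrinsics `_mm_loadu_ps`, `_mm_add_ps` and `_mm_storeu_ps` come from the standard `<xmmintrin.h>` SSE header, and a plain scalar loop is used when the compiler does not define `__SSE__`.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE1 intrinsics */
#endif

/* Hypothetical helper, not part of PFFFT: out[i] = a[i] + b[i] for n floats. */
static void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* One SIMD addition handles four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or everything when SSE is unavailable. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

In practice the gain is often smaller than 4x, since loads, stores and the parts of an algorithm that do not vectorize cleanly do not speed up by the same factor.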

The idea was to take this fortran fftpack v4 code, translate to C,
modify it to deal with those SSE instructions, and check that the
final performance is not completely ridiculous when compared to other
SIMD FFT libraries. Translation to C was performed with
[f2c](http://www.netlib.org/f2c/). The resulting file was edited a bit in
order to remove the thousands of gotos that were introduced by
f2c. You will find the fftpack.h and fftpack.c sources in the
repository; this is a complete translation of
[fftpack](http://www.netlib.org/fftpack/), with the discrete cosine transform
and the test program. There is no license information in the netlib
repository, but it was confirmed to me by the fftpack v5 curators that
the [same terms do apply to fftpack v4](http://www.cisl.ucar.edu/css/software/fftpack5/ftpk.html).
This is a "BSD-like" license; it is compatible with proprietary projects.

Adapting fftpack to deal with the SIMD 4-element vectors instead of
scalar single precision numbers was more complex than I originally
thought, especially with the real transforms, and I ended up writing
more code than I planned.


## The code:

### Good old C:
The FFT API is very, very simple; just make sure that you read the comments in `pffft.h`.

The fast convolution API is also very simple; just make sure that you read the comments
in `pffastconv.h`.

### C++:
A simple C++ wrapper is available in `pffft.hpp`.


### Git:
This archive's source can be downloaded with git including the submodules:
```
git clone --recursive https://github.com/hayguen/pffft.git
```

With `--recursive`, the submodules for Green and Kiss-FFT are also fetched,
so that they can be used in the benchmark. You can omit the `--recursive` option.

To retrieve the submodules later:
```
git submodule update --init
```


## CMake:
There's now CMake support to build the static libraries `libPFFFT.a`
and `libPFFASTCONV.a` from the source files, plus the additional
`libFFTPACK.a` library. The latter's sources are included anyway for the benchmark.


## Origin:
The origin of this code is Julien Pommier's pffft on Bitbucket:
[https://bitbucket.org/jpommier/pffft/](https://bitbucket.org/jpommier/pffft/)


## Comparison with other FFTs:

The idea was not to break speed records, but to get a decently fast
fft that is at least 50% as fast as the fastest FFT -- especially on
the slowest computers. I'm more focused on getting the best performance
on slow cpus (Atom, Intel Core 1, old Athlons, ARM Cortex-A9...), than
on getting top performance on today's fastest cpus.

It can be used in a real-time context as the fft functions do not
perform any memory allocation -- that is why they accept a 'work'
array in their arguments.
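
The caller-provided scratch buffer pattern can be sketched as follows. This is only an illustration of the idea, not the PFFFT API: `rotate_block` and its parameters are hypothetical. The point is that the routine needs temporary storage but never allocates it itself, so the caller can allocate once, outside the real-time path.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical routine, not part of PFFFT: rotate n floats left by `shift`.
 * `work` must point to n floats of caller-owned scratch space, so the
 * function never calls malloc and `input` may be the same array as `output`. */
static void rotate_block(const float *input, float *output,
                         float *work, size_t n, size_t shift)
{
    if (n == 0)
        return;
    shift %= n;
    /* Build the rotated result in the scratch buffer first ... */
    memcpy(work, input + shift, (n - shift) * sizeof(float));
    memcpy(work + (n - shift), input, shift * sizeof(float));
    /* ... then copy it out, which is safe even when input == output. */
    memcpy(output, work, n * sizeof(float));
}
```

A caller would typically allocate `work` once at start-up and pass the same buffer to every call in the processing loop.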

It is also a bit focused on performing 1D convolutions, that is why it
provides "unordered" FFTs, and a Fourier-domain convolution
operation.
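
The principle behind Fourier-domain convolution can be sketched without the library: transform both signals, multiply the spectra bin by bin, and transform back. The code below is a self-contained illustration, not PFFFT code. All names are hypothetical, the transform is a naive length-4 DFT with a hard-coded twiddle table, and the result is a circular convolution. Since the product is taken bin by bin, the order of the bins does not matter as long as both spectra use the same order, which is presumably why an "unordered" transform is sufficient for this use.

```c
#include <stddef.h>

#define CONV_LEN 4  /* the twiddle table below is only valid for length 4 */

/* exp(-2*pi*i*k/4) for k = 0..3, tabulated so no trigonometry is needed. */
static const float TW_RE[CONV_LEN] = { 1.0f,  0.0f, -1.0f, 0.0f };
static const float TW_IM[CONV_LEN] = { 0.0f, -1.0f,  0.0f, 1.0f };

/* Naive O(N^2) DFT; inverse != 0 selects the unscaled inverse transform. */
static void naive_dft4(const float *in_re, const float *in_im,
                       float *out_re, float *out_im, int inverse)
{
    for (size_t k = 0; k < CONV_LEN; ++k) {
        float sum_re = 0.0f, sum_im = 0.0f;
        for (size_t n = 0; n < CONV_LEN; ++n) {
            size_t idx = (k * n) % CONV_LEN;
            float wr = TW_RE[idx];
            float wi = inverse ? -TW_IM[idx] : TW_IM[idx];
            sum_re += in_re[n] * wr - in_im[n] * wi;
            sum_im += in_re[n] * wi + in_im[n] * wr;
        }
        out_re[k] = sum_re;
        out_im[k] = sum_im;
    }
}

/* Circular convolution of two real signals through the frequency domain. */
static void fourier_circular_convolve(const float *a, const float *b, float *out)
{
    const float zero[CONV_LEN] = { 0.0f };
    float a_re[CONV_LEN], a_im[CONV_LEN], b_re[CONV_LEN], b_im[CONV_LEN];
    float prod_re[CONV_LEN], prod_im[CONV_LEN], y_re[CONV_LEN], y_im[CONV_LEN];

    naive_dft4(a, zero, a_re, a_im, 0);
    naive_dft4(b, zero, b_re, b_im, 0);
    for (size_t k = 0; k < CONV_LEN; ++k) {
        /* Complex multiplication of the two spectra, bin by bin. */
        prod_re[k] = a_re[k] * b_re[k] - a_im[k] * b_im[k];
        prod_im[k] = a_re[k] * b_im[k] + a_im[k] * b_re[k];
    }
    naive_dft4(prod_re, prod_im, y_re, y_im, 1);
    for (size_t n = 0; n < CONV_LEN; ++n)
        out[n] = y_re[n] / (float)CONV_LEN;  /* apply the 1/N scaling */
}

/* Reference: direct circular convolution in the time domain. */
static void direct_circular_convolve(const float *a, const float *b, float *out)
{
    for (size_t n = 0; n < CONV_LEN; ++n) {
        float sum = 0.0f;
        for (size_t m = 0; m < CONV_LEN; ++m)
            sum += a[m] * b[(n + CONV_LEN - m) % CONV_LEN];
        out[n] = sum;
    }
}
```

For `a = {1, 2, 3, 4}` and `b = {1, 0, -1, 2}`, both functions give `{2, 4, 10, 4}`. A real FIR filter needs linear rather than circular convolution, which is usually obtained by zero-padding or by processing overlapping blocks.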

Also very interesting is [https://www.nayuki.io/page/free-small-fft-in-multiple-languages](https://www.nayuki.io/page/free-small-fft-in-multiple-languages).
It shows how small an FFT can be, including the Bluestein algorithm, but it is anything but fast.
The whole C++ implementation file is 161 lines, including the copyright header; see
[https://github.com/nayuki/Nayuki-web-published-code/blob/master/free-small-fft-in-multiple-languages/FftComplex.cpp](https://github.com/nayuki/Nayuki-web-published-code/blob/master/free-small-fft-in-multiple-languages/FftComplex.cpp)

## Dependencies / Required Linux packages

On Debian/Ubuntu Linux, the following packages should be installed:

```
sudo apt-get install build-essential gcc g++ cmake
```

For benchmarking, you should install additional packages:
```
sudo apt-get install libfftw3-dev gnuplot
```

Run the benchmarks with `./bench_all.sh ON` to include benchmarks of fftw3.
More details are in the README of [https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks).


## Benchmark results

The benchmark results are stored in a separate git-repository:
See [https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks).

This is to keep the sources small.