Insight

A lightweight, industrial-grade tensor computation framework for signal processing and GPU acceleration.

Insight is a C++ tensor library inspired by PaddlePaddle (operator registration, device HAL), Torch7 (clean C API, TH/THC spirit), and NumPy/CuPy (Python-side API conventions). It provides unified CPU/GPU computing with zero-copy views, dynamic operator dispatch, and a signal processing module of 89 functions.

Features

Unified API -- insight::Array works seamlessly on CPUPlace and GPUPlace
Zero-Copy Views -- reshape, transpose, slice via strides & offset
Dynamic Operator Registry -- ops()["add"][CPU][F32](args) dispatch (Paddle style)
Device HAL -- ABI-stable plugin system via Device base class + extern "C" factory
Signal Processing -- 89 functions across 14 submodules (windows, waveforms, B-splines, filter design, convolution, filtering, spectral analysis, wavelets, acoustics, radar, demod, peak finding, estimation, I/O), all with CPU and CUDA backend kernels
Half-Precision -- fp16/bf16 support via half_utils.h/half_utils.cuh, 116 kernel files with half-precision coverage
Language Bindings -- Python (pybind11), Lua (sol2), Julia (ccall) with per-module wrappers and signal sub-namespaces
Modern C++ -- C++17/20, OpenMP parallelism, FFTW3, OpenBLAS, cuBLAS, cuFFT
No Autograd -- Keeps the library lightweight and focused
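
The zero-copy views listed above (reshape, transpose, slice via strides and offset) can be illustrated with a minimal pure-Python sketch. This is not Insight's Array implementation: the `View` class and its `item`, `transpose`, `row`, and `step_cols` methods are hypothetical names chosen for illustration, and strides are counted in elements rather than bytes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class View:
    """Hypothetical strided view over a shared flat buffer (not Insight's Array)."""
    data: List[float]          # shared storage, never copied by view operations
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]   # counted in elements, not bytes
    offset: int = 0

    def item(self, *idx: int) -> float:
        # Flat position = offset + sum(index * stride) over all axes.
        pos = self.offset + sum(i * s for i, s in zip(idx, self.strides))
        return self.data[pos]

    def transpose(self) -> "View":
        # Reversing shape and strides swaps the axes without touching the data.
        return View(self.data, self.shape[::-1], self.strides[::-1], self.offset)

    def row(self, i: int) -> "View":
        # Partial indexing: drop the first axis and advance the offset.
        return View(self.data, self.shape[1:], self.strides[1:],
                    self.offset + i * self.strides[0])

    def step_cols(self, step: int) -> "View":
        # Slice [:, ::step] on a 2-D view: scale the last stride by the step.
        rows, cols = self.shape
        new_cols = (cols + step - 1) // step
        return View(self.data, (rows, new_cols),
                    (self.strides[0], self.strides[1] * step), self.offset)


buf = [float(v) for v in range(6)]   # 2x3 row-major: [[0, 1, 2], [3, 4, 5]]
a = View(buf, (2, 3), (3, 1))
t = a.transpose()                    # shape (3, 2), strides (1, 3)
r = a.row(1)                         # shape (3,), offset 3
s = a.step_cols(2)                   # shape (2, 2): columns 0 and 2
```

Every view shares `buf`, so a write to the buffer is visible through `a`, `t`, `r`, and `s` alike. That sharing is what makes these operations free of copies.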

Architecture

insight/
├── include/insight/
│   ├── core/           # Array, Shape, Strides, DType, Place
│   ├── ops/            # Frontend API (elementwise, fft, signal, linalg, etc.)
│   ├── io/             # I/O (csv, print, sndfile)
│   ├── c_api/          # C ABI interfaces (array, kernel, dtype, place)
│   └── plugin/         # Operator registry + device HAL
├── src/
│   ├── core/           # Array implementation, memory management
│   ├── ops/            # Frontend operator logic
│   └── internal/       # Internal utilities
├── backends/
│   ├── cpu/kernels/    # CPU kernels (OpenMP + FFTW + OpenBLAS)
│   └── cuda/kernels/   # CUDA kernels (cuBLAS + cuFFT + Thrust)
├── bindings/
│   ├── python/insight/ # pybind11 bindings (per-module wrappers)
│   ├── lua/insight/    # sol2 bindings (dual calling convention)
│   └── julia/          # ccall bindings (Insight.jl)
├── tests/
│   ├── cpu/            # CPU tests (630+ tests, 27 suites)
│   ├── cuda/           # CUDA tests (510+ tests, 23 suites)
│   └── python_align/   # NumPy precision alignment tests
└── demos/              # Example programs (C++, Python, Lua, Julia)

Quick Start

Build from Source

Linux / macOS:

git clone https://github.com/PlumBlossomMaid/Insight7.git
cd Insight7
mkdir build && cd build
cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DINSIGHT_WITH_CUDA=ON \
    -DINSIGHT_USE_FFTW3=ON \
    -DINSIGHT_USE_OPENBLAS=ON
cmake --build . -j$(nproc)

Windows (MSVC):

# Prerequisites: Visual Studio 2022+ (C++ workload), CMake 3.15+, Ninja
# Install dependencies via vcpkg (recommended):
#   vcpkg install fftw3:x64-windows openblas:x64-windows
# Or download OpenBLAS from https://github.com/OpenMathLib/OpenBLAS/releases
#   and extract to e.g. C:\deps\OpenBLAS-0.3.33-x64

# Open VS Developer Command Prompt (x64)
# Adjust path to match your Visual Studio installation:
call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvarsall.bat" x64

git clone https://github.com/PlumBlossomMaid/Insight7.git
cd Insight7
cmake -S . -B build -G Ninja ^
    -DCMAKE_C_COMPILER=cl.exe ^
    -DCMAKE_CXX_COMPILER=cl.exe ^
    -DCMAKE_BUILD_TYPE=Release ^
    -DINSIGHT_WITH_CUDA=ON ^
    -DCMAKE_PREFIX_PATH="C:/deps/OpenBLAS-0.3.33-x64;E:/vcpkg/installed/x64-windows"
cmake --build build -j %NUMBER_OF_PROCESSORS%

Note: For plot functionality, install gnuplot and ensure it is in your PATH.

Install Language Bindings

Python (requires CMake build first):

pip install .

Lua (via luarocks, requires CMake build first):

# Lua 5.3
luarocks make bindings/lua/insight-1.0-1.rockspec LUA_DIR=/usr --local

# LuaJIT
luarocks make bindings/lua/insight-1.0-1.rockspec --local

Julia:

push!(LOAD_PATH, "/path/to/Insight7/bindings/julia")
using Insight

Examples

C++

#include "insight/insight.h"
using namespace insight;

int main() {
    // Create arrays (auto-selects GPU when available)
    Array a = ones({1000, 1000}, F32);
    Array b = randn({1000, 1000}, F32);

    // Matrix multiplication
    Array c = matmul(a, b);

    // NumPy-style partial indexing
    Array row = c.at({0});     // shape (1000,)
    Array val = c.at({0, 0});  // scalar

    // Signal processing
    Array w = signal::hann(256);
}

Python

import insight as ins

# Auto-selects GPU when available (PaddlePaddle behavior)
print(ins.get_device())  # GPUPlace(0)

a = ins.rand([1000, 1000])
b = ins.randn([1000, 1000])

# Operators: +, -, *, /, //, %, **, @
c = a @ b                # matrix multiplication
d = a ** 2               # elementwise power
e = a // 3.0             # floor division

# NumPy-style indexing
row = a[1]               # partial indexing → shape (1000,)
val = a[1, 2]            # scalar extraction
sub = a[1:, ::2]         # mixed slice indexing

# Signal processing
w = ins.signal.hann(256)
x = ins.randn([4096])    # example input signal
f, Pxx = ins.signal.welch(x, fs=1000)
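
For reference, the textbook Hann window behind a call such as ins.signal.hann(256) can be written in a few lines of standard-library Python. This sketch does not use Insight. The `symmetric` flag is an assumption added for illustration, and whether Insight's hann returns the symmetric or the periodic variant is not stated here.

```python
import math
from typing import List


def hann(n: int, symmetric: bool = True) -> List[float]:
    """Textbook Hann window: w[k] = 0.5 - 0.5 * cos(2*pi*k / denom)."""
    if n < 1:
        return []
    if n == 1:
        return [1.0]
    # Symmetric windows (filter design) divide by n - 1;
    # periodic windows (spectral analysis) divide by n.
    denom = n - 1 if symmetric else n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / denom) for k in range(n)]


w = hann(256)
```

The symmetric form starts and ends at zero and is mirror-symmetric about its center. The periodic form drops the final zero so that consecutive windows tile cleanly in overlapped spectral estimates such as Welch's method.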

Lua

local ins = require("insight")
-- Backend auto-detected, GPU selected when available

print(ins.get_device())       -- "cuda:0" or "cpu:0"
print(ins.gpu_version())      -- 11080 (CUDA 11.8)

local a = ins.rand({1000, 1000})
local b = ins.randn({1000, 1000})
local c = ins.matmul(a, b)

-- 1-based indexing (Lua convention)
local row = a[1]              -- partial indexing → shape (1000,)

-- Dual calling convention
local w = ins.signal.hann(256)
local w2 = ins.signal.hann{n=256}

Julia

using Insight

dt, id = Insight.get_device()  # (1, 0) for GPU

a = Insight.rand(Int64[1000, 1000], Insight.float32)
b = Insight.randn(Int64[1000, 1000], Insight.float32)
c = Insight.matmul(a, b)

# 1-based indexing (Julia convention)
row = a[1]                     # partial indexing → shape (1000,)
val = a[1, 2]                  # scalar extraction

GPU Benchmark (A800-SXM4-80GB)

Tested on Baidu AI Studio with a 24-core CPU and an NVIDIA A800-SXM4-80GB GPU:

Test	CPU (24-core)	GPU (A800)	Speedup
add (20M elements)	226ms	601μs	376x
mul (20M elements)	229ms	609μs	376x
sin (20M elements)	278ms	771μs	361x
sum (20M elements)	26ms	8.8μs	2,962x
max (20M elements)	42ms	8.4μs	4,976x
matmul 256×256	19ms	38μs	503x
matmul 1024×1024	3.6s	110μs	32,348x
rfft2 512×512	4.5ms	1.2ms	3.7x
randn (20M)	766ms	82ms	9.4x
sort (2M)	206ms	187ms	1.1x

The GPU excels at large-scale parallel operations. For small FFTs and SVD, kernel launch overhead favors the CPU. The framework automatically selects the optimal device.
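
The Speedup column is the CPU time divided by the GPU time once both are in the same unit. The short sketch below recomputes two rows from the rounded figures in the table; the helper names `to_seconds` and `speedup` are illustrative only. Recomputed values can differ slightly from the published column, presumably because the published ratios were taken from unrounded timings.

```python
from typing import Tuple

# Conversion factors to seconds for the units used in the table ("us" = μs).
UNIT_SECONDS = {"s": 1.0, "ms": 1e-3, "us": 1e-6}


def to_seconds(value: float, unit: str) -> float:
    """Convert a (value, unit) timing to seconds."""
    return value * UNIT_SECONDS[unit]


def speedup(cpu: Tuple[float, str], gpu: Tuple[float, str]) -> float:
    """Ratio of CPU time to GPU time; values above 1 mean the GPU is faster."""
    return to_seconds(*cpu) / to_seconds(*gpu)


add_speedup = speedup((226, "ms"), (601, "us"))      # "add (20M elements)" row
rfft2_speedup = speedup((4.5, "ms"), (1.2, "ms"))    # "rfft2 512×512" row
```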

Dependencies

Dependency	Version	Required	Notes
CMake	3.15+	Yes	Build system
C++17 compiler	--	Yes	GCC 9+, Clang 12+, MSVC 2019+
CUDA	11.7+	No	Optional GPU backend
OpenBLAS	any	No	CPU linear algebra
FFTW3	3.3+	No	CPU FFT
OpenMP	--	No	CPU multi-threading
Thrust	bundled	No	CUDA sorting/unique
GoogleTest	auto	--	Automatically fetched

Test Status

1140+ tests passing -- CPU (630+, 27 suites) and CUDA (510+, 23 suites), plus 384 precision alignment tests

Suite	CPU	CUDA	Notes
cast	9	9
complex	22	22
creation	27	27
csv	1	1
dtype	9	9	Shared
elementwise	28	28
fft	19	19
indexing	41	33
linalg	43	43	15 native CUDA + 13 C_FALLBACK
manipulation	42	42
operator	50	50
print	11	11
random	31	31
reduction	24	24
signal (core)	10	10	Composite ops
signal_windows	30	30
signal_waveforms	18	18
signal_bsplines	13	13
signal_filter_design	22	22
signal_convolution	21	17
signal_filtering	23	15
signal_spectral	11	--
signal_wavelets	13	--
signal_acoustics	9	--
signal_radar	7	--
signal_io	11	--
signal_peak_finding	3	--
signal_demod	1	--
signal_estimation	1	--
plot	13	--
unary	27	27
audio	9	9
Total	630+	510+

Language Binding Tests

Language	Test Framework	Tests
Python	pytest	76 smoke + 54 numerical + 384 alignment
Lua	busted	208
Julia	Test stdlib	74

Demos

Example programs in demos/ covering 4 languages and 6 scenarios:

Demo	C++	Python	Lua	Julia
basic_ops	✅	✅	✅	✅
fft_demo	✅	✅	✅	✅
gpu_transfer	✅	✅	✅	✅
linalg_demo	✅	✅	✅	✅
radar_task1	✅	✅	✅	✅
sndfile_demo	✅	✅	✅	✅

License

Apache License 2.0

Contributing

Issues and pull requests are welcome. Please ensure:

Code follows .clang-format style
All existing tests pass
New features include corresponding tests
