Bottleneck ResNet16-2Plus1DD

BResNet16-2Plus1DD is a custom 2Plus1D (3D) deep learning architecture inspired by ResNet but designed with efficiency in mind. Conventional ResNet models use basic residual layers at shallow depths (ResNet-18 and ResNet-34) and bottleneck residual layers only in deeper models (ResNet-50 and above). BResNet16 instead uses bottleneck layers in a shallow, lightweight configuration, which makes it well suited to edge devices and performance-critical applications.

2Plus1D convolutions process the spatial and temporal dimensions separately by factoring a 3D convolution into two consecutive layers: a 2D spatial convolution (a 1 × d × d kernel) followed by a 1D temporal convolution (a t × 1 × 1 kernel). This lets the model handle high-dimensional video data while keeping computational costs relatively low. The approach was introduced in "A Closer Look at Spatiotemporal Convolutions for Action Recognition" (Tran et al., CVPR 2018).
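To make the cost comparison concrete, the sketch below counts the weights (biases ignored) of a full t × d × d 3D convolution and of its (2+1)D factorization with M intermediate channels. The helper names are hypothetical, not functions from this repository. `paper_mid_channels` follows the formula from the (2+1)D paper, which picks M so that both versions have roughly the same parameter count; the paper motivates the factorization with the extra nonlinearity between the two convolutions and easier optimization. A smaller M (for example, M equal to the output channels) is what makes the factorized form cheaper.

```python
def conv3d_weights(t: int, d: int, c_in: int, c_out: int) -> int:
    """Weights of a full t x d x d 3D convolution (biases ignored)."""
    return t * d * d * c_in * c_out


def conv2plus1d_weights(t: int, d: int, c_in: int, c_out: int, mid: int) -> int:
    """Weights of a (2+1)D block: a 1 x d x d spatial conv (c_in -> mid)
    followed by a t x 1 x 1 temporal conv (mid -> c_out), biases ignored."""
    spatial = d * d * c_in * mid
    temporal = t * mid * c_out
    return spatial + temporal


def paper_mid_channels(t: int, d: int, c_in: int, c_out: int) -> int:
    """Intermediate width used by Tran et al. (2018) so that the (2+1)D block
    has approximately the same parameter count as the full 3D convolution."""
    return (t * d * d * c_in * c_out) // (d * d * c_in + t * c_out)


if __name__ == "__main__":
    t, d, c = 3, 3, 64
    print("3D conv:", conv3d_weights(t, d, c, c))                      # 110592
    print("(2+1)D, mid = c_out:", conv2plus1d_weights(t, d, c, c, c))  # 49152
    m = paper_mid_channels(t, d, c, c)                                 # 144
    print("(2+1)D, paper mid:", m, conv2plus1d_weights(t, d, c, c, m))  # 144 110592
```

With M equal to the output width, the factorized block needs less than half the weights of the 3D kernel; with the paper's M, the counts match exactly in this example.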

When to Use 2+1D Convolutions?

They work well in video analysis (action recognition, motion detection), where spatial and temporal features are largely separable. For comparison:

3D Convolutions: Better suited to tightly coupled spatiotemporal correlations (e.g., fluid dynamics).
2+1D Convolutions: A good balance between efficiency and accuracy for most video tasks.

Architectural Design

In traditional ResNet architectures:

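Basic residual layers stack two 3x3 convolutional layers on the main path. The shortcut path is an identity mapping, or a single 1x1 convolution when the block changes the resolution or channel count.
Bottleneck residual layers stack three convolutional layers on the main path. The first and last are 1x1 convolutions (the bottleneck layers) that reduce and then restore the channel count, so the 3x3 convolution between them runs at a lower width and costs less computation.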
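A rough weight count (biases and normalization ignored) shows where the bottleneck saves computation. The sketch compares two 3x3 convolutions at 256 channels with a 1x1 (256 → 64), 3x3 (64 → 64), 1x1 (64 → 256) bottleneck, the layout used in the first stage of ResNet-50. It uses plain 2D kernels for simplicity, and the function names are illustrative only.

```python
def conv2d_weights(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k 2D convolution, biases ignored."""
    return k * k * c_in * c_out


def basic_block_weights(c: int) -> int:
    # Two 3x3 convolutions, both at the full channel width c.
    return 2 * conv2d_weights(3, c, c)


def bottleneck_block_weights(c: int, reduction: int = 4) -> int:
    # 1x1 reduce -> 3x3 at the reduced width -> 1x1 expand back to c.
    inner = c // reduction
    return (conv2d_weights(1, c, inner)
            + conv2d_weights(3, inner, inner)
            + conv2d_weights(1, inner, c))


if __name__ == "__main__":
    print(basic_block_weights(256))       # 1179648
    print(bottleneck_block_weights(256))  # 69632
```

At the same input and output width, the bottleneck block uses roughly 17 times fewer weights, which is why deeper ResNets switch to it.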

Why "16"?

A conventional ResNet model has an input stem, four stages, and an output layer. Each stage typically contains at least two residual blocks, so a network built only from bottleneck layers (three convolutions per block) already exceeds 18 layers and cannot reproduce the standard 18- and 34-layer variants. The closest achievable depth is 16, hence the name BResNet16 (Bottleneck Residual Network 16).
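One way to reproduce the count is sketched below. It assumes the usual ResNet convention of counting only weighted layers on the main path (stem convolutions, the convolutions inside each residual block, and the final fully connected layer, with shortcut convolutions excluded), and it takes the three-convolution stem and the single block per stage from the sections that follow. The helper is illustrative, not part of the repository.

```python
def resnet_depth(stem_convs: int, blocks_per_stage: list[int],
                 convs_per_block: int, fc_layers: int = 1) -> int:
    """Count weighted layers on the main path (shortcut convs excluded)."""
    return stem_convs + sum(blocks_per_stage) * convs_per_block + fc_layers


# Standard ResNets use a single 7x7 stem convolution.
resnet18 = resnet_depth(1, [2, 2, 2, 2], convs_per_block=2)  # basic blocks
resnet50 = resnet_depth(1, [3, 4, 6, 3], convs_per_block=3)  # bottleneck blocks

# Bottleneck blocks with two per stage already give 1 + 8 * 3 + 1 = 26 layers,
# so an 18-layer all-bottleneck network cannot keep that layout.
min_two_per_stage = resnet_depth(1, [2, 2, 2, 2], convs_per_block=3)

# BResNet16 (assumed count): 3-conv stem + one bottleneck block per stage + dense.
bresnet16 = resnet_depth(3, [1, 1, 1, 1], convs_per_block=3)

print(resnet18, resnet50, min_two_per_stage, bresnet16)  # 18 50 26 16
```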

Key Modifications

To maintain efficiency while preserving the essential structure of ResNet, each stage in BResNet16 contains only a single Bottleneck Residual Block instead of the usual two. The stages are defined as follows:

# Backbone: one bottleneck block per stage; strides are (time, height, width)
self.block = BottleneckResidual2Plus1DD(filters=64, strides=(1, 1, 1))    # no downsampling
self.block1 = BottleneckResidual2Plus1DD(filters=128, strides=(1, 2, 2))  # halve height and width only
self.block2 = BottleneckResidual2Plus1DD(filters=256, strides=(1, 2, 2))
self.block3 = BottleneckResidual2Plus1DD(filters=512, strides=(1, 2, 2))

Enhancements

BResNet16 incorporates improvements from the paper "Bag of Tricks for Image Classification with Convolutional Neural Networks" alongside additional optimizations to enhance efficiency and performance.

This repository also includes implementations of the Hardswish and Mish activation functions.

The codebase integrates directly into TensorFlow and Keras pipelines.
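For reference, the standard definitions of the two activations are Hardswish(x) = x · ReLU6(x + 3) / 6 (introduced with MobileNetV3) and Mish(x) = x · tanh(softplus(x)). The scalar sketch below uses only Python's `math` module and is meant for checking values by hand; it is not this repository's TensorFlow implementation, which may differ in detail.

```python
import math


def relu6(x: float) -> float:
    # Clip to the range [0, 6].
    return min(max(x, 0.0), 6.0)


def hard_swish(x: float) -> float:
    """Hardswish: x * ReLU6(x + 3) / 6, a piecewise approximation of x * sigmoid(x)."""
    return x * relu6(x + 3.0) / 6.0


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)): avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x)); smooth, with a small negative dip near x = -1.2."""
    return x * math.tanh(softplus(x))


for v in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={v:+.1f}  hard_swish={hard_swish(v):+.4f}  mish={mish(v):+.4f}")
```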

Key Enhancements

Modified Stem: Uses three convolutional layers instead of a single one (the ResNet-C tweak).
ResNet-B Inspired Strides: Moves the stride in the residual blocks from the first (1x1) convolution to the second convolution, so the 1x1 convolution no longer skips three quarters of the input positions.
ResNet-D Inspired Shortcut: Adds an average pooling layer before the 1x1 convolution in the shortcut connection, so downsampling on the shortcut no longer discards activations.
Reduced Downsampling: The temporal dimension is downsampled only twice, both times in the stem block, while the spatial dimensions follow the original approach and are downsampled five times.
Modified Channel Count: The channel counts are adjusted to keep the model compact. The first two layers of the main path use a quarter of the stage's filter count, creating a squeeze-and-expansion effect, while the final layer still outputs four times the filter count.
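The downsampling schedule can be checked by tracking the (frames, height, width, channels) shape through the network. This sketch assumes "same" padding (each strided layer outputs ceil(size / stride)), a stride-2 first stem convolution, and a 2 × 2 × 2 max pooling; these strides and the stem widths are inferred from the output shapes in the model summary, not read from the source code. It traces shapes only and does not build the model.

```python
import math

# (name, (time_stride, height_stride, width_stride), output_channels).
# Strides and widths are inferred from the published model summary (assumption).
LAYERS = [
    ("stem conv 1", (2, 2, 2), 32),
    ("stem conv 2", (1, 1, 1), 32),
    ("stem conv 3", (1, 1, 1), 64),
    ("max pool", (2, 2, 2), 64),
    ("block", (1, 1, 1), 64 * 4),  # block output = 4 x filters
    ("block1", (1, 2, 2), 128 * 4),
    ("block2", (1, 2, 2), 256 * 4),
    ("block3", (1, 2, 2), 512 * 4),
]


def trace_shapes(frames: int, height: int, width: int) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Return the (T, H, W, C) output shape after each layer, assuming 'same' padding."""
    shapes = []
    t, h, w = frames, height, width
    for name, (st, sh, sw), channels in LAYERS:
        t, h, w = math.ceil(t / st), math.ceil(h / sh), math.ceil(w / sw)
        shapes.append((name, (t, h, w, channels)))
    return shapes


for name, shape in trace_shapes(32, 256, 256):
    print(f"{name:12s} {shape}")
# Time is halved twice (32 -> 8); height and width are halved five times (256 -> 8).
```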

Note: The image above represents the architectural modifications. It depicts 2D convolutional layers, whereas this project uses 2Plus1D (3D) convolutions. The image is sourced from the referenced paper.

Installation & Usage

This code is compatible with Python 3.12.8 and TensorFlow 2.18.0.

from BResNet162Plus1DD import BResNet162Plus1DD

model = BResNet162Plus1DD()
# Input shape: (batch, frames, height, width, channels)
model.build((None, 32, 256, 256, 3))
model.summary()

Model Summary Example

Model: "b_res_net162_plus1dd"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ conv2_plus1d_layer                   │ (None, 16, 128, 128, 32)    │           2,706 │
│ (Conv2Plus1DLayer)                   │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2_plus1d_layer_1                 │ (None, 16, 128, 128, 32)    │          27,648 │
│ (Conv2Plus1DLayer)                   │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2_plus1d_layer_2                 │ (None, 16, 128, 128, 64)    │          55,680 │
│ (Conv2Plus1DLayer)                   │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling3d (MaxPooling3D)         │ (None, 8, 64, 64, 64)       │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ bottleneck_residual2_plus1dd         │ (None, 8, 64, 64, 256)      │          28,944 │
│ (BottleneckResidual2Plus1DD)         │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ bottleneck_residual2_plus1dd_1       │ (None, 8, 32, 32, 512)      │         184,192 │
│ (BottleneckResidual2Plus1DD)         │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ bottleneck_residual2_plus1dd_2       │ (None, 8, 16, 16, 1024)     │         735,104 │
│ (BottleneckResidual2Plus1DD)         │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ bottleneck_residual2_plus1dd_3       │ (None, 8, 8, 8, 2048)       │       2,935,168 │
│ (BottleneckResidual2Plus1DD)         │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ global_average_pooling3d             │ (None, 2048)                │               0 │
│ (GlobalAveragePooling3D)             │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense (Dense)                        │ (None, 256)                 │         524,544 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 4,493,986 (17.14 MB)
 Trainable params: 4,493,986 (17.14 MB)
 Non-trainable params: 0 (0.00 B)
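The totals in the summary can be cross-checked by hand: the per-layer counts add up to the total, the final Dense layer contributes 2048 × 256 weights plus 256 biases, and the reported size corresponds to 4 bytes per float32 parameter divided by 1024². The numbers below are copied from the summary above.

```python
# Per-layer parameter counts copied from the model summary above.
layer_params = {
    "conv2_plus1d_layer": 2_706,
    "conv2_plus1d_layer_1": 27_648,
    "conv2_plus1d_layer_2": 55_680,
    "max_pooling3d": 0,
    "bottleneck_residual2_plus1dd": 28_944,
    "bottleneck_residual2_plus1dd_1": 184_192,
    "bottleneck_residual2_plus1dd_2": 735_104,
    "bottleneck_residual2_plus1dd_3": 2_935_168,
    "global_average_pooling3d": 0,
    "dense": 524_544,
}

total = sum(layer_params.values())
dense_expected = 2048 * 256 + 256  # weights + biases of Dense(256) on 2048 pooled features
size_mib = total * 4 / 1024**2     # float32 = 4 bytes per parameter

print(total, dense_expected, f"{size_mib:.2f} MB")  # 4493986 524544 17.14 MB
```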

License

This project is licensed under the MIT License.
