Mechanical and Aerospace Engineering Faculty Research & Creative Works

Human Action Recognition by Discriminative Feature Pooling and Video Segmentation Attention Model

Md Moniruzzaman
Zhaozheng Yin
Zhihai Henry He
Ruwen Qin
Ming-Chuan Leu, Missouri University of Science and Technology

Abstract

We introduce a simple yet effective network that embeds a novel Discriminative Feature Pooling (DFP) mechanism and a novel Video Segment Attention Model (VSAM) for video-based human action recognition from both trimmed and untrimmed videos. Our DFP module introduces an attentional pooling mechanism for 3D Convolutional Neural Networks that pools 3D convolutional feature maps to emphasize the most critical spatial, temporal, and channel-wise features related to the actions within a video segment. Our VSAM ensembles these critical features from all video segments and learns (1) class-specific attention weights to classify the video segments into the corresponding action categories, and (2) class-agnostic attention weights to rank the video segments by their relevance to the action class. Our action recognition network can be trained on trimmed videos in a fully-supervised way and on untrimmed videos in a weakly-supervised way. For untrimmed videos with weak labels, our network learns attention weights without requiring precise temporal annotations of action occurrences in the videos. Evaluated on the untrimmed video datasets THUMOS14 and ActivityNet1.2 and the trimmed video datasets HMDB51, UCF101, and HOLLYWOOD2, our network achieves superior performance compared to the latest state-of-the-art methods.
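The interplay of the two kinds of attention weights described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the input layout, and the use of a plain softmax for both weightings are assumptions made for illustration only. It shows one common way to combine per-segment class scores with per-segment relevance scores into a single video-level prediction.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_segments(class_logits, relevance_logits):
    """Combine per-segment predictions into one video-level prediction.

    class_logits: one list of per-class scores per segment (hypothetical input).
    relevance_logits: one class-agnostic score per segment (hypothetical input).
    """
    # Class-specific weighting (assumed softmax): across classes, per segment.
    class_probs = [softmax(row) for row in class_logits]
    # Class-agnostic weighting (assumed softmax): across segments, ranking them.
    segment_weights = softmax(relevance_logits)
    num_classes = len(class_logits[0])
    # Video-level score: relevance-weighted sum of segment class probabilities.
    video_probs = [
        sum(w * probs[c] for w, probs in zip(segment_weights, class_probs))
        for c in range(num_classes)
    ]
    return video_probs, segment_weights

# Three segments, two classes; the second segment is scored as most relevant.
video_probs, segment_weights = aggregate_segments(
    class_logits=[[0.1, 0.2], [3.0, 0.0], [0.0, 0.1]],
    relevance_logits=[0.0, 2.0, 0.0],
)
```

Because the segment weights sum to one, the video-level scores remain a probability distribution, and only a video-level label is needed to supervise them, which is the sense in which such a scheme suits weakly-labeled untrimmed videos.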

Recommended Citation

M. Moniruzzaman et al., "Human Action Recognition by Discriminative Feature Pooling and Video Segmentation Attention Model," IEEE Transactions on Multimedia, Institute of Electrical and Electronics Engineers (IEEE), Feb 2021.

The definitive version is available at https://doi.org/10.1109/TMM.2021.3058050

Department(s)

Mechanical and Aerospace Engineering

Research Center/Lab(s)

Intelligent Systems Center

Publication Status

Early Access

Comments

First Published 09 Feb 2021

This research work is supported by the National Science Foundation via CPS Synergy project CMMI-1646162 and National Robotics Initiative project NRI-1830479.

Keywords and Phrases

Action Recognition; Annotations; Attentional Pooling; Discriminative Features; Feature Extraction; Fully-Supervised; Image Recognition; Task Analysis; Three-Dimensional Displays; Training; Two-Dimensional Displays; Weakly-Supervised

International Standard Serial Number (ISSN)

1520-9210; 1941-0077

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Publication Date

09 Feb 2021

