Computer Vision & Image Processing I

C2L-C: Computer Vision & Image Processing I

Session Type: Lecture
Session Code: C2L-C
Location: Room 3
Date & Time: Friday March 24, 2023 (10:00-11:00)
Chair: Paris Giampouras
Track: 5
Paper ID | Paper Title | Authors | Abstract

Paper ID: 3062
Paper Title: Searching for the Most Probable Combination of Class Labels Using Etcetera Abduction
Authors: Andrew Gordon, Andrew Feng
Abstract: Many machine perception tasks require a trained model to assign class labels to multiple entities in the same context, e.g., labeling multiple objects in a single photograph. In these tasks, some combinations of labels may be more likely than others, e.g., when co-occurrence biases are considered, such that the most-confident label assigned to an individual object is not always the best choice. In this paper, we propose a new method for combining evidence from multiple class probability distributions to identify the most probable combination of labels in multi-entity contexts. Our method encodes discrete class probability distributions as literals in first-order logic, and uses probability-ranked logical abduction to identify the most likely label combination, incorporating the prior and conditional probabilities of each label. We evaluate our method on two computer vision benchmarks, first for labeling common objects in photographs of everyday contexts, and second for labeling actions of athletes in sports videos. Results indicate significant gains in classifier accuracy over systems that merely select the model's most confident class label.

Paper ID: 3140
Paper Title: Real-Time Fitness Activity Recognition and Correction Using Deep Neural Networks
Authors: Michelle Mary Varghese, Sahana Ramesh, Sonali Kadham, Dhruthi V M, Preet Kanwal
Abstract: Fitness activities are beneficial to one’s health and well-being. During the COVID-19 pandemic, demand for virtual trainers increased. Some existing systems classify different exercises, and others provide feedback on a specific exercise. We propose a system that simultaneously recognizes a pose and provides real-time corrective feedback on the performed exercise, with minimal latency between recognition and correction. In the computer vision techniques implemented so far, occlusion and a lack of labeled data are the most significant obstacles to correct detection and helpful feedback. Vector geometry is employed to calculate the angles between key points detected on the body, both to provide the user with corrective feedback and to count the repetitions of each exercise. Three architectures (GAN, Conv-LSTM, and LSTM-RNN) are evaluated for exercise recognition. A custom dataset of Jumping Jacks, Squats, and Lunges is used to train the models. The GAN achieved a 92% testing accuracy but struggled in real-time performance. The LSTM-RNN architecture yielded a 95% testing accuracy, and the Conv-LSTM obtained an accuracy of 97% on real-time sequences.

Paper ID: 3179
Paper Title: Triplet Loss-Less Center Loss Sampling Strategies in Facial Expression Recognition Scenarios
Authors: Hossein Rajoli{1}, Fatemeh Lotfi{1}, Adham Atyabi{2}, Fatemeh Afghah{1}
Abstract: Facial expressions convey a great deal of information and play a crucial role in emotional expression. Deep neural networks (DNNs) combined with deep metric learning (DML) techniques boost the discriminative ability of the model in facial expression recognition (FER) applications. A DNN equipped only with a classification loss such as cross-entropy cannot compact intra-class feature variation or separate inter-class feature distance as well as one fortified by a supporting DML loss term. The triplet center loss (TCL) function is applied to all dimensions of the sample's embedding in the embedding space. In our work, we developed three negative sample selection strategies: fully-synthesized, semi-synthesized, and prediction-based. To achieve better results, we introduce a selective attention module that provides a combination of pixel-wise and element-wise attention coefficients using high-semantic deep features of input samples. We evaluated the proposed method on RAF-DB, a highly imbalanced dataset. The experimental results reveal significant improvements over the baseline for all three negative sample selection strategies.
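
The core idea in the abstract of paper 3062, that the individually most confident label is not always part of the most probable combination, can be illustrated with a small sketch. This is not the authors' Etcetera Abduction method: it swaps in a brute-force search over all label combinations, and the scoring rule (product of per-entity classifier probabilities times a pairwise co-occurrence weight) is an assumption made here for illustration. The function name `best_label_combination`, the `cooccurrence` table, and all numbers are hypothetical.

```python
from itertools import product


def best_label_combination(distributions, cooccurrence, default=1.0):
    """Return the highest-scoring label combination and its score.

    distributions: one dict per entity, mapping label -> classifier probability.
    cooccurrence:  dict mapping frozenset({label_a, label_b}) -> weight
                   (hypothetical co-occurrence prior; >1 favors the pair).
    Brute force: cost grows exponentially with the number of entities.
    """
    best_combo, best_score = None, -1.0
    for combo in product(*(d.keys() for d in distributions)):
        # Evidence from each entity's own class distribution.
        score = 1.0
        for dist, label in zip(distributions, combo):
            score *= dist[label]
        # Assumed pairwise co-occurrence weighting between entities.
        for i in range(len(combo)):
            for j in range(i + 1, len(combo)):
                pair = frozenset((combo[i], combo[j]))
                score *= cooccurrence.get(pair, default)
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score


# Made-up example: two objects detected in the same photograph.
object_1 = {"baseball bat": 0.45, "golf club": 0.55}
object_2 = {"baseball glove": 0.90, "oven mitt": 0.10}
weights = {
    frozenset(("baseball bat", "baseball glove")): 2.0,
    frozenset(("golf club", "baseball glove")): 0.5,
}
combo, score = best_label_combination([object_1, object_2], weights)
print(combo)
```

In this made-up example, picking each object's most confident label independently would choose "golf club" for the first object, while the joint search prefers "baseball bat" because it co-occurs with "baseball glove".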
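
The abstract of paper 3140 mentions using vector geometry to compute angles between detected key points for feedback and repetition counting. A minimal sketch of that kind of computation follows, assuming 2D key points given as `(x, y)` pairs. The function names `joint_angle` and `count_reps` and the threshold values are hypothetical and are not taken from the paper.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at key point b, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("key points coincide; angle is undefined")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def count_reps(angles, down_below=100.0, up_above=160.0):
    """Count repetitions as 'down' then 'up' transitions of a joint angle.

    The thresholds are illustrative assumptions, e.g. a knee angle in a squat.
    """
    reps = 0
    is_down = False
    for angle in angles:
        if angle < down_below:
            is_down = True
        elif angle > up_above and is_down:
            reps += 1
            is_down = False
    return reps


# Hypothetical hip, knee, and ankle positions in image coordinates.
hip, knee, ankle = (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)
print(joint_angle(hip, knee, ankle))
```

Using two thresholds rather than one gives simple hysteresis, so small jitter in the detected key points around a single cut-off does not register as extra repetitions.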