In this presentation, we demonstrate a computer-vision demo using YOLOv5 on the American Sign Language dataset, which contains 26 classes. The model identifies signs in real time as well as from an input image or audio, drawing bounding boxes labelled with the class name and confidence value. The model is showcased using Streamlit, which accepts an image as input.
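As an illustrative sketch (not the project's actual code), the labelling step can be pictured as mapping each detection's class index and confidence to a display string. The class list and the `format_detections` helper below are hypothetical, assuming the 26 classes are the letters A-Z:

```python
# Sketch: turning YOLOv5-style detections into "LABEL confidence" strings.
# ASL_CLASSES and the detection tuple layout are assumptions for illustration.
import string

ASL_CLASSES = list(string.ascii_uppercase)  # hypothetical: one letter per class

def format_detections(detections):
    """detections: iterable of (x1, y1, x2, y2, confidence, class_index)."""
    labels = []
    for x1, y1, x2, y2, conf, cls in detections:
        labels.append(f"{ASL_CLASSES[int(cls)]} {conf:.2f}")
    return labels

print(format_detections([(10, 20, 80, 90, 0.91, 0), (15, 30, 70, 85, 0.78, 25)]))
# → ['A 0.91', 'Z 0.78']
```

In the real pipeline these strings would be drawn next to the bounding boxes on the frame.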
The folder structure consists of four directories: BlackBox_P1, BlackBox_P2, GreyBox_P1, and GreyBox_P2. These directories contain scripts for training models extracted from the victim models SwinT and MoViNet, under Black-Box and Grey-Box settings. Each directory has setup instructions in the corresponding "setup_readme" file. Additionally, there is an "Evaluation_All" folder with Jupyter Notebooks for evaluation; details on the setup and run environment can be found in the "eval_readme" file. The training code was executed on a server with an NVIDIA V100 GPU, while the evaluation code was run on Google Colaboratory using a Pro Plus subscription.
This is a simple AI chatbot built using the Rasa framework. Rasa provides flexible conversational AI for building text- and voice-based assistants.
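A Rasa assistant is driven by declarative configuration files rather than imperative code. The fragment below is a minimal, hypothetical domain sketch (the intents and responses are illustrative, not the project's actual configuration):

```yaml
# Minimal illustrative Rasa domain file (hypothetical intents/responses).
version: "3.1"

intents:
  - greet
  - goodbye

responses:
  utter_greet:
    - text: "Hello! How can I help you?"
  utter_goodbye:
    - text: "Goodbye!"
```

Training data (example utterances per intent) and stories live in separate files, and `rasa train` compiles them into a dialogue model.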
Image captioning is done using an attention-based encoder-decoder model with a pre-trained ResNet as the encoder and GRU as the decoder. Frames are extracted from videos, and captions are generated using the image captioning model, retaining only captions with low similarity scores to the previous caption. Video summarization is performed by extracting frames using OpenCV and using the T5 base Transformer model for abstractive summarization. The implementation includes dataset handling, vocabulary mapping, model architecture, training with teacher forcing, and evaluation using cosine similarity.
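The caption-filtering idea above can be sketched as follows. This is an illustrative stand-in, assuming simple bag-of-words vectors for the similarity computation (the project itself may compare embeddings instead); the `filter_captions` helper and its threshold are hypothetical:

```python
# Sketch: keep a frame's caption only if its cosine similarity to the
# previously kept caption is below a threshold, dropping near-duplicates.
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def filter_captions(captions, threshold=0.8):
    """Drop captions too similar to the previously kept caption."""
    kept, prev = [], None
    for cap in captions:
        vec = Counter(cap.lower().split())
        if prev is None or cosine_similarity(prev, vec) < threshold:
            kept.append(cap)
            prev = vec
    return kept

captions = ["a man rides a bike", "a man rides a bike", "a dog runs in a park"]
print(filter_captions(captions))  # the duplicate second caption is dropped
```

The retained captions are then concatenated and passed to the T5 summarizer.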
A simple Rock-Paper-Scissors game using computer vision in Python, built for IITISOC-21.
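Once the CV pipeline has classified each player's hand gesture, the round outcome reduces to a small lookup. A hypothetical sketch of that game logic (the `winner` helper is illustrative, not the project's code):

```python
# Sketch: deciding a Rock-Paper-Scissors round from two classified gestures.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def winner(player, computer):
    """Return 'player', 'computer', or 'draw' for one round."""
    if player == computer:
        return "draw"
    return "player" if BEATS[player] == computer else "computer"

print(winner("rock", "scissors"))  # → player
```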
Since CCTV images lose clarity when zoomed, there is great demand for image-denoising models. Build a model that takes noisy RGB images as input and outputs denoised images. Carefully study the kind of noise CCTV images exhibit and target it accordingly.
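Training such a model needs (noisy, clean) image pairs, which are often generated synthetically. A minimal sketch, assuming additive Gaussian noise as a first approximation of sensor noise (real CCTV footage also shows compression artifacts, which this simple model does not cover):

```python
# Sketch: generating a noisy training sample from a clean uint8 RGB image.
import numpy as np

def add_gaussian_noise(image, sigma=25.0, seed=None):
    """image: uint8 RGB array (H, W, 3). Returns a noisy uint8 copy."""
    rng = np.random.default_rng(seed)
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

clean = np.full((4, 4, 3), 128, dtype=np.uint8)  # dummy grey image
noisy = add_gaussian_noise(clean, sigma=25.0, seed=0)
```

The denoiser is then trained to map `noisy` back to `clean`.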
The backend program uses the 'face_recognition' and 'OpenCV' libraries to perform face recognition. It involves training the model with known faces, extracting features from bounding boxes, comparing features against a tolerance level for matching, and displaying the matched name. User information is stored in a 'face.db' database, while uploaded images are stored in a 'Faces' folder for easy access by OpenCV during deployment. In summary, the system trains the model on known faces, compares features, displays results, and stores user information and images separately for efficient processing.
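The tolerance-based matching step can be sketched in plain NumPy: 'face_recognition' represents each face as a 128-dimensional encoding and declares a match when the Euclidean distance to a known encoding falls under a tolerance (0.6 by default). The `match_face` helper and the dummy encodings below are illustrative assumptions, not the project's code:

```python
# Sketch: match an unknown face encoding against known encodings by
# Euclidean distance, mirroring face_recognition's tolerance-based matching.
import numpy as np

def match_face(known_encodings, known_names, unknown_encoding, tolerance=0.6):
    """Return the name of the closest known face within tolerance, else None."""
    if not known_encodings:
        return None
    distances = np.linalg.norm(np.asarray(known_encodings) - unknown_encoding, axis=1)
    best = int(np.argmin(distances))
    return known_names[best] if distances[best] <= tolerance else None

# Dummy 128-d encodings standing in for real face_recognition outputs.
alice, bob = np.zeros(128), np.ones(128)
print(match_face([alice, bob], ["Alice", "Bob"], alice + 0.01))  # → Alice
```

In the real system the known encodings would be computed once from the images in the 'Faces' folder and the names looked up from 'face.db'.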