Contribute
NOTE: The `models` segment of the repository is highly underdeveloped from an implementation standpoint and needs further work. A step-by-step guide is given in the To-Do section, the third section of this page.
Repository Structure
All files related to the machine learning model are located in the `models` directory of the main tree.
- `models/TrainData_preprocess.py` contains the script that resizes the training images, runs MediaPipe detection on each image to find the hand landmarks, and draws the landmarks on the image. The landmark vectors and their labels (the alphabet each vector corresponds to) are collected in two separate lists, which are then dumped into a pickle file as a dictionary of the form `{'data': data, 'labels': labels}`, where `data` is the list of landmark vectors and `labels` is the list of alphabets, with the indexes kept in sync (a rough sketch follows this list).
- `models/RandomForest.py` trains a random forest model on the data processed by `models/TrainData_preprocess.py` and saves the trained model in a pickle file named `model.p` (also sketched after this list).
- `models/main.py` performs inference with the model produced by the previous two scripts. The input is a live video feed from the webcam of the person running the script, and the prediction is displayed in a CV2 window (also sketched after this list).
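The sketch below illustrates, under stated assumptions, roughly what the preprocessing script does: it walks a folder-per-letter dataset, extracts MediaPipe hand landmarks from each image, and dumps the result as `{'data': data, 'labels': labels}`. The resizing and landmark-drawing steps are omitted for brevity, and the output file name `data.pickle` is an assumption, not necessarily what `models/TrainData_preprocess.py` uses.

```python
# Illustrative sketch only -- not the exact contents of models/TrainData_preprocess.py.
import os
import pickle

import cv2
import mediapipe as mp

DATA_DIR = './ASL_Alphabet_Dataset/asl_alphabet_train'  # assumed layout: one folder per letter

hands = mp.solutions.hands.Hands(static_image_mode=True, min_detection_confidence=0.3)

data, labels = [], []
for label in os.listdir(DATA_DIR):
    for img_name in os.listdir(os.path.join(DATA_DIR, label)):
        img = cv2.imread(os.path.join(DATA_DIR, label, img_name))
        if img is None:
            continue
        # MediaPipe expects RGB input; OpenCV reads BGR.
        results = hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        if not results.multi_hand_landmarks:
            continue  # skip images where no hand is detected
        # Flatten the 21 (x, y) landmark coordinates into one feature vector.
        lm = results.multi_hand_landmarks[0].landmark
        data.append([coord for point in lm for coord in (point.x, point.y)])
        labels.append(label)

# Dump in the {'data': ..., 'labels': ...} format described above.
with open('data.pickle', 'wb') as f:
    pickle.dump({'data': data, 'labels': labels}, f)
```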
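A minimal sketch of the training step, assuming scikit-learn's `RandomForestClassifier` and the `data.pickle` file from the sketch above; the held-out split and the exact on-disk format of `model.p` are illustrative choices rather than a description of `models/RandomForest.py`.

```python
# Illustrative sketch only -- not the exact contents of models/RandomForest.py.
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the {'data': ..., 'labels': ...} dictionary produced by the preprocessing step.
with open('data.pickle', 'rb') as f:
    dataset = pickle.load(f)

X = np.asarray(dataset['data'])
y = np.asarray(dataset['labels'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, stratify=y)

model = RandomForestClassifier()
model.fit(X_train, y_train)
print('held-out accuracy:', accuracy_score(y_test, model.predict(X_test)))

# Save the trained classifier as model.p for the inference script.
with open('model.p', 'wb') as f:
    pickle.dump(model, f)
```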
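And a minimal sketch of webcam inference along the lines described above; the feature layout and the way `model.p` is unpickled are carried over from the previous sketches and may not match `models/main.py` exactly.

```python
# Illustrative sketch only -- not the exact contents of models/main.py.
import pickle

import cv2
import mediapipe as mp

with open('model.p', 'rb') as f:
    model = pickle.load(f)  # assumes the classifier was pickled directly

hands = mp.solutions.hands.Hands(static_image_mode=False, min_detection_confidence=0.3)
cap = cv2.VideoCapture(0)  # webcam feed

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        features = [coord for point in lm for coord in (point.x, point.y)]
        prediction = model.predict([features])[0]
        cv2.putText(frame, str(prediction), (30, 50),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 255, 0), 3)
    cv2.imshow('ASL prediction', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```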
NOTE: The training data and `model.p` are not present in the repository. The training data is the train split of the Standard English ASL Alphabet Dataset (save it in `'./ASL_Alphabet_Dataset/asl_alphabet_train'`, or change the path for local testing), while the model file must be generated by running `models/TrainData_preprocess.py` followed by `models/RandomForest.py`.
Current Repository Status
The files currently in the repository are only a rough implementation of gesture detection and classification. Face recognition is yet to be added; it must be integrated both with the gesture classifier and with the input from the user (information coming from the frontend) in order to implement anti-spoofing.
To-Do
Short Term
- Modify and functionize `main.py` so that it takes a single image as input and returns the class and confidence as output. Let the function, for the purposes of this document, be called `classifyHand`.
- Implement an inference script for face recognition, containing a function that takes two images (a ground-truth image and a test image) as input and returns a truth value. Let the function, for the purposes of this document, be called `faceRecog`.
- Integrate `faceRecog` and `classifyHand` to implement the anti-spoofing functionality, as explained in the Theory page of the documentation.
- Implement the functions as a Flask app (alternatively a Django or FastAPI app), and Dockerize it. A combined sketch of these steps follows this list.
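The sketch below shows one way the short-term items could fit together, assuming the `face_recognition` library for `faceRecog`, a single `/verify` Flask endpoint, and a simple verification rule (same face and the expected hand sign). The endpoint name, request fields, and exact logic are illustrative assumptions, not the design laid down in the Theory page.

```python
# Rough sketch of the proposed short-term pipeline; names and logic are illustrative.
import pickle

import cv2
import face_recognition  # assumed choice of face-recognition library
import mediapipe as mp
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

with open('model.p', 'rb') as f:
    gesture_model = pickle.load(f)

hands = mp.solutions.hands.Hands(static_image_mode=True, min_detection_confidence=0.3)


def classifyHand(image_bgr):
    """Return (predicted_class, confidence) for a single image, or (None, 0.0)."""
    results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None, 0.0
    lm = results.multi_hand_landmarks[0].landmark
    features = [coord for point in lm for coord in (point.x, point.y)]
    probs = gesture_model.predict_proba([features])[0]
    idx = int(np.argmax(probs))
    return gesture_model.classes_[idx], float(probs[idx])


def faceRecog(ground_truth_bgr, test_bgr):
    """Return True if the two images appear to contain the same face."""
    known = face_recognition.face_encodings(cv2.cvtColor(ground_truth_bgr, cv2.COLOR_BGR2RGB))
    unknown = face_recognition.face_encodings(cv2.cvtColor(test_bgr, cv2.COLOR_BGR2RGB))
    if not known or not unknown:
        return False
    return bool(face_recognition.compare_faces([known[0]], unknown[0])[0])


@app.route('/verify', methods=['POST'])
def verify():
    # Expects two uploaded images plus the gesture class the frontend asked the user to show.
    def read_image(field):
        buf = np.frombuffer(request.files[field].read(), dtype=np.uint8)
        return cv2.imdecode(buf, cv2.IMREAD_COLOR)

    ground_truth, test = read_image('ground_truth'), read_image('test')
    expected_sign = request.form['expected_sign']

    sign, confidence = classifyHand(test)
    verified = faceRecog(ground_truth, test) and sign == expected_sign
    return jsonify({'verified': verified, 'sign': sign, 'confidence': confidence})


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```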
Long Term
- Expand the dataset for hand signs in order to prevent spoofing in the worst-case scenario.
- Possibly implement a detection and classification algorithm for dynamic (sequential, time-continuous) video gestures, for instance a wave of the hand or a snap, perhaps via an LSTM architecture, in order to expand the diversity of classes (a rough sketch follows).
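One possible shape for such a dynamic-gesture classifier, sketched with Keras over sequences of per-frame hand-landmark vectors; the framework, sequence length, feature size, and class set are placeholders rather than project decisions.

```python
# Placeholder sketch of an LSTM gesture classifier; sizes and framework are assumptions.
import tensorflow as tf

SEQ_LEN = 30        # frames per gesture clip (assumed)
NUM_FEATURES = 42   # 21 hand landmarks x (x, y) per frame
NUM_CLASSES = 5     # e.g. wave, snap, ... (assumed)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, NUM_FEATURES)),
    tf.keras.layers.Masking(mask_value=0.0),           # tolerate shorter, zero-padded clips
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```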