1.1 Background and Rationale
Manufacturing is one of the largest industrial sectors in the world. During production, goods can be affected by various factors that cause surface defects such as rolled-in scale (RS), patches (Pa), crazing (Cr), pitted surface (PS), inclusion (In), and scratches (Sc), which result in large losses for the company. To address this challenge, real-time anomaly detection has evolved alongside many advanced technologies. It helps identify defects during manufacturing and quality inspection, so that their causes can be found and remedied.
Research plays a significant role in gathering information and leads to a better understanding of any concept. We researched which algorithms to choose for object detection, instance segmentation, and object localization. Algorithms such as YOLO and SSD are significant in terms of both accuracy and performance. Our team will continue this research throughout the project, since we are new to these datasets and algorithms and there are many aspects still to learn.
1.3 Project Objectives
• Explore, scrutinize, and understand the annotated image dataset, which is unstructured and was labeled by a research team from NEU with domain expertise.
• Identify how the YOLO/SSD algorithms can be used to implement defect detection techniques that identify anomalies in manufactured goods with computer vision.
• Train and test the models on the dataset chunk by chunk, since processing large amounts of data at once may take time; this also helps determine the resources needed to decrease latency.
• Stream the visualizations of the detection output to a web browser.
• Identify various testing methodologies that support clear visibility of the acquired images.
• Create a model that successfully detects anomalies in manufactured goods with low latency and high throughput.
1.4 Problem Space
In the 21st century, to keep pace with the exponential growth in the number of consumers, manufacturing methods have been revolutionized over the years by leveraging advanced technology and machinery to produce goods and meet consumer needs.
Inspection in manufacturing is a process that involves the testing, gauging, measuring, and examination of material or specimen, with the express purpose of determining whether it is in proper condition. Typically, specified standards are set, against which the results of the inspection are compared to establish if the material being inspected can pass this stage of quality control (.E, 2017).
Unfortunately, many industries still rely on manual, labor-intensive, expensive, and error-prone methods to ensure the quality of mass-produced products. The human eye is unlikely to make precise measurements of characteristics such as surface roughness or small differences in size. Moreover, manual inspection of finished products reduces the efficiency of the manufacturing process, since a human takes more than 2 minutes on average to completely inspect a product. This project aims to automate the detection of defects in real time, thereby reducing the cost of investment, increasing the ROI, and supporting the delivery of high-quality steel.
1.5 Primary User Story
As a Quality Assurance Inspector, I want to minimize the time spent on dimensional measurement and surface inspection of a finished product in the manufacturing process, while maintaining quality standards and a high rate of defect detection accuracy, so as to maximize the reliability and dependability of our products for consumers.
1.6 Solution Space
Our project is an attempt to leverage, and possibly improve on, the YOLO/SSD machine learning algorithms. We extract valuable information from a stream of pictures using advanced data science techniques and use deep learning-based computer vision to detect anomalies in real time on finished products. The goal is to check parts for many different defects (e.g., contamination, scratches, dents, or deformations caused by faults in production) and specifications (mainly dimensional abnormalities) (Hamfeld, 2016). Automating the inspection process in this way minimizes the time spent on dimensional measurement and surface inspection while maintaining quality standards.
1.7 Product Vision
Whether a company is producing automotive, medical, consumer, or virtually any other product, all companies have some type of quality inspection or gauging as part of their production process (Smith, 2009). Take the automobile industry, for instance: its products are the building blocks of transportation systems around the world and, like everything from bicycles to rockets, must be highly reliable and precise in their geometric dimensioning to ensure consumer safety. The real-time anomaly detection techniques researched and built in this project look like a promising way to increase defect detection accuracy and identify the tiniest dimensional and surface flaws that cannot be spotted by the naked eye. In car manufacturing, there are multiple inspection stages throughout the build to ensure quality, from the frame of the car to the final coat of paint. Dimensional checks that our automated approach can make easier and quicker include the height of the car, ground clearance, correct alignment of parts, uniformity of the color coat, and many other important specifications.
2 Data Acquisition
Data plays a very important role in any analytics or experiment; it can make or break the outcome. The data we chose for this project is a gray-scale image dataset. The raw dataset was initially unlabeled. To make it suitable for classification and object detection algorithms, the images were labeled by domain experts.
A research team from NEU annotated the images, turning the unlabeled dataset into a supervised one. They did a great job, as labeling requires domain expertise in these manufactured goods. This labeling step is very important because our classification of the images is based on it.
The dataset has 2 subfolders: the original images and the annotations, which are in XML format. The XML contains tags such as the annotation, the width and height of the image, and the bounding box coordinates, which give us more information about the dataset and its images.
2.2 Field Description
Understanding the data schema plays a significant role in understanding the data. It helps us identify the structure of the annotations and the major attributes of each image, such as height, width, depth, the label given to the image, and the bounding box coordinates.
The dataset has 6 classes, and each image is labeled with one of them: rolled-in scale, patches, crazing, pitted surface, inclusion, or scratches. The labeling was done by the domain expert team, since domain knowledge is needed to identify and label the defects properly.
Annotation: This is the meta field comprising the other fields listed below:
• Folder: This should contain the location where the dataset is saved
• Filename: The name of the file in the location.
• Source: It contains information regarding the origin of the information.
o Database: This field contains information about the database from where the image is extracted.
• Size: This specifies the size of the image:
o Width: Specifies the width of the image.
o Height: Specifies the height of the image.
o Depth: Specifies the depth of the image.
• Segmented: Flags whether the image has segmentation annotations.
• Object: This is also a meta field. It represents an element node in the XML tree and has the following fields:
o Name: Specifies the name (class label) of the object.
o Pose: Specifies the pose (orientation) of the object to be detected.
o Truncated: Specifies whether the object in the image is truncated or not.
o Difficult: Flags objects that are annotated as difficult to recognize.
• Bounding box: This field gives the coordinates of the bounding box in pixels, with the origin at the top-left corner of the image:
o Xmin: Specifies the x coordinate of the top-left corner of the bounding box.
o Ymin: Specifies the y coordinate of the top-left corner of the bounding box.
o Xmax: Specifies the x coordinate of the bottom-right corner of the bounding box.
o Ymax: Specifies the y coordinate of the bottom-right corner of the bounding box.
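As an illustration of the fields listed above, the following minimal sketch reads one annotation with Python's standard library. The tag names follow the common Pascal VOC layout (for example `bndbox` for the bounding box), which is an assumption about the NEU files, and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample annotation; names and values are illustrative only.
SAMPLE_XML = """<annotation>
  <folder>images</folder>
  <filename>sample_1.jpg</filename>
  <source><database>example-database</database></source>
  <size><width>200</width><height>200</height><depth>1</depth></size>
  <segmented>0</segmented>
  <object>
    <name>scratches</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox><xmin>20</xmin><ymin>40</ymin><xmax>120</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>"""


def parse_annotation(xml_text):
    """Return the image description and the list of labeled boxes."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    image = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "depth": int(size.findtext("depth")),
    }
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return image, objects


image, objects = parse_annotation(SAMPLE_XML)
print(image["width"], image["height"], objects[0]["name"])
```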
2.3 Data Context
The fields specified above form the data schema. The annotations are stored as XML files, and the tags within them have different contexts and relate to one another as nodes of the same tree. Taken together, the fields provide better context and meaning than they do as isolated pieces.
2.4 Data Conditioning
The data is conditioned by using gray-scale images and by accurately annotating every image with a label, which makes it a supervised dataset suitable for classification and object detection algorithms.
2.5 Data Quality Assessment
• Completeness: The data is complete. We are dealing with unstructured rather than structured data, so there are no missing or null values, and every image is labeled.
• Uniqueness: It is difficult to check the uniqueness of an image dataset, since the data is in the form of images that we identify through their annotations, and two images may or may not be the same. Checking this would probably require image-comparison techniques, which add another level of complexity.
• Accuracy: The dataset is well annotated by domain experts from NEU, China.
• Atomicity: As the data is unstructured, atomicity does not affect our use case.
• Conformity: Conformity is achieved, since the data gathered meets standards such as format, size, and type.
• Overall quality: The dataset is imbalanced, i.e., certain kinds of defects occur more often than others.
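For the uniqueness check discussed above, one inexpensive first step is to look for byte-identical files. The sketch below is an assumption about how this could be done, not something the project implemented; it catches only exact copies, so visually similar images would still need a more elaborate comparison.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(images):
    """Group image names by the SHA-256 digest of their raw bytes.

    `images` maps a file name to its bytes. Only byte-identical files are
    reported; near-duplicates are out of scope for this sketch.
    """
    by_digest = defaultdict(list)
    for name, data in images.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical in-memory stand-ins for image files.
sample = {"a.jpg": b"pixels-1", "b.jpg": b"pixels-1", "c.jpg": b"pixels-2"}
print(find_exact_duplicates(sample))
```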
2.6 Other Data Sources
The other data source we referred to was a Kaggle dataset, used only to understand the structure of such data and what it looks like. Its images were raw and unlabeled, whereas the dataset from the NEU database has annotated images and is well suited to classification and object detection models.
3 Analytics and Algorithms
Choosing the best algorithm for defect detection, which identifies anomalies in manufactured goods with computer vision, is challenging and requires a clear understanding of the algorithm and its architecture. After continued research and study of our dataset, which is unstructured and was labeled by a research team of domain experts from NEU, we chose the fast object detection model YOLO to train on the data and identify the anomalies.
3.1.1 YOLO (You Only Look Once)
The YOLO model was a solution to the lack of speed in previously developed deep learning object detection models. YOLO uses a one-stage detector strategy, i.e., it treats object detection as a regression problem, taking a given input image and simultaneously learning bounding box coordinates and the corresponding class label probabilities. Unlike region proposal classification networks such as Fast R-CNN, which perform detection on many region proposals and thus end up running prediction multiple times per image, the YOLO architecture is closer to a fully convolutional neural network (FCNN): it passes the image through the network once, and the prediction is the output (Rosebrock, 2018).
3.1.1.1 Idea behind YOLO
• Divide the image into a grid of cells.
• Change the labels of the data so that a localization and classification algorithm is applied to each grid cell.
• Construct one deep convolutional neural network whose loss function is the error between the output activations and the label vector, i.e., the model predicts the output for all grid cells in one forward pass of the input image through the ConvNet. An object is assigned to the grid cell that contains its centroid, which prevents the object from being counted multiple times (Chablani, 2017).
Note: YOLO uses features from the entire image to predict each bounding box, and it predicts all bounding boxes for all classes of an image simultaneously.
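The centroid rule described above can be sketched in a few lines. This is a simplified illustration, assuming box centers normalized to the range 0 to 1 and a square grid; the function name is hypothetical.

```python
def responsible_cell(x_center, y_center, grid_size):
    """Return (row, col) of the grid cell that contains a box centroid.

    x_center and y_center are normalized to [0, 1]. A centroid lying exactly
    on the right or bottom edge is clamped into the last cell.
    """
    col = min(int(x_center * grid_size), grid_size - 1)
    row = min(int(y_center * grid_size), grid_size - 1)
    return row, col


# A centroid in the middle of the image on a 7 x 7 grid falls in cell (3, 3).
print(responsible_cell(0.5, 0.5, 7))
```

Because each centroid maps to exactly one cell, each object contributes to the label of exactly one cell.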
3.2 YOLO v3 Architecture:
The newer architecture boasts of residual skip connections, and upsampling. The most salient feature of v3 is that it makes detections at three different scales. YOLO is a fully convolutional network and its eventual output is generated by applying a 1 x 1 kernel on a feature map. In YOLO v3, the detection is done by applying 1 x 1 detection kernels on feature maps of three different sizes at three different places in the network. YOLO v3 makes predictions at three scales, which are precisely given by downsampling the dimensions of the input image by 32, 16, and 8 respectively. (Katuria, 2018)
Figure 3.2.1: YOLO V3 Architecture 
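To make the three detection scales concrete, the sketch below computes the grid size at each scale from the downsampling factors 32, 16, and 8 given above. The function name is hypothetical, and the input sizes are examples.

```python
def detection_grids(input_size, strides=(32, 16, 8)):
    """Return the side length of the detection grid at each scale."""
    if input_size % max(strides) != 0:
        raise ValueError("input size should be a multiple of the largest stride")
    return [input_size // stride for stride in strides]


print(detection_grids(416))  # [13, 26, 52]
print(detection_grids(224))  # [7, 14, 28]
```

The coarsest grid (stride 32) has the fewest cells, while the finest grid (stride 8) has the most.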
3.3 Train configuration
General training configuration available in the model presets.
• lr - learning rate.
• epochs - the number of training epochs (number of classes * 2000).
• batch_size - batch size for the training (train) stage.
• input_size - input image width and height in pixels (should be a multiple of 32).
• bn_momentum - batch normalization momentum parameter.
• gpu_devices - list of indexes of the selected GPU devices.
• data_workers - how many subprocesses to use for data loading.
• dataset_tags - mapping that splits the data into training (train) and validation (val) parts by image tags. Images must be tagged with the train or val tag.
• subdivisions - splits a batch into sub-batches (if a large batch does not fit in GPU memory).
• special_classes - objects with the specified classes are interpreted in a specific way. The default class name for background is bg, and the default class name for neutral is neutral. All pixels from neutral objects are ignored in the loss function.
• print_every_iter - outputs training information every N iterations.
• weights_init_type - one of 2 modes. In transfer_learning mode, all possible weights are transferred except the last layer. In continue_training mode, all weights are transferred, and the number of classes and the order of class names are validated.
• enable_augmentations - the current implementation contains a strong augmentation system. Set this to true to use it, or false otherwise.
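Two of the settings above come with simple rules of thumb: input_size should be a multiple of 32, and epochs is suggested as the number of classes times 2000. The sketch below checks a configuration against those two rules. The flat dictionary layout and the function name are assumptions made for illustration, not the preset format itself.

```python
def check_train_config(config, num_classes):
    """Return a list of readable problems found in a training config."""
    problems = []
    if config["input_size"] % 32 != 0:
        problems.append("input_size should be a multiple of 32")
    suggested_epochs = num_classes * 2000
    if config["epochs"] != suggested_epochs:
        problems.append(
            f"epochs is {config['epochs']}, suggested value is {suggested_epochs}"
        )
    return problems


# Hypothetical values for a 6-class dataset.
print(check_train_config({"input_size": 224, "epochs": 12000}, num_classes=6))
```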
3.4 Inputs for YOLO Algorithm:
The YOLO v3 algorithm takes annotation files (.txt) as input only if they are in the format [Class, X-Center, Y-Center, Width, Height].
- Class: Class of the defect present in the image
- X-Center: Midpoint of the bounding box along the x-axis, relative to the image width
- Y-Center: Midpoint of the bounding box along the y-axis, relative to the image height
- Width: Width of the bounding box of the defect, relative to the image width
- Height: Height of the bounding box of the defect, relative to the image height
It also takes a single .txt file listing the absolute paths of the images, where each line of the file is the path to one image. We have also created a class.data file that lists the classes.
X_center = ((xmax + xmin) / 2) * (1 / image_width)
Y_center = ((ymax + ymin) / 2) * (1 / image_height)
Width = (xmax - xmin) * (1 / image_width)
Height = (ymax - ymin) * (1 / image_height)
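The conversion above can be written as a short function. This is a minimal sketch with hypothetical function names; it reads each formula as a midpoint or extent divided by the corresponding image dimension, and the sample box is illustrative.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, image_width, image_height):
    """Convert pixel corner coordinates to normalized YOLO box values."""
    x_center = (xmin + xmax) / 2 / image_width
    y_center = (ymin + ymax) / 2 / image_height
    width = (xmax - xmin) / image_width
    height = (ymax - ymin) / image_height
    return x_center, y_center, width, height


def format_label_line(class_id, box):
    """Format one annotation line: class followed by the four box values."""
    return f"{class_id} " + " ".join(f"{value:.6f}" for value in box)


# Hypothetical box on a 200 x 200 image.
box = voc_to_yolo(20, 40, 120, 90, 200, 200)
print(format_label_line(5, box))
```

All four values fall between 0 and 1, so the labels stay valid if the image is resized for training.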
Figure 3.4.1: Annotated file in XML format
The YOLO model takes inputs in a specific format: all files must be stored in the same directory, and the addresses of the folders and files are given in a configuration file, input.cfg.
1. images = consists of all images
2. labels = consists of all calculated annotation .txt files
3. train.txt and test.txt, which consist of the absolute paths to the images
4. class.data, which includes the defect classes
Figure 3.4.2: XML type converted to .txt file
Figure 3.4.3: train.txt
Figure 3.4.4: test.txt
Figure 3.4.5: Class.data
Figure 3.4.6: Input.cfg
The input.cfg is a config file that holds the absolute paths of train.txt, test.txt, the classes file, and the backup folder.
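The files described in this section are plain text and can be generated with a short script. The sketch below assumes the key names conventionally used in darknet data files (classes, train, valid, names, backup); the exact contents of the project's input.cfg appear only in the figure, so the keys and all paths here are hypothetical.

```python
def build_image_list(image_dir, file_names):
    """Return the text of a train.txt or test.txt file: one image path per line."""
    base = image_dir.rstrip("/")
    return "".join(f"{base}/{name}\n" for name in sorted(file_names))


def build_data_file(num_classes, train_list, test_list, names_file, backup_dir):
    """Return the text of a darknet-style data file (input.cfg in this report)."""
    lines = [
        f"classes = {num_classes}",
        f"train = {train_list}",
        f"valid = {test_list}",
        f"names = {names_file}",
        f"backup = {backup_dir}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical paths, for illustration only.
print(build_image_list("/content/data/images/", ["b.jpg", "a.jpg"]))
print(build_data_file(6, "/content/data/train.txt", "/content/data/test.txt",
                      "/content/data/class.data", "/content/yolov3_folder"))
```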
3.5 Training the model:
After creating all the above files, we need to create a directory structure as shown in the following figure. For initial training, we use the pre-trained model weights darknet53.conv.74, downloaded and stored in the current directory, as shown in the figure below.
Figure 3.5.1: Directory structure
We cloned pjreddie's darknet repository into our Google Colab environment and created a new folder called data, which contains the sub-folders images and labels and the files created in section 3.4, except input.cfg. The input.cfg file is loaded into the Google Colab environment as shown in the figure. The sub-folder images contains all the images, both training and testing, and the sub-folder labels contains all the converted annotation files in .txt format. A backup folder, yolov3_folder, is created to store the model weights at regular intervals during training, which acts as a checkpoint. If training is aborted for any reason, we can resume from the previous checkpoint by loading the weights stored in yolov3_folder instead of training the model again from scratch.
We need to tweak two files: the Makefile in the darknet folder and yolov3.cfg inside the cfg sub-folder of darknet. The changes that need to be made are shown in the figures below.
Figure 3.5.2: Changes made in the Makefile under the darknet folder
Since we use OpenCV and a GPU for training, we need to enable them by replacing 0 with 1 for the corresponding flags in the Makefile, as shown in the figure above.
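The same change can be scripted rather than made by hand. The sketch below assumes the flags appear as lines of the form GPU=0 and OPENCV=0, as in the darknet Makefile; the function name is hypothetical, and the sample text stands in for the real file.

```python
import re


def enable_make_flags(makefile_text, names=("GPU", "OPENCV")):
    """Switch the given NAME=0 lines to NAME=1 and leave other lines untouched."""
    for name in names:
        pattern = rf"^{re.escape(name)}=0[ \t]*$"
        makefile_text = re.sub(pattern, f"{name}=1", makefile_text, flags=re.MULTILINE)
    return makefile_text


# Illustrative excerpt standing in for the top of the Makefile.
sample = "GPU=0\nCUDNN=0\nOPENCV=0\n"
print(enable_make_flags(sample))
```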
Figure 3.5.3: Changes made in yolov3.cfg file (changing the height and width according to our dataset)
Our dataset consists of 200 x 200 resolution images, and since the yolov3.cfg file has 5 sub-sampling layers, each of which halves the resolution (2^5 = 32), the height and width must be multiples of 32. We therefore chose 224 as the height and width for training. We can also change the number of iterations by tweaking max_batches in the file.
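The choice of 224 follows from rounding the 200-pixel image size up to the next multiple of 32. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def round_up_to_multiple(size, multiple=32):
    """Return the smallest multiple of `multiple` that is >= size."""
    return -(-size // multiple) * multiple


print(round_up_to_multiple(200))  # 224
```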
Figure 3.5.4: Changes in yolo layer of yolov3.cfg file
We have six classes in our dataset to train and predict on, so in each of the three yolo layers in the yolov3.cfg file we need to change classes to 6 and change filters in the convolutional layer just above it to 33 ((classes + 5) * 3). Also, the last anchor coordinates exceed the height and width we set in the cfg file, so pick a random coordinate pair within those limits and replace the last anchor with it. Save these changes and upload the file to the Colab environment in the location shown in Figure 3.5.1 (directory structure).
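The filter count quoted above comes from a simple rule: each of the 3 anchors at a scale predicts 4 box coordinates, 1 objectness score, and one score per class. A small sketch of that calculation, with a hypothetical function name:

```python
def yolo_output_filters(num_classes, anchors_per_scale=3):
    """Filters needed in the convolutional layer that feeds a yolo layer."""
    return (num_classes + 5) * anchors_per_scale


print(yolo_output_filters(6))   # 33 for the six defect classes
print(yolo_output_filters(80))  # 255 for an 80-class dataset
```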
After the pre-processing steps on the raw data, namely converting the annotated .xml files to .txt and segregating them into the folders required as input by the model, the model is trained on the 1260 grayscale images, which contain 6 defect types, termed classes in the network.cfg file of our YOLO model. The model was run on this data with the parameters shown in Figure 3.5.4.
Figure 3.5.5: Training data of 1260 images with individual defect count
Figure 3.5.6: Displaying the results of the model along with the attributes including the learning rate of the model.
When the model was trained on this data with the parameters shown in Figure 3.5.4, it did not produce the desired results. We then kept tuning the parameters: we set the width and height to 224 and max batches to 12000, recalculated the anchor box dimensions with respect to the image height and width, and re-ran the model with the new parameters in the network.cfg file.
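The report does not say how the new anchor dimensions were calculated. One simple possibility, shown here purely as an assumption, is to scale the default YOLO v3 anchors, which are defined for a 416 x 416 input, down to the 224 x 224 input. Clustering the training boxes is another common approach that this sketch does not cover.

```python
# Default anchors from the stock yolov3.cfg, given as (width, height) in pixels.
DEFAULT_ANCHORS = [
    (10, 13), (16, 30), (33, 23),
    (30, 61), (62, 45), (59, 119),
    (116, 90), (156, 198), (373, 326),
]


def scale_anchors(anchors, old_size, new_size):
    """Scale anchor boxes proportionally from one input size to another."""
    factor = new_size / old_size
    return [(round(w * factor), round(h * factor)) for w, h in anchors]


scaled = scale_anchors(DEFAULT_ANCHORS, old_size=416, new_size=224)
print(scaled[-1])  # the largest anchor now fits inside a 224 x 224 input
```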
Figure 3.5.7: Various configuration files and the results obtained from them.
Figure 3.5.7 shows the various configuration files and the results obtained from them. The file we used for detecting defects in the test images is the config file yolov3-voc.cfg, which achieved a loss of 0.18 and an mAP of 50.27%.
Figure 3.5.7: Displaying the results of the best model after training
When we tested the model on the test set images, we got only partial detection: some defects were detected correctly, while others, such as crazing, were not. The model could not detect most of the crazing defects and a few of the rolled-in scale defects. The model output all the predicted defects and bounding boxes in JSON format (x_center, y_center, width, height).
Figure 3.5.8: Displaying the predictions of the model after testing on test images set
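To draw predictions given as (x_center, y_center, width, height) on an image, they first need to be converted back to pixel corners. The sketch below assumes the values are normalized to the range 0 to 1. The report does not show the JSON schema, so the key names and the sample prediction are hypothetical.

```python
import json


def to_pixel_corners(pred, image_width, image_height):
    """Convert a normalized center-format box to (xmin, ymin, xmax, ymax) pixels."""
    center_x = pred["x_center"] * image_width
    center_y = pred["y_center"] * image_height
    half_w = pred["width"] * image_width / 2
    half_h = pred["height"] * image_height / 2
    # Clamp to the image so a box on the border never leaves the frame.
    xmin = max(0, round(center_x - half_w))
    ymin = max(0, round(center_y - half_h))
    xmax = min(image_width, round(center_x + half_w))
    ymax = min(image_height, round(center_y + half_h))
    return xmin, ymin, xmax, ymax


# Hypothetical prediction record.
record = json.loads('{"x_center": 0.35, "y_center": 0.325, "width": 0.5, "height": 0.25}')
print(to_pixel_corners(record, 200, 200))
```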
The model performed well on defects like inclusion, scratches, patches, and pitted surface, but failed to perform with the same accuracy on crazing and rolled-in scale. We suspect this is because YOLO downsamples the images heavily (for example, from ~600x600 to ~30x30), so the small-object features extracted in the first layers disappear and never reach the detection and classification steps. Defects like crazing and rolled-in scale are difficult to detect on the surface, and hence our YOLO model, even after multiple tweaks of the .cfg file, has failed to become an industry-level deployable model.
Figure: Displaying the actual images with bounding boxes and object class.
3.6.2 Predicted Images
Figure 3.6.2: Displaying the Predicted images with bounding boxes and object class.
The model was able to predict defects on the test images with an accuracy of 100% for some defects, but it continues to struggle to identify defects such as crazing and rolled-in scale.
3.7 Predictions on the Video
The figure above shows the test cases in which different configuration files were used to test the detection accuracy of the model. With the default parameters of the config file, the results are better, in terms of the number of defects detected correctly, than with the files in which the parameters were tweaked. The experiment therefore continued with the default values of the config file, and the results are as follows.
After much tweaking and tuning of parameters such as height, width, anchor boxes, and batches, the YOLO model was able to predict the defects, but it is less effective at detecting minute defects. For instance, crazing is not detected as well as other defects like inclusion and patches. The reason is that YOLO could not detect defects that are small in size.
Figure 3.7.1: Displaying a frame in a video where the test set images are merged (5 FPS)
Figure 3.7.2: Predictions on household utensils (Steel plate with scratches)
3.8 Data Analysis using visualizations
The graphs below display the results of training and testing with different configuration files.
Figure 3.8.1: Result of mAP on test-set using different configuration files.
The graphs below display the loss of the model during training with different configuration files.
Figure 3.8.2: Loss obtained from training on different configuration files
The loss graph and prediction count for the final model chosen (yolov3-voc.cfg with mAP=50.27%) are shown below.
The plot below displays the loss variation for yolov3-voc.cfg.
Figure 3.8.3: Loss plot for optimal model
The plot above shows the loss decreasing over time and stabilizing at 0.18 at 12000 batches.
Figure: Individual defect count in the test set
Figure: Total predicted, true positives, and false positives.
Figure: Individual accuracy of each defect.
The figure of total predictions, true positives, and false positives gives the number of predictions made by the model for each class of defect, the true positives (i.e., the number of correct predictions), and the false positives (i.e., the number of incorrect predictions) for each class. From the plots above, we can see that the model performs poorly on objects with smaller features, like crazing and rolled-in scale, but performs decently on objects that are bigger in size. Also, from the two figures above, we can say that even though the defects in the dataset are imbalanced, they are sufficient for training the model, i.e., there is no sign of underfitting, because the loss plot shows that the model has converged. Moreover, inclusion is the most frequent defect in the training set and pitted surface the least frequent, yet the accuracy for both is decent, whereas the accuracy for the smaller defects is poor.
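The relationship between the counts discussed above can be made explicit. The report does not define how its per-defect accuracy is computed, so the sketch below uses precision, the share of predictions that are correct, as one plausible reading. The counts are hypothetical and are not the project's results.

```python
def per_class_precision(counts):
    """Map each class to TP / (TP + FP); classes with no predictions get 0.0."""
    result = {}
    for name, (true_pos, false_pos) in counts.items():
        total = true_pos + false_pos
        result[name] = true_pos / total if total else 0.0
    return result


# Hypothetical (true positives, false positives) per class.
example_counts = {"inclusion": (8, 2), "crazing": (0, 0)}
print(per_class_precision(example_counts))
```

Precision alone does not count missed defects, so a class the model rarely predicts can still need a recall figure to show the full picture.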
Initially, this project started with the notion that the model should detect defects in a video as it plays. Because it was difficult to obtain video from the steel manufacturing industry, one video was made by combining all the test-set images at 5 frames per second (Figure 3.7.1), and another was made by scanning household utensils (steel plates with scratches) (Figure 3.7.2). Predictions were then made on both, and the model successfully predicted the defects.
Based on these experiments and the research conducted on real-time detection of anomalies using the computer vision model YOLO, the model successfully predicts defects in images but performs poorly on defects that are small in size. Hence, YOLO algorithms/models are mostly unsuitable for detecting minute defects in the steel manufacturing industry.