cheeses = [
    "Camembert",
    "Roquefort",
    "Comté",
    "Époisses de Bourgogne",
    "Tomme de Savoie",
    "Bleu d’Auvergne",
    "Brie de Meaux",
    "Mimolette",
    "Munster",
    "Livarot",
    "Pont-l’Évêque",
    "Reblochon",
    "Chabichou du Poitou",
    "Valençay",
    "Pélardon",
    "Fourme d’Ambert",
    "Selles-sur-Cher",
    "Cantal",
    "Neufchâtel",
    "Banon",
    "Manchego",
    "Gruyere"
]
Which cheese are we eating?
1 Let’s start with the why
I love cheese. Sometimes it is quite difficult to distinguish the varieties. Think about the embarrassment when you stand in front of a mountain of cheese and can only point with your finger.
Therefore, I decided to build an ML classifier to help me.
The special difficulty here is that cheeses all look quite similar. Take, for example, the Swiss Gruyère and the French Comté.
They are twins.
2 Let’s continue with the data
First, we need some data. Fast.ai provides an easy download module to download images from DuckDuckGo.
As an alternative, we could use a dataset, if we have one. Let’s start by downloading the files and then create a dataset.
2.1 Getting data from DuckDuckGo
Let’s start by defining what we want to download. We want cheese. In particular, French cheese.
To have a larger variety of images we define some extra search terms.
search_terms = [
    "cheese close-up texture",
    "cheese macro shot",
    "cheese cut section"
]
As we work with Fast.ai, let’s import the basic stuff.
from duckduckgo_search import DDGS
from fastcore.all import *
from fastai.vision.all import *
import time, json

def search_images(keywords, max_images=20):
    return L(DDGS().images(keywords, max_results=max_images)).itemgot('image')
And then define our download function:
from fastdownload import download_url
from pathlib import Path
import time

data_acquisition = False

def download():
    # Loop through all combinations of cheeses and search terms
    for cheese in cheeses:
        dest = Path("which_cheese") / cheese  # Create a subdirectory for each cheese
        dest.mkdir(exist_ok=True, parents=True)
        for term in search_terms:
            query = f"{cheese} {term}"
            download_images(dest, urls=search_images(f"{query} photo"))
            time.sleep(5)
        # Resize images after downloading
        resize_images(dest, max_size=400, dest=dest)

# Run download only if data acquisition is enabled
if data_acquisition:
    download()
We can verify the images now or later.
if data_acquisition:
    failed = verify_images(get_image_files(path))
    failed.map(Path.unlink)
    len(failed)
2.2 Loading data from a Kaggle dataset
I created a dataset of these images to avoid having to download them again when I start over.
Sadly, due to uncertain copyright issues with this data, my dataset needs to remain private. But you can easily create your own.
As I run most of my code locally, I have some code to get it from Kaggle:
competition_name = None
dataset_name = 'cheese'

import os
from pathlib import Path

iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')

if competition_name:
    if iskaggle:
        comp_path = Path('../input/' + competition_name)
    else:
        comp_path = Path(competition_name)
        if not comp_path.exists():
            import zipfile, kaggle
            kaggle.api.competition_download_cli(str(comp_path))
            zipfile.ZipFile(f'{comp_path}.zip').extractall(comp_path)

if dataset_name:
    if iskaggle:
        path = Path(f'../input/{dataset_name}')
    else:
        path = Path(dataset_name)
        if not path.exists():
            import zipfile, kaggle
            kaggle.api.dataset_download_cli(dataset_name, path='.')
            zipfile.ZipFile(f'{dataset_name}.zip').extractall(path)
Now that we have downloaded the data, we can start using it.
3 Cleaning the data with the help of our first model
Before we dive into different options for modelling, we will do a quick pass through the data and see which images are bad.
The background is that the scraper picks up many images that are not useful for training.
We start by creating a working copy of the dataset.
!mkdir -p working/which_cheese_first
!cp -r cheese/which_cheese working/which_cheese_first
To be sure that all images are valid, we check again for corrupted files and remove them.
from pathlib import Path
from PIL import Image

data_path = Path("working/which_cheese_first")

# Check all images
corrupt_files = []
for img_path in data_path.rglob("*.*"):  # Match all files inside subfolders
    try:
        with Image.open(img_path) as img:
            img.verify()  # Verify that it is a valid image
    except (IOError, SyntaxError):
        corrupt_files.append(img_path)

# Remove corrupt images
print(f"Found {len(corrupt_files)} corrupt images.")
for corrupt in corrupt_files:
    print(f"Deleting {corrupt}")
    corrupt.unlink()  # Delete the file
Found 48 corrupt images.
Deleting working/which_cheese_first/which_cheese/Roquefort/350d3e67-dcf6-4292-b963-c1d5841b8788.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/594d40b1-f655-4db1-b3a9-4e7d6bb6c631.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/32a9069e-52c2-47e1-9db4-16197556c4fb.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/c73fb213-3813-43fd-b5ae-2d390ca8e3d5.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/2c426320-24bd-4869-8f1c-d09171ac6294.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/83a95414-4083-48d7-9956-be5d82b05caf.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/f4f09c62-652b-400c-8e09-419389635fc4.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/dfa07f3c-0931-49aa-b3c2-9c4a5901565d.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/609abf59-c1f0-4a34-b2cf-1bedf1b4cea0.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/b56ab8cc-5b37-40c9-be31-57d14c843978.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/422aec71-31d9-421e-880c-91867eaa5dfb.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/5591880b-37f4-4bcc-9927-8f60b6d6bb37.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/a9e2a7ad-038e-4b6d-8dee-19fd1661ebe1.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/4a572868-b982-47ed-b96e-3eb1a755e32a.jpg
Deleting working/which_cheese_first/which_cheese/Camembert/8903d049-4256-4fe5-9716-48e5fc8ef52b.jpg
Deleting working/which_cheese_first/which_cheese/Camembert/849c9bb0-b717-40a7-922e-091e22e36579.jpg
Deleting working/which_cheese_first/which_cheese/Camembert/4582e06a-0218-4b6b-aeaf-7e7d61dd3827.jpg
Deleting working/which_cheese_first/which_cheese/Camembert/bf950a7d-6ab2-4dd3-81dc-5ec14b9964dc.jpg
Deleting working/which_cheese_first/which_cheese/Camembert/052b599d-f560-473c-947c-74bb3c138167.jpg
Deleting working/which_cheese_first/which_cheese/Camembert/ccf3f8e7-aa87-426d-bea2-a1f18a89be05.jpg
Deleting working/which_cheese_first/which_cheese/Manchego/f7de39d9-0ff2-4a99-aa92-807b27fa7d90.jpg
Deleting working/which_cheese_first/which_cheese/Manchego/0352de9a-3f83-4ce7-bfbe-207da04840a3.jpg
Deleting working/which_cheese_first/which_cheese/Manchego/0592b012-96e5-4f22-ac2e-acc8ab41ecc4.jpg
Deleting working/which_cheese_first/which_cheese/Fourme d’Ambert/0e36dc86-5e2a-4635-afcc-e3e0ec972aee.jpg
Deleting working/which_cheese_first/which_cheese/Neufchâtel/d827858f-aac0-49f4-b397-facadcfb70fb.jpg
Deleting working/which_cheese_first/which_cheese/Neufchâtel/bed8cf04-9305-4f00-9a8a-1b869e00701c.jpg
Deleting working/which_cheese_first/which_cheese/Neufchâtel/8963c142-9a63-43dc-8268-f54a1b6fbb2b.jpg
Deleting working/which_cheese_first/which_cheese/Neufchâtel/bbf224a3-5033-49c0-8b0c-92068e50382f.jpg
Deleting working/which_cheese_first/which_cheese/Neufchâtel/19d6e4f5-0393-455c-b2d4-7ecccfd93431.jpg
Deleting working/which_cheese_first/which_cheese/Neufchâtel/6ad1c9d8-1f29-4da8-ae7d-78915460cf35.jpg
Deleting working/which_cheese_first/which_cheese/Selles-sur-Cher/93d14546-21bf-46e5-89de-e336b474baf3.jpg
Deleting working/which_cheese_first/which_cheese/Mimolette/17abeba3-b113-4c84-90ed-b17b6152c71d.jpg
Deleting working/which_cheese_first/which_cheese/Mimolette/72f3db51-da86-4934-bad7-c1b5e54cfb46.jpg
Deleting working/which_cheese_first/which_cheese/Mimolette/16f74a99-f1ef-46fd-a809-8f332ad235b7.jpg
Deleting working/which_cheese_first/which_cheese/Époisses de Bourgogne/9484a03b-af27-4155-a950-bc07187f00f0.jpg
Deleting working/which_cheese_first/which_cheese/Livarot/ffe3e263-a49b-41bc-bdba-4b66cdc12475.jpg
Deleting working/which_cheese_first/which_cheese/Livarot/57e84bd7-8936-4d55-8ec4-cfcc1073b9a4.jpg
Deleting working/which_cheese_first/which_cheese/Gruyere/9316e837-a0b2-468a-a287-69ee27b840ba.jpg
Deleting working/which_cheese_first/which_cheese/Gruyere/dcc3320a-408c-4b93-a1b6-bf2f3f25aa15.jpg
Deleting working/which_cheese_first/which_cheese/Gruyere/3d5755ba-b8b3-4636-8213-54a3bf19613d.jpg
Deleting working/which_cheese_first/which_cheese/Comté/5b92cce2-46d4-46f8-9f0f-7952076ded0a.jpg
Deleting working/which_cheese_first/which_cheese/Reblochon/3ff8d8f8-09c4-4f83-85b8-9c089fcd6805.jpg
Deleting working/which_cheese_first/which_cheese/Pélardon/a0e86302-8ca3-47ab-ab4c-bdf4834ca208.jpg
Deleting working/which_cheese_first/which_cheese/Pélardon/6370f787-f13e-4acf-aefb-ad67f68d32c2.jpg
Deleting working/which_cheese_first/which_cheese/Pont-l’Évêque/4efc1ad3-575d-4a32-9063-e403fd57d7c9.jpg
Deleting working/which_cheese_first/which_cheese/Tomme de Savoie/5e23fd21-574a-47a0-bc9b-ca52984ae9a5.jpg
Deleting working/which_cheese_first/which_cheese/Tomme de Savoie/5cf2571c-74f7-4a5b-9374-ef4c480267df.jpg
Deleting working/which_cheese_first/which_cheese/Valençay/a45d3613-5dbc-45b5-85c0-2ead70ccf221.jpg
3.1 Model definition
We will define a simple model and check if the data is loaded correctly. The simplest model for image classification is resnet18.
from fastcore.all import *
from fastai.vision.all import *

cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
)

dls = cheese.dataloaders("working/which_cheese_first")
dls.show_batch()
For the metrics, I chose accuracy, as it is the easiest to analyze. We will later see that the dataset becomes slightly imbalanced during cleaning, where F1-score would be the better choice.
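For reference, switching the metric later would be a one-line change. A minimal sketch using fastai's built-in F1 wrapper (macro averaging treats every cheese class equally, regardless of its size):

from fastai.metrics import F1Score

learn_f1 = vision_learner(dls, resnet18, metrics=[accuracy, F1Score(average='macro')])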
learn = vision_learner(dls, resnet18, metrics=accuracy)
We then do a quick learning pass.
learn.fine_tune(3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.307302 | 2.287525 | 0.356164 | 00:02 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 2.265255 | 1.649305 | 0.547945 | 00:03 |
1 | 1.552460 | 1.265489 | 0.662100 | 00:03 |
2 | 1.129812 | 1.213783 | 0.666667 | 00:03 |
As we can see, accuracy increased to about 67% after 3 epochs.
4 Data Cleaning
We can have a look at the confusion matrix. There are some cheeses that are easily confused with each other, for example Bleu d’Auvergne with Fourme d’Ambert. In fact, in cheese stores outside France, few people seem to know the latter. The hard cheeses Cantal, Comté, and Gruyère are also mixed up. The last two are standard mountain cheeses, one from France and the other from Switzerland. They only differ by their texture: a Comté of the same age is a little creamier and has fewer crevices. I added the Gruyère specifically to make the dataset harder.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
Let’s have a look at the top losses.
interp.plot_top_losses(10)
4.1 All the same?
As expected, similar cheeses from the same family are difficult to distinguish.
Let’s do some data cleaning.
For the Comté, Gruyère, and Munster, the pictures with the highest loss are those with little detail or with accessories like bread or knives.
from fastai.vision.widgets import *

files_to_clean = []
cleaner = ImageClassifierCleaner(learn)
cleaner
4.2 IMPORTANT: How to use the cleaner
For each category and for both the train and valid sets, select the images, then run the following cell. The cleaner does not seem to remember selections across categories.
We also cannot re-run the cell above after we have deleted some files, as those files will be missing. Instead, we go through all categories and collect the files to be deleted.
We do not change categories for now.
for idx in cleaner.delete():
    files_to_clean.append(cleaner.fns[idx])

for file in files_to_clean:
    try:
        file.unlink()
    except FileNotFoundError:  # already deleted in an earlier pass
        pass
After a lot of examination, I cleaned my dataset down from 1100 files to 1029. I ran the following cell to create a copy of the cleaned data. To protect that copy, the cell is commented out.
#!mkdir -p working/which_cheese_cleaned
#!cp -r working/which_cheese_first working/which_cheese_cleaned
5 Fast iterations to analyze and improve the data
5.1 Working with cleaned data
Now that we have cleaned some data, we can train again, using more advanced techniques.
We start with a simple training run again, to see if the cleaning was successful.
from fastcore.all import *
from fastai.vision.all import *

cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
)

dls = cheese.dataloaders("working/which_cheese_cleaned")
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.fine_tune(3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.137223 | 2.343409 | 0.326829 | 00:02 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 2.324481 | 1.571146 | 0.570732 | 00:02 |
1 | 1.581721 | 1.212596 | 0.643902 | 00:02 |
2 | 1.146727 | 1.148679 | 0.678049 | 00:02 |
We now achieve roughly 68% accuracy. Let’s train further to see how far we can get.
learn.fine_tune(13)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.523904 | 1.073870 | 0.717073 | 00:02 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.314336 | 1.042765 | 0.721951 | 00:02 |
1 | 0.260081 | 0.991794 | 0.741463 | 00:02 |
2 | 0.203073 | 0.943358 | 0.741463 | 00:02 |
3 | 0.158532 | 0.913470 | 0.756098 | 00:02 |
4 | 0.141772 | 0.872876 | 0.751220 | 00:02 |
5 | 0.121437 | 0.816914 | 0.751220 | 00:02 |
6 | 0.101683 | 0.836497 | 0.765854 | 00:02 |
7 | 0.085780 | 0.845604 | 0.751220 | 00:02 |
8 | 0.071734 | 0.842247 | 0.760976 | 00:02 |
9 | 0.062432 | 0.823996 | 0.765854 | 00:02 |
10 | 0.053119 | 0.811724 | 0.760976 | 00:02 |
11 | 0.044131 | 0.817966 | 0.760976 | 00:02 |
12 | 0.038983 | 0.820316 | 0.760976 | 00:02 |
We seem to have hit a wall at 76% accuracy as early as iteration 6.
5.1.1 A word on the choice of metrics
Earlier I chose accuracy as the metric. Let’s examine our data to see if that choice is still valid.
pd.Series([dls.vocab[o[1]] for o in dls.train_ds]).value_counts()
Fourme d’Ambert 48
Chabichou du Poitou 47
Mimolette 44
Pont-l’Évêque 44
Brie de Meaux 43
Comté 41
Tomme de Savoie 41
Cantal 40
Pélardon 40
Reblochon 39
Valençay 39
Bleu d’Auvergne 38
Neufchâtel 37
Livarot 36
Selles-sur-Cher 35
Camembert 34
Époisses de Bourgogne 33
Manchego 32
Munster 32
Gruyere 29
Roquefort 27
Banon 24
Name: count, dtype: int64
As I mentioned earlier, the dataset is no longer balanced. However, it is not badly imbalanced either: the worst ratio is about 2:1, not an order of magnitude like 1:10. We stick with accuracy.
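To put a number on that imbalance, a quick sketch reusing the counts from above:

counts = pd.Series([dls.vocab[o[1]] for o in dls.train_ds]).value_counts()
print(f"Imbalance ratio: {counts.max() / counts.min():.1f}:1")  # 48 / 24 = 2.0:1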
5.2 Data Augmentation
We do not have many images in the data. Therefore, we will use data augmentation and move from squishing to RandomResizedCrop.
cheese_augmented = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(192, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese_augmented.dataloaders("working/which_cheese_cleaned")
Note: I chose to override the variables here. A standard programming approach would use new variables. However, the learner reserves memory on the GPU, and we would eventually hit an out-of-memory error. One option is to delete the previous variable and free the memory. The other option, which I chose here, is to override it with a new learner; this implicitly deletes the old one.
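For completeness, a sketch of the explicit-cleanup alternative (assuming the learner variable is named learn):

import gc
import torch

del learn                  # drop the Python reference
gc.collect()               # let the garbage collector release the tensors
torch.cuda.empty_cache()   # return cached blocks to the GPU driver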
learn = vision_learner(dls, resnet18, metrics=accuracy)
We will pull another trick and use a better learning rate.
learn.lr_find()
SuggestedLRs(valley=0.0014454397605732083)
learn.fine_tune(16, 1.44e-3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.298504 | 2.772096 | 0.278049 | 00:02 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 3.559263 | 2.476911 | 0.331707 | 00:02 |
1 | 3.395612 | 2.182159 | 0.370732 | 00:02 |
2 | 3.096542 | 1.812795 | 0.492683 | 00:02 |
3 | 2.789779 | 1.489513 | 0.570732 | 00:02 |
4 | 2.507986 | 1.255989 | 0.629268 | 00:02 |
5 | 2.255753 | 1.103503 | 0.687805 | 00:02 |
6 | 1.996111 | 1.050033 | 0.726829 | 00:02 |
7 | 1.788129 | 0.995375 | 0.741463 | 00:02 |
8 | 1.612162 | 0.972283 | 0.741463 | 00:02 |
9 | 1.448160 | 0.921064 | 0.736585 | 00:02 |
10 | 1.303951 | 0.902030 | 0.751220 | 00:02 |
11 | 1.197405 | 0.879721 | 0.765854 | 00:02 |
12 | 1.133353 | 0.869218 | 0.760976 | 00:02 |
13 | 1.073917 | 0.860385 | 0.775610 | 00:02 |
14 | 1.003178 | 0.850505 | 0.770732 | 00:02 |
15 | 0.972047 | 0.851660 | 0.775610 | 00:02 |
learn.fine_tune(6, 1.44e-3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.824386 | 0.848143 | 0.780488 | 00:02 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.778520 | 0.854400 | 0.780488 | 00:02 |
1 | 0.784403 | 0.840095 | 0.770732 | 00:02 |
2 | 0.752580 | 0.833080 | 0.760976 | 00:02 |
3 | 0.711079 | 0.824473 | 0.775610 | 00:02 |
4 | 0.661186 | 0.804503 | 0.760976 | 00:02 |
5 | 0.619942 | 0.800512 | 0.765854 | 00:02 |
learn.fine_tune(6, 1.44e-3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.536938 | 0.783610 | 0.775610 | 00:02 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.553907 | 0.786242 | 0.775610 | 00:02 |
1 | 0.547560 | 0.848543 | 0.765854 | 00:02 |
2 | 0.524134 | 0.848381 | 0.756098 | 00:02 |
3 | 0.499666 | 0.811153 | 0.780488 | 00:02 |
4 | 0.473074 | 0.783625 | 0.780488 | 00:02 |
5 | 0.465626 | 0.781733 | 0.785366 | 00:02 |
The training advanced more slowly and seems to hit the same wall at 76-77%. Only after 12 more iterations do we converge on a path above 78%. The validation loss only starts dropping again after 6 iterations, showing convergence issues of the gradient descent.
Let’s look at the result.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
interp.plot_top_losses(5, nrows=1)
The Fourme d’Ambert uncertainty has almost vanished. The top losses come from images that slipped through my cleaning efforts and are indeed misleading.
5.3 Label smoothing
As the data still has a lot of noise, we can try label smoothing. Label smoothing assumes a natural uncertainty, so no label gets a probability of 100%. Instead, it redistributes a small portion of the correct class’s probability across all classes to prevent overconfidence and improve generalization.
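To make this concrete, here is a small numeric sketch of the smoothed target distribution, using fastai's default eps = 0.1 and our 22 classes:

import torch

eps, K = 0.1, 22                           # smoothing factor and number of classes
one_hot = torch.zeros(K)
one_hot[4] = 1.0                           # suppose the true class is index 4

smoothed = one_hot * (1 - eps) + eps / K   # spread eps uniformly over all classes
print(smoothed[4].item())                  # ~0.9045 instead of 1.0
print(smoothed[0].item())                  # ~0.0045 instead of 0.0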
We will start with 28 iterations.
from fastai.losses import LabelSmoothingCrossEntropy

learn = vision_learner(dls, resnet18, metrics=accuracy, loss_func=LabelSmoothingCrossEntropy())
learn.lr_find()
SuggestedLRs(valley=0.00363078061491251)
learn.fine_tune(28, 3.6e-3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.273853 | 2.742732 | 0.307317 | 00:02 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 2.933313 | 2.152079 | 0.429268 | 00:02 |
1 | 2.757924 | 1.957854 | 0.531707 | 00:02 |
2 | 2.600373 | 1.795183 | 0.595122 | 00:02 |
3 | 2.414058 | 1.652869 | 0.648780 | 00:02 |
4 | 2.251830 | 1.555603 | 0.692683 | 00:02 |
5 | 2.109990 | 1.569267 | 0.702439 | 00:02 |
6 | 1.985083 | 1.525316 | 0.717073 | 00:02 |
7 | 1.878232 | 1.548784 | 0.717073 | 00:02 |
8 | 1.782599 | 1.508083 | 0.726829 | 00:02 |
9 | 1.692580 | 1.468358 | 0.746341 | 00:02 |
10 | 1.610464 | 1.430262 | 0.741463 | 00:02 |
11 | 1.544696 | 1.419962 | 0.721951 | 00:02 |
12 | 1.477807 | 1.413017 | 0.751220 | 00:02 |
13 | 1.419630 | 1.305687 | 0.760976 | 00:02 |
14 | 1.370959 | 1.298595 | 0.795122 | 00:02 |
15 | 1.320189 | 1.298479 | 0.814634 | 00:02 |
16 | 1.275702 | 1.271670 | 0.785366 | 00:02 |
17 | 1.247922 | 1.282414 | 0.770732 | 00:02 |
18 | 1.205016 | 1.259176 | 0.775610 | 00:02 |
19 | 1.168169 | 1.248492 | 0.775610 | 00:02 |
20 | 1.135880 | 1.244297 | 0.780488 | 00:02 |
21 | 1.108426 | 1.244742 | 0.785366 | 00:02 |
22 | 1.088607 | 1.238094 | 0.775610 | 00:02 |
23 | 1.073189 | 1.241032 | 0.780488 | 00:02 |
24 | 1.053456 | 1.239150 | 0.775610 | 00:02 |
25 | 1.044365 | 1.244962 | 0.780488 | 00:02 |
26 | 1.034709 | 1.242740 | 0.780488 | 00:02 |
27 | 1.029999 | 1.233360 | 0.790244 | 00:02 |
From iteration 13 on, the validation loss barely improved. We briefly observed an accuracy of approximately 81%, yet the final accuracy is just 79%, despite the reduced loss.
5.4 Summary
Data augmentation and label smoothing both help with very noisy data and a low number of samples. We got the accuracy from 76% to 81%.
6 Bigger is better
So they say in mechanical engineering.
Let’s see what improvements size can bring.
6.1 Bigger images
First, we increase the image size.
from fastcore.all import *
from fastai.vision.all import *

cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(256, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese.dataloaders("working/which_cheese_cleaned")
learn_better = vision_learner(dls, resnet18, metrics=accuracy)
learn_better.lr_find()
SuggestedLRs(valley=0.0008317637839354575)
6.1.1 Note: Beware CUDA out of memory
As we increase the size of the data and the model, we can run out of memory. After a crash, the memory stays allocated.
The standard approach is to run torch.cuda.empty_cache() and trigger garbage collection.
Sometimes the memory still stays allocated and I need multiple passes to free it, so I wrote a utility function to do just that.
As I use an old GPU with only 8 GB, I frequently run into the out-of-memory error.
def free_cuda_memory(var_name, globals_dict, max_attempts=5, delay=0.5):
    """
    Deletes a variable by name, collects garbage, and repeatedly clears CUDA memory until freed.

    Args:
        var_name (str): Name of the variable to delete.
        globals_dict (dict): Pass `globals()` to delete from the global scope.
        max_attempts (int): Maximum attempts to clear memory.
        delay (float): Time (in seconds) to wait between attempts.
    """
    import torch
    import gc
    import time

    if var_name in globals_dict:
        del globals_dict[var_name]
    else:
        print(f"Variable '{var_name}' not found in globals.")
        return

    for _ in range(max_attempts):
        gc.collect()
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()
        time.sleep(delay)

        # Check if memory is freed
        allocated = torch.cuda.memory_allocated()
        cached = torch.cuda.memory_reserved()
        if allocated == 0 and cached == 0:
            print("CUDA memory successfully freed.")
            return

    print("Warning: Some CUDA memory may still be blocked.")
    print(f"Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
    print(f"Cached: {torch.cuda.memory_reserved() / 1e9:.2f} GB")
"learn_better",globals()) free_cuda_memory(
Variable 'learn_better' not found in globals.
learn_better.fine_tune(20, 8.3e-4)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.542337 | 3.085513 | 0.146341 | 00:02 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.000002 | 2.851287 | 0.180488 | 00:03 |
1 | 3.854214 | 2.640312 | 0.214634 | 00:03 |
2 | 3.678350 | 2.329466 | 0.321951 | 00:03 |
3 | 3.460999 | 1.959772 | 0.409756 | 00:03 |
4 | 3.187331 | 1.625292 | 0.536585 | 00:03 |
5 | 2.932109 | 1.408548 | 0.604878 | 00:03 |
6 | 2.674737 | 1.244989 | 0.668293 | 00:03 |
7 | 2.424846 | 1.146155 | 0.663415 | 00:03 |
8 | 2.201131 | 1.025524 | 0.707317 | 00:03 |
9 | 2.034413 | 0.931238 | 0.726829 | 00:03 |
10 | 1.865840 | 0.851306 | 0.756098 | 00:03 |
11 | 1.716559 | 0.824157 | 0.741463 | 00:03 |
12 | 1.578321 | 0.804028 | 0.770732 | 00:03 |
13 | 1.461851 | 0.793212 | 0.775610 | 00:03 |
14 | 1.359122 | 0.781659 | 0.795122 | 00:03 |
15 | 1.279223 | 0.774161 | 0.795122 | 00:03 |
16 | 1.229434 | 0.775929 | 0.795122 | 00:03 |
17 | 1.166556 | 0.768999 | 0.795122 | 00:03 |
18 | 1.123601 | 0.767946 | 0.800000 | 00:03 |
19 | 1.104936 | 0.771056 | 0.790244 | 00:03 |
learn_better.export('resnet.pkl')
Almost 80%. After 20 epochs, the goal seems to have been reached. In another run I had 82%. Despite the lack of consistency, I count this as a record.
6.2 Bigger Model
Instead of the images, we can increase the model; we will go for resnet34 and resnet50.
cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(192, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese.dataloaders("working/which_cheese_cleaned")
learn_better = vision_learner(dls, resnet34, metrics=accuracy)
learn_better.lr_find()
SuggestedLRs(valley=0.0020892962347716093)
learn_better.fine_tune(20, 2e-3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.319314 | 2.530106 | 0.243902 | 00:02 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 3.351670 | 2.044574 | 0.380488 | 00:03 |
1 | 3.058213 | 1.685639 | 0.507317 | 00:03 |
2 | 2.746131 | 1.313843 | 0.639024 | 00:03 |
3 | 2.436243 | 1.013334 | 0.731707 | 00:03 |
4 | 2.142230 | 0.840266 | 0.775610 | 00:03 |
5 | 1.898090 | 0.805258 | 0.770732 | 00:03 |
6 | 1.671135 | 0.764192 | 0.800000 | 00:03 |
7 | 1.478277 | 0.738444 | 0.809756 | 00:03 |
8 | 1.305836 | 0.683891 | 0.785366 | 00:03 |
9 | 1.142530 | 0.632159 | 0.790244 | 00:03 |
10 | 1.006283 | 0.622701 | 0.814634 | 00:03 |
11 | 0.882134 | 0.641913 | 0.790244 | 00:03 |
12 | 0.775799 | 0.630769 | 0.780488 | 00:03 |
13 | 0.712351 | 0.629713 | 0.790244 | 00:03 |
14 | 0.638866 | 0.643542 | 0.790244 | 00:03 |
15 | 0.576997 | 0.637436 | 0.790244 | 00:03 |
16 | 0.545079 | 0.637268 | 0.804878 | 00:03 |
17 | 0.505899 | 0.642498 | 0.804878 | 00:03 |
18 | 0.472322 | 0.642579 | 0.800000 | 00:03 |
19 | 0.449359 | 0.632896 | 0.804878 | 00:03 |
learn_better = vision_learner(dls, resnet50, metrics=accuracy)
learn_better.lr_find()
SuggestedLRs(valley=0.0010000000474974513)
learn_better.fine_tune(20, 1e-3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.393582 | 2.744330 | 0.204878 | 00:04 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 3.321695 | 2.486286 | 0.287805 | 00:06 |
1 | 3.125788 | 2.221527 | 0.356098 | 00:06 |
2 | 2.849939 | 1.930392 | 0.439024 | 00:06 |
3 | 2.639139 | 1.629388 | 0.502439 | 00:06 |
4 | 2.387037 | 1.396374 | 0.600000 | 00:06 |
5 | 2.145477 | 1.242509 | 0.648780 | 00:06 |
6 | 1.933496 | 1.132375 | 0.687805 | 00:06 |
7 | 1.738567 | 1.025986 | 0.717073 | 00:06 |
8 | 1.557685 | 0.977237 | 0.756098 | 00:06 |
9 | 1.412416 | 0.930118 | 0.765854 | 00:06 |
10 | 1.275174 | 0.908866 | 0.756098 | 00:06 |
11 | 1.159127 | 0.913954 | 0.765854 | 00:06 |
12 | 1.066809 | 0.898841 | 0.775610 | 00:06 |
13 | 0.990926 | 0.876705 | 0.780488 | 00:06 |
14 | 0.921643 | 0.868463 | 0.785366 | 00:06 |
15 | 0.874949 | 0.841853 | 0.780488 | 00:06 |
16 | 0.826003 | 0.845446 | 0.765854 | 00:06 |
17 | 0.782675 | 0.861964 | 0.765854 | 00:06 |
18 | 0.737917 | 0.838224 | 0.770732 | 00:06 |
19 | 0.717874 | 0.844599 | 0.765854 | 00:06 |
Remarkably, the bigger resnet34 model also achieves 81%, and its training converges better. Conversely, the even larger resnet50 yields inferior results. This could be due to the limited amount of data.
Let’s see how the big model handles big images.
cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(256, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese.dataloaders("working/which_cheese_cleaned")
learn_better = vision_learner(dls, resnet34, metrics=accuracy)
learn_better.lr_find()
SuggestedLRs(valley=0.0006918309954926372)
learn_better.fine_tune(20, 6.9e-4)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.543718 | 3.211622 | 0.092683 | 00:04 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.034224 | 2.881157 | 0.180488 | 00:05 |
1 | 3.940435 | 2.636389 | 0.229268 | 00:05 |
2 | 3.715499 | 2.337412 | 0.326829 | 00:05 |
3 | 3.486756 | 1.973952 | 0.429268 | 00:05 |
4 | 3.248124 | 1.657461 | 0.507317 | 00:05 |
5 | 2.996700 | 1.410418 | 0.595122 | 00:05 |
6 | 2.718238 | 1.225778 | 0.673171 | 00:05 |
7 | 2.460284 | 1.119877 | 0.697561 | 00:05 |
8 | 2.235554 | 1.047230 | 0.692683 | 00:05 |
9 | 2.015765 | 0.982388 | 0.717073 | 00:05 |
10 | 1.827305 | 0.939780 | 0.726829 | 00:05 |
11 | 1.672784 | 0.906191 | 0.741463 | 00:05 |
12 | 1.559439 | 0.882196 | 0.756098 | 00:05 |
13 | 1.426843 | 0.865841 | 0.765854 | 00:05 |
14 | 1.338417 | 0.850578 | 0.765854 | 00:05 |
15 | 1.243912 | 0.847275 | 0.746341 | 00:05 |
16 | 1.178589 | 0.844610 | 0.751220 | 00:05 |
17 | 1.117920 | 0.846011 | 0.746341 | 00:05 |
18 | 1.063974 | 0.839167 | 0.751220 | 00:05 |
19 | 1.025944 | 0.836000 | 0.765854 | 00:05 |
First, it should be noted that the suggested learning rate is lower. But the results are also worse than for resnet18.
6.3 Even bigger images
We will increase the images even further. Since we resized our images to 400px during download, there is no point in going much larger than 312px.
cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(312, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese.dataloaders("working/which_cheese_cleaned")
learn_better = vision_learner(dls, resnet34, metrics=accuracy)
learn_better.lr_find()
SuggestedLRs(valley=0.0010000000474974513)
learn_better.fine_tune(20, 1e-3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.543712 | 3.039113 | 0.146341 | 00:06 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 3.761286 | 2.706118 | 0.234146 | 00:07 |
1 | 3.621768 | 2.441219 | 0.297561 | 00:07 |
2 | 3.425048 | 2.090192 | 0.409756 | 00:08 |
3 | 3.118869 | 1.713392 | 0.507317 | 00:07 |
4 | 2.836166 | 1.382881 | 0.585366 | 00:07 |
5 | 2.560673 | 1.180610 | 0.629268 | 00:07 |
6 | 2.283780 | 1.015514 | 0.697561 | 00:07 |
7 | 2.033190 | 0.904861 | 0.736585 | 00:07 |
8 | 1.819672 | 0.865400 | 0.731707 | 00:07 |
9 | 1.640550 | 0.837150 | 0.736585 | 00:07 |
10 | 1.489337 | 0.794789 | 0.765854 | 00:07 |
11 | 1.336533 | 0.753989 | 0.770732 | 00:07 |
12 | 1.211884 | 0.724643 | 0.775610 | 00:07 |
13 | 1.124595 | 0.710352 | 0.790244 | 00:07 |
14 | 1.030781 | 0.710033 | 0.785366 | 00:07 |
15 | 0.960203 | 0.696822 | 0.790244 | 00:07 |
16 | 0.899652 | 0.694591 | 0.795122 | 00:07 |
17 | 0.841474 | 0.690466 | 0.809756 | 00:07 |
18 | 0.791879 | 0.689909 | 0.795122 | 00:07 |
19 | 0.763160 | 0.689728 | 0.800000 | 00:07 |
cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(312, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese.dataloaders("working/which_cheese_cleaned")
learn_better = vision_learner(dls, resnet18, metrics=accuracy)
learn_better.lr_find()
SuggestedLRs(valley=0.0010000000474974513)
learn_better.fine_tune(20, 1e-3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.559435 | 2.919010 | 0.170732 | 00:04 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 3.794993 | 2.674616 | 0.239024 | 00:05 |
1 | 3.646426 | 2.417594 | 0.307317 | 00:05 |
2 | 3.475905 | 2.064787 | 0.375610 | 00:05 |
3 | 3.235945 | 1.699186 | 0.502439 | 00:05 |
4 | 2.987770 | 1.406921 | 0.560976 | 00:05 |
5 | 2.700939 | 1.190468 | 0.643902 | 00:05 |
6 | 2.444276 | 1.058556 | 0.712195 | 00:05 |
7 | 2.220634 | 0.985898 | 0.726829 | 00:05 |
8 | 2.032653 | 0.906098 | 0.760976 | 00:05 |
9 | 1.847860 | 0.848331 | 0.765854 | 00:05 |
10 | 1.679043 | 0.809622 | 0.746341 | 00:05 |
11 | 1.530006 | 0.772010 | 0.736585 | 00:05 |
12 | 1.410453 | 0.746380 | 0.765854 | 00:05 |
13 | 1.309522 | 0.737125 | 0.780488 | 00:05 |
14 | 1.227750 | 0.726092 | 0.790244 | 00:05 |
15 | 1.161594 | 0.708379 | 0.785366 | 00:05 |
16 | 1.102717 | 0.702760 | 0.775610 | 00:05 |
17 | 1.054643 | 0.707285 | 0.785366 | 00:05 |
18 | 1.007804 | 0.689951 | 0.790244 | 00:05 |
19 | 0.993321 | 0.692554 | 0.785366 | 00:05 |
Interestingly, at the even bigger image size, the bigger model gains a slight advantage, but not much.
It could be that the bigger model has more capacity to learn from more data, whereas the smaller model generalizes better on a smaller dataset.
Research has also shown this: https://en.wikipedia.org/wiki/Neural_scaling_law.
Larger models often perform better with more data because of their capacity to learn complex patterns, while smaller models may generalize better on smaller datasets, reducing overfitting.
6.4 Label smoothing
We will combine the bigger image size with label smoothing.
cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(312, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese.dataloaders("working/which_cheese_cleaned")
learn_better = vision_learner(dls, resnet18, metrics=accuracy, loss_func=LabelSmoothingCrossEntropy())
learn_better.lr_find()
SuggestedLRs(valley=0.0014454397605732083)
learn_better.fine_tune(20, 1.4e-3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.394593 | 2.871440 | 0.229268 | 00:04 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 3.653049 | 2.588406 | 0.317073 | 00:05 |
1 | 3.530752 | 2.326544 | 0.365854 | 00:05 |
2 | 3.368824 | 2.011567 | 0.502439 | 00:05 |
3 | 3.158019 | 1.743854 | 0.590244 | 00:05 |
4 | 2.937979 | 1.583111 | 0.668293 | 00:05 |
5 | 2.703008 | 1.478423 | 0.707317 | 00:05 |
6 | 2.499692 | 1.461678 | 0.785366 | 00:05 |
7 | 2.328650 | 1.402460 | 0.770732 | 00:05 |
8 | 2.176972 | 1.380992 | 0.760976 | 00:05 |
9 | 2.045058 | 1.355799 | 0.751220 | 00:05 |
10 | 1.940953 | 1.329172 | 0.780488 | 00:05 |
11 | 1.850078 | 1.313228 | 0.809756 | 00:05 |
12 | 1.770922 | 1.311223 | 0.809756 | 00:05 |
13 | 1.706659 | 1.292058 | 0.819512 | 00:05 |
14 | 1.641106 | 1.296595 | 0.800000 | 00:05 |
15 | 1.590719 | 1.295866 | 0.814634 | 00:05 |
16 | 1.561213 | 1.287696 | 0.800000 | 00:05 |
17 | 1.523102 | 1.286888 | 0.809756 | 00:05 |
18 | 1.498262 | 1.279946 | 0.809756 | 00:05 |
19 | 1.474353 | 1.274973 | 0.814634 | 00:05 |
Sadly, label smoothing only improved the convergence. The final score is not better than the earlier try with bigger images.
7 Modern model architectures
Resnet is quite dated. A newer architecture, ConvNeXt, is reported to deliver better results.
We will start with the base variant of the model. Due to its size, we need to limit the batch size to 16. Currently, I have found no better way than trial and error to determine the batch size, as sketched below.
While searching for a batch size that fits, I hit the memory ceiling multiple times; my free_cuda_memory function came in handy.
from fastcore.all import *
from fastai.vision.all import *

cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(256, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese.dataloaders("working/which_cheese_cleaned", bs=16)
learn = vision_learner(dls, convnext_base, metrics=accuracy)
learn.lr_find()
SuggestedLRs(valley=0.0010000000474974513)
learn.fine_tune(20, 1e-3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.031354 | 2.283592 | 0.321951 | 00:37 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 2.887947 | 2.005787 | 0.385366 | 01:28 |
1 | 2.703871 | 1.712844 | 0.492683 | 01:28 |
2 | 2.446322 | 1.513336 | 0.585366 | 01:27 |
3 | 2.149493 | 1.298572 | 0.614634 | 01:30 |
4 | 1.878358 | 0.989779 | 0.712195 | 01:29 |
5 | 1.694279 | 0.902841 | 0.731707 | 01:29 |
6 | 1.502565 | 0.791859 | 0.775610 | 01:30 |
7 | 1.361015 | 0.699819 | 0.795122 | 01:30 |
8 | 1.272284 | 0.713360 | 0.809756 | 01:30 |
9 | 1.163435 | 0.629532 | 0.804878 | 01:29 |
10 | 1.010350 | 0.623500 | 0.829268 | 01:30 |
11 | 0.907372 | 0.668074 | 0.785366 | 01:30 |
12 | 0.915707 | 0.625355 | 0.804878 | 01:29 |
13 | 0.833985 | 0.539842 | 0.829268 | 01:30 |
14 | 0.772828 | 0.531026 | 0.824390 | 01:30 |
15 | 0.735248 | 0.518410 | 0.824390 | 01:30 |
16 | 0.715735 | 0.510863 | 0.819512 | 01:30 |
17 | 0.726851 | 0.512946 | 0.829268 | 01:30 |
18 | 0.732415 | 0.508770 | 0.824390 | 01:30 |
19 | 0.737635 | 0.505515 | 0.829268 | 01:29 |
The convnext-base model reached 83% after only 10 iterations. Afterwards, the loss improved, but the accuracy did not. However, the model is big at 350 MB. We will save it for later.
learn.export('convnext_base.pkl')
7.1 Trying something smaller
There is also a convnext-tiny model, which should produce a smaller model file.
dls_better_tiny = cheese.dataloaders("working/which_cheese_cleaned", bs=32)
learn = vision_learner(dls_better_tiny, convnext_tiny, metrics=accuracy)
dls_better_tiny.bs
64
learn.lr_find()
SuggestedLRs(valley=0.0014454397605732083)
learn.fine_tune(20, 1.44e-3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.244600 | 2.932380 | 0.165854 | 00:15 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 3.287429 | 2.501591 | 0.287805 | 00:53 |
1 | 3.098157 | 2.106716 | 0.395122 | 00:53 |
2 | 2.910831 | 1.777798 | 0.512195 | 00:53 |
3 | 2.688967 | 1.506041 | 0.590244 | 00:53 |
4 | 2.420423 | 1.393299 | 0.653659 | 00:53 |
5 | 2.151483 | 1.200701 | 0.653659 | 00:53 |
6 | 1.887403 | 1.033586 | 0.697561 | 00:52 |
7 | 1.715113 | 0.973459 | 0.702439 | 00:53 |
8 | 1.539306 | 0.942053 | 0.702439 | 00:54 |
9 | 1.360577 | 0.849999 | 0.697561 | 00:53 |
10 | 1.227610 | 0.811377 | 0.746341 | 00:54 |
11 | 1.121673 | 0.766563 | 0.760976 | 00:53 |
12 | 1.017336 | 0.724510 | 0.785366 | 00:52 |
13 | 0.960135 | 0.694766 | 0.790244 | 00:52 |
14 | 0.924193 | 0.685870 | 0.790244 | 00:52 |
15 | 0.853204 | 0.688712 | 0.785366 | 00:53 |
16 | 0.816196 | 0.688337 | 0.795122 | 00:53 |
17 | 0.769849 | 0.678705 | 0.790244 | 00:53 |
18 | 0.743718 | 0.687633 | 0.804878 | 00:52 |
19 | 0.744009 | 0.674999 | 0.809756 | 00:52 |
"tiny.pkl") learn.export(
The tiny model is not as good as the base model. However, the exported model is only 114 MB. Still, compared to good old resnet (47 MB), that is more than twice the size.
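To check such file sizes quickly, a small sketch (assuming the exported .pkl files sit in the current directory):

from pathlib import Path

for name in ["resnet.pkl", "convnext_base.pkl", "tiny.pkl"]:
    size_mb = Path(name).stat().st_size / 1e6
    print(f"{name}: {size_mb:.0f} MB")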
8 Inference and getting ready for deployment
Let’s check whether our models work in inference.
We only test one image and do a visual inspection of the results. As mentioned before, I did not create a test set.
This is the biggest open TODO.
Another important aspect would be how certain the prediction is: how high is the probability of the second candidate? Many improvements are possible in problem definition and post-processing; see the ranking sketch after the predictions below.
8.1 Comparison of three models
from fastcore.all import *
from fastai.vision.all import *
from fastai.learner import load_learner

# Load the fastai Learners
learn_inf_tiny = load_learner("models/tiny.pkl")
learn_inf_base = load_learner("models/base.pkl")
learn_inf_resnet = load_learner("models/resnet.pkl")
"working/which_cheese_cleaned/which_cheese_first/which_cheese/Cantal/0c81aeec-c0a6-421e-844f-3e6e240885a8.jpg") learn_inf_tiny.predict(
('Cantal',
tensor(4),
tensor([9.7780e-05, 9.0306e-06, 2.1395e-05, 1.0606e-05, 9.9840e-01, 1.2682e-07,
4.7644e-04, 1.4753e-06, 9.9773e-06, 4.0509e-06, 2.3105e-05, 8.3267e-05,
8.9159e-05, 2.2647e-06, 3.8224e-06, 4.5492e-07, 2.8718e-04, 2.2553e-06,
7.6010e-07, 3.5029e-04, 2.3085e-07, 1.2567e-04]))
"working/which_cheese_cleaned/which_cheese_first/which_cheese/Cantal/0c81aeec-c0a6-421e-844f-3e6e240885a8.jpg") learn_inf_base.predict(
('Cantal',
tensor(4),
tensor([1.5877e-06, 5.6175e-05, 1.3185e-06, 4.0135e-06, 9.9739e-01, 1.9972e-06,
1.6469e-03, 1.1616e-05, 1.5650e-04, 2.0251e-05, 1.6810e-05, 3.3364e-04,
1.2042e-05, 1.8571e-06, 7.5011e-06, 5.5109e-07, 1.8472e-04, 2.0955e-06,
1.5077e-05, 8.5131e-05, 1.5477e-07, 4.9878e-05]))
"working/which_cheese_cleaned/which_cheese_first/which_cheese/Cantal/0c81aeec-c0a6-421e-844f-3e6e240885a8.jpg") learn_inf_resnet.predict(
('Cantal',
tensor(4),
tensor([1.2273e-06, 1.0518e-04, 2.1162e-05, 9.5564e-07, 9.9687e-01, 5.3281e-06,
2.2306e-03, 5.5073e-08, 9.4349e-05, 1.3493e-06, 2.2718e-04, 2.3397e-04,
8.0347e-06, 5.4313e-06, 4.4896e-06, 8.3956e-07, 3.2568e-05, 1.5521e-05,
2.5339e-06, 1.2194e-04, 1.5376e-05, 4.0610e-07]))
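To answer the earlier question about the runner-up probability, a small sketch that ranks the top-k candidates of a prediction (works with any of the learners above):

def top_k(learner, img_path, k=3):
    # Return the k most probable cheeses with their probabilities
    _, _, probs = learner.predict(img_path)
    idxs = probs.argsort(descending=True)[:k]
    return [(learner.dls.vocab[int(i)], float(probs[i])) for i in idxs]

top_k(learn_inf_resnet, "working/which_cheese_cleaned/which_cheese_first/which_cheese/Cantal/0c81aeec-c0a6-421e-844f-3e6e240885a8.jpg")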
8.2 Comparison of ONNX and PyTorch
We’ll require an onnx model at a later time. Let’s evaluate whether the resnet model still predicts the same after conversion.
!pip install onnx
import torch

model = learn_inf_resnet.model
dummy_input = torch.randn(1, 3, 256, 256)  # Use batch size 1 for export

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    export_params=True,
    opset_version=11,
    do_constant_folding=True,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}}  # Allow variable batch size
)
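As a quick sanity check of the exported graph (an optional sketch using the onnx package):

import onnx

onnx_model = onnx.load("model.onnx")
onnx.checker.check_model(onnx_model)             # raises if the graph is structurally invalid
print([i.name for i in onnx_model.graph.input])  # ['input']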
!pip install onnxruntime numpy pillow torchvision
class_names = learn_inf_resnet.dls.vocab
print(class_names)  # List of class names
['Banon', 'Bleu d’Auvergne', 'Brie de Meaux', 'Camembert', 'Cantal', 'Chabichou du Poitou', 'Comté', 'Fourme d’Ambert', 'Gruyere', 'Livarot', 'Manchego', 'Mimolette', 'Munster', 'Neufchâtel', 'Pont-l’Évêque', 'Pélardon', 'Reblochon', 'Roquefort', 'Selles-sur-Cher', 'Tomme de Savoie', 'Valençay', 'Époisses de Bourgogne']
import onnxruntime as ort
import numpy as np
from PIL import Image
import torchvision.transforms as transforms

# Load ONNX model
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Preprocessing function
def preprocess_image(image_path):
    transform = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    image = Image.open(image_path).convert("RGB")
    image = transform(image).unsqueeze(0).numpy().astype(np.float32)  # Add batch dim and convert to NumPy
    return image
# Load and preprocess image
image_path = "working/which_cheese_cleaned/which_cheese_first/which_cheese/Cantal/0c81aeec-c0a6-421e-844f-3e6e240885a8.jpg"  # Replace with your image path
input_tensor = preprocess_image(image_path)

# Run inference
outputs = session.run(None, {"input": input_tensor})[0]  # Raw logits

import torch
probabilities = torch.nn.functional.softmax(torch.tensor(outputs), dim=1)  # Convert to probabilities

# Get predicted class index and label
predicted_idx = torch.argmax(probabilities, dim=1).item()
predicted_label = class_names[predicted_idx]

print(f"Predicted Class: {predicted_label} (Confidence: {probabilities[0][predicted_idx]:.6f})")
Predicted Class: Cantal (Confidence: 0.971699)
The prediction is correct, but the confidence is slightly different. We will try it in deployment anyway.
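To put a number on that difference, a quick sketch comparing both probability vectors (reusing the variables from the cells above):

import numpy as np

fastai_probs = learn_inf_resnet.predict(image_path)[2].numpy()
onnx_probs = probabilities[0].numpy()
print(f"Max absolute difference: {np.abs(fastai_probs - onnx_probs).max():.4f}")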
9 Deployment: Delivering an experience
Of course we want to share our model, and not only by posting the source code on GitHub or Hugging Face. What we want is a live version of the model, something users can experience.
You can deploy via cloud computing or via on-device/edge computing. The technologies used are different.
9.1 Cloud based deployment
You cannot evaluate PyTorch ML models with plain JavaScript. Instead, a server runs a Python backend, which provides an endpoint that does nothing more than the code of the previous section.
Here is a good tutorial on how to get a simple setup running on Hugging Face with Gradio.
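The core of such a backend fits in a few lines. A minimal sketch with Gradio (file paths and names are illustrative):

import gradio as gr
from fastai.learner import load_learner

learn = load_learner("models/resnet.pkl")

def classify(img_path):
    # Predict and return a {class: probability} mapping for the Label widget
    pred, _, probs = learn.predict(img_path)
    return {learn.dls.vocab[i]: float(probs[i]) for i in range(len(probs))}

gr.Interface(
    fn=classify,
    inputs=gr.Image(type="filepath"),
    outputs=gr.Label(num_top_classes=3),
).launch()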
I developed a webcam based app; the code is in the repo. And the app is live.
In the app you can select the convnext-base, convnext-tiny, and resnet models. All models were trained with 256px images. Just point the camera at a cheese.
I used the Gradio
framework, popular in ML and featured on Hugging Face. The processing takes several milliseconds, despite being server based. The Gradio app offers no frame dropping. I try to include dynamic throttling to avoid frame congestion.
One thing I observed from this app is that the Convnext models have high numbers for the second and third best candidate. The app tries to predict a cheese when there is no cheese present. Two points worth to examine.
9.2 Edge based deployment
The issue with edge-based deployment is Python. Python is not available on mobile by default, and because of security concerns it is becoming more and more complex to run a full-blown Linux with a Python installation.
The other two options are mobile apps and browser-based inference. We limit ourselves to browser-based inference, because it is accessible from both desktop and mobile.
The ONNX format is necessary for deploying a model in a web browser. At the end of this, we will evaluate whether onnx inference gives us the same probability. Due to differences in preprocessing between fast.ai and manual methods, some variation may occur.
Using my knowledge of web development, I built a basic app that runs inference on the resnet model.
My impression was that the results were not as good. But the startup and inference times were acceptable and comparable to the Python app. This is a point worth examining once I have defined a proper test set.
10 The End
This was my first basic study in low-level ML. I dabbled in pose recognition before and have managed AI projects at work. I’m very impressed with the progress these tools have made.
If you enjoyed the read, come back for the next project. We will revisit a recipe classification app, which I programmed in 2021 and which we will improve with AI.