
Which cheese are we eating?

machine learning
python
computer vision
Did you ever wonder what kind of cheese you should buy? They all look the same. And then the embarrassment when you can only point and say: that one. Meet the cheese classifier.
Author

Dominik Lindner

Published

March 13, 2025

1 Let’s start with the why

I love cheese. Sometimes it is quite difficult to distinguish the varieties. Think about the embarrassment when you are standing in front of a mountain of cheese and can only point with your finger.

Therefore, I decided to build an ML classifier to help me.

The special difficulty here is that cheeses all look quite similar. Take, for example, the Swiss Gruyère and the French Comté.

They are twins.

2 Let’s continue with the data

First, we need some data. Fast.ai provides an easy module to download images found via DuckDuckGo.

As an alternative, we could use a dataset, if we have one. Let’s start by downloading the files and then create a dataset.

2.1 Getting data from DuckDuckGo

Let’s start by defining what we want to download. We want cheese. In particular, French cheese.

cheeses = [
    "Camembert",
    "Roquefort",
    "Comté",
    "Époisses de Bourgogne",
    "Tomme de Savoie",
    "Bleu d’Auvergne",
    "Brie de Meaux",
    "Mimolette",
    "Munster",
    "Livarot",
    "Pont-l’Évêque",
    "Reblochon",
    "Chabichou du Poitou",
    "Valençay",
    "Pélardon",
    "Fourme d’Ambert",
    "Selles-sur-Cher",
    "Cantal",
    "Neufchâtel",
    "Banon",
    "Gruyere"
]

To get a larger variety of images, we define some extra search terms.

search_terms = [
    "cheese close-up texture",
    "cheese macro shot",
    "cheese cut section"
]

As we work with Fast.ai, let’s import the basic stuff.

from duckduckgo_search import DDGS
from fastcore.all import *
from fastai.vision.all import *
import time, json

def search_images(keywords, max_images=20):
    "Return the image URLs of a DuckDuckGo image search."
    return L(DDGS().images(keywords, max_results=max_images)).itemgot('image')

And then define our download function:

from fastdownload import download_url
from pathlib import Path
import time

data_acquisition=False

def download():
    # Loop through all combinations of cheeses and search terms
    for cheese in cheeses:
        dest = Path("which_cheese") / cheese  # Create subdirectory for each cheese
        dest.mkdir(exist_ok=True, parents=True)

        for term in search_terms:
            query = f"{cheese} {term}"
            download_images(dest, urls=search_images(f"{query} photo"))
            time.sleep(5)

        # Resize images after downloading
        resize_images(dest, max_size=400, dest=dest)

# Run download only if data acquisition is enabled
if data_acquisition:
    download()

We can verify the images now or later.

if data_acquisition:
    path = Path("which_cheese")
    failed = verify_images(get_image_files(path))
    failed.map(Path.unlink)  # delete images that cannot be opened
    len(failed)

2.2 Loading data from a Kaggle dataset

I created a dataset of these images to avoid having to download them again when I start over.

Sadly, due to the uncertain copyright status of this data, my dataset needs to remain private. But you can easily create your own.

As I run most of my code locally, I have some code to fetch it from Kaggle:

competition_name= None
dataset_name = 'cheese'

import os
from pathlib import Path

iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')
if competition_name:
    if iskaggle: 
        comp_path = Path('../input/'+ competition_name)
    else:
        comp_path = Path(competition_name)
        if not comp_path.exists():
            import zipfile,kaggle
            kaggle.api.competition_download_cli(str(comp_path))
            zipfile.ZipFile(f'{comp_path}.zip').extractall(comp_path)


if dataset_name:
    if iskaggle:
        path = Path(f'../input/{dataset_name}')
    else:
        path = Path(dataset_name)
        if not path.exists():
            import zipfile, kaggle
            kaggle.api.dataset_download_cli(dataset_name, path='.')
            zipfile.ZipFile(f'{dataset_name}.zip').extractall(path)        

Now that we have downloaded the data, we can start using it.

3 Cleaning the data with the help of our first model

Before we dive into different options for modelling, we will do a quick pass through the data and see which images are bad.

The background is that the scraper picks up many images that are not suitable for training.

We start by creating a working copy of the dataset.

!mkdir -p working/which_cheese_first 
!cp -r cheese/which_cheese  working/which_cheese_first 

To be sure that all images are valid, we check again for corrupted files and remove them.

from pathlib import Path
from PIL import Image

data_path = Path("working/which_cheese_first")

# Check all images
corrupt_files = []
for img_path in data_path.rglob("*.*"):  # Match all files inside subfolders
    try:
        with Image.open(img_path) as img:
            img.verify()  # Verify if it's a valid image
    except (IOError, SyntaxError):
        corrupt_files.append(img_path)

# Remove corrupt images
print(f"Found {len(corrupt_files)} corrupt images.")
for corrupt in corrupt_files:
    print(f"Deleting {corrupt}")
    corrupt.unlink()  # Delete the file
Found 48 corrupt images.
Deleting working/which_cheese_first/which_cheese/Roquefort/350d3e67-dcf6-4292-b963-c1d5841b8788.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/594d40b1-f655-4db1-b3a9-4e7d6bb6c631.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/32a9069e-52c2-47e1-9db4-16197556c4fb.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/c73fb213-3813-43fd-b5ae-2d390ca8e3d5.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/2c426320-24bd-4869-8f1c-d09171ac6294.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/83a95414-4083-48d7-9956-be5d82b05caf.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/f4f09c62-652b-400c-8e09-419389635fc4.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/dfa07f3c-0931-49aa-b3c2-9c4a5901565d.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/609abf59-c1f0-4a34-b2cf-1bedf1b4cea0.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/b56ab8cc-5b37-40c9-be31-57d14c843978.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/422aec71-31d9-421e-880c-91867eaa5dfb.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/5591880b-37f4-4bcc-9927-8f60b6d6bb37.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/a9e2a7ad-038e-4b6d-8dee-19fd1661ebe1.jpg
Deleting working/which_cheese_first/which_cheese/Roquefort/4a572868-b982-47ed-b96e-3eb1a755e32a.jpg
Deleting working/which_cheese_first/which_cheese/Camembert/8903d049-4256-4fe5-9716-48e5fc8ef52b.jpg
Deleting working/which_cheese_first/which_cheese/Camembert/849c9bb0-b717-40a7-922e-091e22e36579.jpg
Deleting working/which_cheese_first/which_cheese/Camembert/4582e06a-0218-4b6b-aeaf-7e7d61dd3827.jpg
Deleting working/which_cheese_first/which_cheese/Camembert/bf950a7d-6ab2-4dd3-81dc-5ec14b9964dc.jpg
Deleting working/which_cheese_first/which_cheese/Camembert/052b599d-f560-473c-947c-74bb3c138167.jpg
Deleting working/which_cheese_first/which_cheese/Camembert/ccf3f8e7-aa87-426d-bea2-a1f18a89be05.jpg
Deleting working/which_cheese_first/which_cheese/Manchego/f7de39d9-0ff2-4a99-aa92-807b27fa7d90.jpg
Deleting working/which_cheese_first/which_cheese/Manchego/0352de9a-3f83-4ce7-bfbe-207da04840a3.jpg
Deleting working/which_cheese_first/which_cheese/Manchego/0592b012-96e5-4f22-ac2e-acc8ab41ecc4.jpg
Deleting working/which_cheese_first/which_cheese/Fourme d’Ambert/0e36dc86-5e2a-4635-afcc-e3e0ec972aee.jpg
Deleting working/which_cheese_first/which_cheese/Neufchâtel/d827858f-aac0-49f4-b397-facadcfb70fb.jpg
Deleting working/which_cheese_first/which_cheese/Neufchâtel/bed8cf04-9305-4f00-9a8a-1b869e00701c.jpg
Deleting working/which_cheese_first/which_cheese/Neufchâtel/8963c142-9a63-43dc-8268-f54a1b6fbb2b.jpg
Deleting working/which_cheese_first/which_cheese/Neufchâtel/bbf224a3-5033-49c0-8b0c-92068e50382f.jpg
Deleting working/which_cheese_first/which_cheese/Neufchâtel/19d6e4f5-0393-455c-b2d4-7ecccfd93431.jpg
Deleting working/which_cheese_first/which_cheese/Neufchâtel/6ad1c9d8-1f29-4da8-ae7d-78915460cf35.jpg
Deleting working/which_cheese_first/which_cheese/Selles-sur-Cher/93d14546-21bf-46e5-89de-e336b474baf3.jpg
Deleting working/which_cheese_first/which_cheese/Mimolette/17abeba3-b113-4c84-90ed-b17b6152c71d.jpg
Deleting working/which_cheese_first/which_cheese/Mimolette/72f3db51-da86-4934-bad7-c1b5e54cfb46.jpg
Deleting working/which_cheese_first/which_cheese/Mimolette/16f74a99-f1ef-46fd-a809-8f332ad235b7.jpg
Deleting working/which_cheese_first/which_cheese/Époisses de Bourgogne/9484a03b-af27-4155-a950-bc07187f00f0.jpg
Deleting working/which_cheese_first/which_cheese/Livarot/ffe3e263-a49b-41bc-bdba-4b66cdc12475.jpg
Deleting working/which_cheese_first/which_cheese/Livarot/57e84bd7-8936-4d55-8ec4-cfcc1073b9a4.jpg
Deleting working/which_cheese_first/which_cheese/Gruyere/9316e837-a0b2-468a-a287-69ee27b840ba.jpg
Deleting working/which_cheese_first/which_cheese/Gruyere/dcc3320a-408c-4b93-a1b6-bf2f3f25aa15.jpg
Deleting working/which_cheese_first/which_cheese/Gruyere/3d5755ba-b8b3-4636-8213-54a3bf19613d.jpg
Deleting working/which_cheese_first/which_cheese/Comté/5b92cce2-46d4-46f8-9f0f-7952076ded0a.jpg
Deleting working/which_cheese_first/which_cheese/Reblochon/3ff8d8f8-09c4-4f83-85b8-9c089fcd6805.jpg
Deleting working/which_cheese_first/which_cheese/Pélardon/a0e86302-8ca3-47ab-ab4c-bdf4834ca208.jpg
Deleting working/which_cheese_first/which_cheese/Pélardon/6370f787-f13e-4acf-aefb-ad67f68d32c2.jpg
Deleting working/which_cheese_first/which_cheese/Pont-l’Évêque/4efc1ad3-575d-4a32-9063-e403fd57d7c9.jpg
Deleting working/which_cheese_first/which_cheese/Tomme de Savoie/5e23fd21-574a-47a0-bc9b-ca52984ae9a5.jpg
Deleting working/which_cheese_first/which_cheese/Tomme de Savoie/5cf2571c-74f7-4a5b-9374-ef4c480267df.jpg
Deleting working/which_cheese_first/which_cheese/Valençay/a45d3613-5dbc-45b5-85c0-2ead70ccf221.jpg

3.1 Model definition

We will define a simple model and check that the data is loaded correctly. The simplest standard model for image classification is resnet18.

from fastcore.all import *
from fastai.vision.all import *
cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
)
dls = cheese.dataloaders("working/which_cheese_first")
dls.show_batch()

For the metric, I chose accuracy, as it is the easiest to interpret. We will see later that the dataset becomes slightly imbalanced during cleaning, and the F1-score would then be the better choice.
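If the imbalance ever became severe, switching metrics would be a one-liner. A minimal sketch using fastai's built-in F1Score with macro averaging (learn_f1 is just an illustrative name):

learn_f1 = vision_learner(dls, resnet18, metrics=[accuracy, F1Score(average='macro')])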

learn = vision_learner(dls, resnet18, metrics=accuracy)

We then do a quick learning pass.

learn.fine_tune(3)
epoch train_loss valid_loss accuracy time
0 4.307302 2.287525 0.356164 00:02
epoch train_loss valid_loss accuracy time
0 2.265255 1.649305 0.547945 00:03
1 1.552460 1.265489 0.662100 00:03
2 1.129812 1.213783 0.666667 00:03

As we can see, accuracy increased to about 67% after 3 epochs.

4 Data Cleaning

We can have a look at the confusion matrix. Some cheeses are easily confused with each other, for example Bleu d’Auvergne with Fourme d’Ambert. In fact, in cheese stores outside France, few people seem to know the second one. The same goes for the hard cheeses Cantal, Comté, and Gruyère. The last two are standard mountain cheeses, one from France and one from Switzerland. They only differ in their texture: Comté of the same age is a little creamier and has fewer crevices. I added the Gruyère specifically to make the dataset harder.

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

Let’s have a look at the top losses.

interp.plot_top_losses(10)

4.1 All the same?

As expected, similar cheeses from the same group are difficult to distinguish.

Let’s do some data cleaning.

For Comté, Gruyère, and Munster, the pictures with the highest loss are those with little detail or with accessories like bread or knives.

from fastai.vision.widgets import *
files_to_clean=[]
cleaner = ImageClassifierCleaner(learn)
cleaner

4.2 IMPORTANT: How to use the cleaner

For each category, and for both the train and valid sets, select the images and then run the following cell. The cleaner does not seem to remember selections across categories.

We also must not re-run the cell above after deleting files, as those files would then be missing. Instead, we go through all categories and collect the files to be deleted.

We do not change categories for now.

for idx in cleaner.delete(): 
    files_to_clean.append(cleaner.fns[idx])
for file in files_to_clean:
    try:
        file.unlink()
    except FileNotFoundError:
        pass  # the file was already deleted in an earlier pass

After a lot of examination, I cleaned my dataset down from 1100 files to 1029. I ran the following cell to create a copy of the cleaned data. To protect that copy, the cell is commented out.

#!mkdir -p working/which_cheese_cleaned
#!cp -r working/which_cheese_first  working/which_cheese_cleaned

5 Fast iterations to improve and analyze the data

5.1 Working with cleaned data

Now that we have cleaned some data, we can train again, using more advanced techniques.

We will start with a simple training run again, to see if the cleaning was successful.

from fastcore.all import *
from fastai.vision.all import *
cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
)
dls = cheese.dataloaders("working/which_cheese_cleaned")
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.fine_tune(3)
epoch train_loss valid_loss accuracy time
0 4.137223 2.343409 0.326829 00:02
epoch train_loss valid_loss accuracy time
0 2.324481 1.571146 0.570732 00:02
1 1.581721 1.212596 0.643902 00:02
2 1.146727 1.148679 0.678049 00:02

We now reach roughly 68% accuracy. Let’s train further to see how far we can get.

learn.fine_tune(13)
epoch train_loss valid_loss accuracy time
0 0.523904 1.073870 0.717073 00:02
epoch train_loss valid_loss accuracy time
0 0.314336 1.042765 0.721951 00:02
1 0.260081 0.991794 0.741463 00:02
2 0.203073 0.943358 0.741463 00:02
3 0.158532 0.913470 0.756098 00:02
4 0.141772 0.872876 0.751220 00:02
5 0.121437 0.816914 0.751220 00:02
6 0.101683 0.836497 0.765854 00:02
7 0.085780 0.845604 0.751220 00:02
8 0.071734 0.842247 0.760976 00:02
9 0.062432 0.823996 0.765854 00:02
10 0.053119 0.811724 0.760976 00:02
11 0.044131 0.817966 0.760976 00:02
12 0.038983 0.820316 0.760976 00:02

We seem to have hit a wall at 76% accuracy, as early as epoch 6.

5.1.1 A word on the choice of metrics

Earlier I chose accuracy as the metric. Let’s examine our data to see if the choice is still valid.

pd.Series([dls.vocab[o[1]] for o in dls.train_ds]).value_counts()
Fourme d’Ambert          48
Chabichou du Poitou      47
Mimolette                44
Pont-l’Évêque            44
Brie de Meaux            43
Comté                    41
Tomme de Savoie          41
Cantal                   40
Pélardon                 40
Reblochon                39
Valençay                 39
Bleu d’Auvergne          38
Neufchâtel               37
Livarot                  36
Selles-sur-Cher          35
Camembert                34
Époisses de Bourgogne    33
Manchego                 32
Munster                  32
Gruyere                  29
Roquefort                27
Banon                    24
Name: count, dtype: int64

As I mentioned earlier, the dataset is no longer balanced. However, it is not truly imbalanced either: the worst ratio is about 2:1, not an order of magnitude like 1:10. We stick with accuracy.

5.2 Data Augmentation

We do not have many images. Therefore, we will use data augmentation and move from squishing to RandomResizedCrop.

cheese_augmented = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(192, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese_augmented.dataloaders("working/which_cheese_cleaned")
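Before training, we can eyeball what the augmentation pipeline does. fastai can show several augmented variants of the same image:

dls.train.show_batch(max_n=8, nrows=2, unique=True)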

Note: I chose to overwrite the variables here. A standard programming approach would use new variables. However, the learner reserves memory on the GPU, and with new variables we would hit an out-of-memory error. One option is to delete the previous variable and free the memory explicitly. The other option, which I chose here, is to overwrite it with a new learner; the overwrite implicitly releases the old one.
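For reference, the explicit alternative would look roughly like this:

import gc
import torch

del learn                 # drop the reference to the old learner
gc.collect()              # let Python reclaim the object
torch.cuda.empty_cache()  # return the cached GPU memory to the driver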

learn = vision_learner(dls, resnet18, metrics=accuracy)

We will pull another trick and use a better learning rate.

learn.lr_find()
SuggestedLRs(valley=0.0014454397605732083)

learn.fine_tune(16, 1.44e-3)
epoch train_loss valid_loss accuracy time
0 4.298504 2.772096 0.278049 00:02
epoch train_loss valid_loss accuracy time
0 3.559263 2.476911 0.331707 00:02
1 3.395612 2.182159 0.370732 00:02
2 3.096542 1.812795 0.492683 00:02
3 2.789779 1.489513 0.570732 00:02
4 2.507986 1.255989 0.629268 00:02
5 2.255753 1.103503 0.687805 00:02
6 1.996111 1.050033 0.726829 00:02
7 1.788129 0.995375 0.741463 00:02
8 1.612162 0.972283 0.741463 00:02
9 1.448160 0.921064 0.736585 00:02
10 1.303951 0.902030 0.751220 00:02
11 1.197405 0.879721 0.765854 00:02
12 1.133353 0.869218 0.760976 00:02
13 1.073917 0.860385 0.775610 00:02
14 1.003178 0.850505 0.770732 00:02
15 0.972047 0.851660 0.775610 00:02
learn.fine_tune(6, 1.44e-3)
epoch train_loss valid_loss accuracy time
0 0.824386 0.848143 0.780488 00:02
epoch train_loss valid_loss accuracy time
0 0.778520 0.854400 0.780488 00:02
1 0.784403 0.840095 0.770732 00:02
2 0.752580 0.833080 0.760976 00:02
3 0.711079 0.824473 0.775610 00:02
4 0.661186 0.804503 0.760976 00:02
5 0.619942 0.800512 0.765854 00:02
learn.fine_tune(6, 1.44e-3)
epoch train_loss valid_loss accuracy time
0 0.536938 0.783610 0.775610 00:02
epoch train_loss valid_loss accuracy time
0 0.553907 0.786242 0.775610 00:02
1 0.547560 0.848543 0.765854 00:02
2 0.524134 0.848381 0.756098 00:02
3 0.499666 0.811153 0.780488 00:02
4 0.473074 0.783625 0.780488 00:02
5 0.465626 0.781733 0.785366 00:02

Training advanced more slowly. It seems to hit the same plateau at 76-77%. Only after 12 more epochs do we converge on a path above 78%. The validation loss only starts going down again after 6 epochs, showing convergence issues in the gradient descent.

Let’s look at the result.

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

interp.plot_top_losses(5, nrows=1)

The Fourme d’Ambert uncertainty has almost vanished. The top losses now come from images that slipped past my cleaning efforts and are indeed misleading.

5.3 Label smoothing

As the data still has a lot of noise, we can try label smoothing. Label smoothing assumes a natural uncertainty in the labels, so no label gets 100% probability. Instead, it redistributes a small portion of the correct class’s probability across all classes to prevent overconfidence and improve generalization.
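Concretely, with fastai's default ε of 0.1 and our 22 classes, the smoothed target looks like this:

eps, n_classes = 0.1, 22  # fastai's default eps, our number of cheese classes
off = eps / n_classes     # probability mass spread over every class
on = 1 - eps + off        # probability kept by the correct class
print(f"correct class: {on:.3f}, every other class: {off:.4f}")
# correct class: 0.905, every other class: 0.0045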

We will train for 28 epochs.

from fastai.losses import LabelSmoothingCrossEntropy

learn = vision_learner(dls, resnet18, metrics=accuracy, loss_func=LabelSmoothingCrossEntropy())
learn.lr_find()
SuggestedLRs(valley=0.00363078061491251)

learn.fine_tune(28, 3.6e-3)
epoch train_loss valid_loss accuracy time
0 4.273853 2.742732 0.307317 00:02
epoch train_loss valid_loss accuracy time
0 2.933313 2.152079 0.429268 00:02
1 2.757924 1.957854 0.531707 00:02
2 2.600373 1.795183 0.595122 00:02
3 2.414058 1.652869 0.648780 00:02
4 2.251830 1.555603 0.692683 00:02
5 2.109990 1.569267 0.702439 00:02
6 1.985083 1.525316 0.717073 00:02
7 1.878232 1.548784 0.717073 00:02
8 1.782599 1.508083 0.726829 00:02
9 1.692580 1.468358 0.746341 00:02
10 1.610464 1.430262 0.741463 00:02
11 1.544696 1.419962 0.721951 00:02
12 1.477807 1.413017 0.751220 00:02
13 1.419630 1.305687 0.760976 00:02
14 1.370959 1.298595 0.795122 00:02
15 1.320189 1.298479 0.814634 00:02
16 1.275702 1.271670 0.785366 00:02
17 1.247922 1.282414 0.770732 00:02
18 1.205016 1.259176 0.775610 00:02
19 1.168169 1.248492 0.775610 00:02
20 1.135880 1.244297 0.780488 00:02
21 1.108426 1.244742 0.785366 00:02
22 1.088607 1.238094 0.775610 00:02
23 1.073189 1.241032 0.780488 00:02
24 1.053456 1.239150 0.775610 00:02
25 1.044365 1.244962 0.780488 00:02
26 1.034709 1.242740 0.780488 00:02
27 1.029999 1.233360 0.790244 00:02

From around epoch 13, the validation loss barely improved. We briefly observed an accuracy of approximately 81% (epoch 15), yet the final accuracy is just 79%, despite the reduced loss.

5.4 Summary

Data augmentation and label smoothing both help with very noisy data and a small number of samples. They took the accuracy from 76% to 81%.

6 Bigger is better

So they say in mechanical engineering.

Let’s try scaling things up.

6.1 Bigger images

First, we increase the image size.

from fastcore.all import *
from fastai.vision.all import *
cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(256, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese.dataloaders("working/which_cheese_cleaned")
learn_better = vision_learner(dls, resnet18, metrics=accuracy)
learn_better.lr_find()
SuggestedLRs(valley=0.0008317637839354575)

6.1.1 Note: Beware CUDA out of memory

As we increase the size of the data and the model, we can run out of memory. After a crash, the memory stays allocated.

The standard approach is to run torch.cuda.empty_cache() and trigger garbage collection.

Sometimes the memory still stays allocated, and I need multiple passes to free it up. I wrote a utility function to do just that.

As I use an old GPU with only 8 GB, I frequently run into out-of-memory errors.

def free_cuda_memory(var_name, globals_dict, max_attempts=5, delay=0.5):
    """
    Deletes a variable by name, collects garbage, and repeatedly clears CUDA memory until freed.
    
    Args:
        var_name (str): Name of the variable to delete.
        globals_dict (dict): Pass `globals()` to delete from the global scope.
        max_attempts (int): Maximum attempts to clear memory.
        delay (float): Time (in seconds) to wait between attempts.
    """
    import torch
    import gc
    import time
    if var_name in globals_dict:
        del globals_dict[var_name]
    else:
        print(f"Variable '{var_name}' not found in globals.")
        return

    for _ in range(max_attempts):
        gc.collect()
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()
        time.sleep(delay)

        # Check if memory is freed
        allocated = torch.cuda.memory_allocated()
        cached = torch.cuda.memory_reserved()

        if allocated == 0 and cached == 0:
            print("CUDA memory successfully freed.")
            return
    
    print("Warning: Some CUDA memory may still be blocked.")
    print(f"Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
    print(f"Cached: {torch.cuda.memory_reserved() / 1e9:.2f} GB")
free_cuda_memory("learn_better",globals())
Variable 'learn_better' not found in globals.
learn_better.fine_tune(20, 8.3e-4)
epoch train_loss valid_loss accuracy time
0 4.542337 3.085513 0.146341 00:02
epoch train_loss valid_loss accuracy time
0 4.000002 2.851287 0.180488 00:03
1 3.854214 2.640312 0.214634 00:03
2 3.678350 2.329466 0.321951 00:03
3 3.460999 1.959772 0.409756 00:03
4 3.187331 1.625292 0.536585 00:03
5 2.932109 1.408548 0.604878 00:03
6 2.674737 1.244989 0.668293 00:03
7 2.424846 1.146155 0.663415 00:03
8 2.201131 1.025524 0.707317 00:03
9 2.034413 0.931238 0.726829 00:03
10 1.865840 0.851306 0.756098 00:03
11 1.716559 0.824157 0.741463 00:03
12 1.578321 0.804028 0.770732 00:03
13 1.461851 0.793212 0.775610 00:03
14 1.359122 0.781659 0.795122 00:03
15 1.279223 0.774161 0.795122 00:03
16 1.229434 0.775929 0.795122 00:03
17 1.166556 0.768999 0.795122 00:03
18 1.123601 0.767946 0.800000 00:03
19 1.104936 0.771056 0.790244 00:03
learn_better.export('resnet.pkl')

Almost 80%: after 20 epochs, the goal seems to have been reached. In another run I even got 82%. Despite the lack of consistency, I count this as a record.

6.2 Bigger Model

Instead of the images, we can increase the model. We will go for resnet34 and resnet50.

cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(192, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese.dataloaders("working/which_cheese_cleaned")
learn_better = vision_learner(dls, resnet34, metrics=accuracy)
learn_better.lr_find()
SuggestedLRs(valley=0.0020892962347716093)

learn_better.fine_tune(20, 2e-3)
epoch train_loss valid_loss accuracy time
0 4.319314 2.530106 0.243902 00:02
epoch train_loss valid_loss accuracy time
0 3.351670 2.044574 0.380488 00:03
1 3.058213 1.685639 0.507317 00:03
2 2.746131 1.313843 0.639024 00:03
3 2.436243 1.013334 0.731707 00:03
4 2.142230 0.840266 0.775610 00:03
5 1.898090 0.805258 0.770732 00:03
6 1.671135 0.764192 0.800000 00:03
7 1.478277 0.738444 0.809756 00:03
8 1.305836 0.683891 0.785366 00:03
9 1.142530 0.632159 0.790244 00:03
10 1.006283 0.622701 0.814634 00:03
11 0.882134 0.641913 0.790244 00:03
12 0.775799 0.630769 0.780488 00:03
13 0.712351 0.629713 0.790244 00:03
14 0.638866 0.643542 0.790244 00:03
15 0.576997 0.637436 0.790244 00:03
16 0.545079 0.637268 0.804878 00:03
17 0.505899 0.642498 0.804878 00:03
18 0.472322 0.642579 0.800000 00:03
19 0.449359 0.632896 0.804878 00:03
learn_better = vision_learner(dls, resnet50, metrics=accuracy)
learn_better.lr_find()
SuggestedLRs(valley=0.0010000000474974513)

learn_better.fine_tune(20, 1e-3)
epoch train_loss valid_loss accuracy time
0 4.393582 2.744330 0.204878 00:04
epoch train_loss valid_loss accuracy time
0 3.321695 2.486286 0.287805 00:06
1 3.125788 2.221527 0.356098 00:06
2 2.849939 1.930392 0.439024 00:06
3 2.639139 1.629388 0.502439 00:06
4 2.387037 1.396374 0.600000 00:06
5 2.145477 1.242509 0.648780 00:06
6 1.933496 1.132375 0.687805 00:06
7 1.738567 1.025986 0.717073 00:06
8 1.557685 0.977237 0.756098 00:06
9 1.412416 0.930118 0.765854 00:06
10 1.275174 0.908866 0.756098 00:06
11 1.159127 0.913954 0.765854 00:06
12 1.066809 0.898841 0.775610 00:06
13 0.990926 0.876705 0.780488 00:06
14 0.921643 0.868463 0.785366 00:06
15 0.874949 0.841853 0.780488 00:06
16 0.826003 0.845446 0.765854 00:06
17 0.782675 0.861964 0.765854 00:06
18 0.737917 0.838224 0.770732 00:06
19 0.717874 0.844599 0.765854 00:06

Remarkably, the bigger resnet34 model also achieves 81%, and its training converges better. Conversely, the even larger resnet50 yields inferior results. This could be due to the limited amount of data.

Let’s see how the big model handles big images.

cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(256, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese.dataloaders("working/which_cheese_cleaned")
learn_better = vision_learner(dls, resnet34, metrics=accuracy)
learn_better.lr_find()
SuggestedLRs(valley=0.0006918309954926372)

learn_better.fine_tune(20, 6.9e-4)
epoch train_loss valid_loss accuracy time
0 4.543718 3.211622 0.092683 00:04
epoch train_loss valid_loss accuracy time
0 4.034224 2.881157 0.180488 00:05
1 3.940435 2.636389 0.229268 00:05
2 3.715499 2.337412 0.326829 00:05
3 3.486756 1.973952 0.429268 00:05
4 3.248124 1.657461 0.507317 00:05
5 2.996700 1.410418 0.595122 00:05
6 2.718238 1.225778 0.673171 00:05
7 2.460284 1.119877 0.697561 00:05
8 2.235554 1.047230 0.692683 00:05
9 2.015765 0.982388 0.717073 00:05
10 1.827305 0.939780 0.726829 00:05
11 1.672784 0.906191 0.741463 00:05
12 1.559439 0.882196 0.756098 00:05
13 1.426843 0.865841 0.765854 00:05
14 1.338417 0.850578 0.765854 00:05
15 1.243912 0.847275 0.746341 00:05
16 1.178589 0.844610 0.751220 00:05
17 1.117920 0.846011 0.746341 00:05
18 1.063974 0.839167 0.751220 00:05
19 1.025944 0.836000 0.765854 00:05

First, it needs to be noted that the suggested learning rate is lower. The results are also worse than for resnet18.

6.3 Even bigger images

We will increase the images even further. We resized our images to 400px when downloading, so there is no point in going much larger than 312px.

cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(312, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese.dataloaders("working/which_cheese_cleaned")
learn_better = vision_learner(dls, resnet34, metrics=accuracy)
learn_better.lr_find()
SuggestedLRs(valley=0.0010000000474974513)

learn_better.fine_tune(20, 1e-3)
epoch train_loss valid_loss accuracy time
0 4.543712 3.039113 0.146341 00:06
epoch train_loss valid_loss accuracy time
0 3.761286 2.706118 0.234146 00:07
1 3.621768 2.441219 0.297561 00:07
2 3.425048 2.090192 0.409756 00:08
3 3.118869 1.713392 0.507317 00:07
4 2.836166 1.382881 0.585366 00:07
5 2.560673 1.180610 0.629268 00:07
6 2.283780 1.015514 0.697561 00:07
7 2.033190 0.904861 0.736585 00:07
8 1.819672 0.865400 0.731707 00:07
9 1.640550 0.837150 0.736585 00:07
10 1.489337 0.794789 0.765854 00:07
11 1.336533 0.753989 0.770732 00:07
12 1.211884 0.724643 0.775610 00:07
13 1.124595 0.710352 0.790244 00:07
14 1.030781 0.710033 0.785366 00:07
15 0.960203 0.696822 0.790244 00:07
16 0.899652 0.694591 0.795122 00:07
17 0.841474 0.690466 0.809756 00:07
18 0.791879 0.689909 0.795122 00:07
19 0.763160 0.689728 0.800000 00:07
cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(312, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese.dataloaders("working/which_cheese_cleaned")
learn_better = vision_learner(dls, resnet18, metrics=accuracy)
learn_better.lr_find()
SuggestedLRs(valley=0.0010000000474974513)

learn_better.fine_tune(20, 1e-3)
epoch train_loss valid_loss accuracy time
0 4.559435 2.919010 0.170732 00:04
epoch train_loss valid_loss accuracy time
0 3.794993 2.674616 0.239024 00:05
1 3.646426 2.417594 0.307317 00:05
2 3.475905 2.064787 0.375610 00:05
3 3.235945 1.699186 0.502439 00:05
4 2.987770 1.406921 0.560976 00:05
5 2.700939 1.190468 0.643902 00:05
6 2.444276 1.058556 0.712195 00:05
7 2.220634 0.985898 0.726829 00:05
8 2.032653 0.906098 0.760976 00:05
9 1.847860 0.848331 0.765854 00:05
10 1.679043 0.809622 0.746341 00:05
11 1.530006 0.772010 0.736585 00:05
12 1.410453 0.746380 0.765854 00:05
13 1.309522 0.737125 0.780488 00:05
14 1.227750 0.726092 0.790244 00:05
15 1.161594 0.708379 0.785366 00:05
16 1.102717 0.702760 0.775610 00:05
17 1.054643 0.707285 0.785366 00:05
18 1.007804 0.689951 0.790244 00:05
19 0.993321 0.692554 0.785366 00:05

Interestingly, at the even bigger image size, the bigger model gains a slight advantage, but not much.

It could be that the bigger model has more capacity to learn from more data, whereas the smaller model generalizes better on a smaller dataset.

Research has also shown this: https://en.wikipedia.org/wiki/Neural_scaling_law.

Larger models often perform better with more data because of their capacity to learn complex patterns, while smaller models may generalize better on smaller datasets, reducing overfitting.

6.4 Label smoothing

We will combine the bigger size with label smoothing.

cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(312, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))
dls = cheese.dataloaders("working/which_cheese_cleaned")
learn_better = vision_learner(dls, resnet18, metrics=accuracy, loss_func=LabelSmoothingCrossEntropy())
learn_better.lr_find()
SuggestedLRs(valley=0.0014454397605732083)

learn_better.fine_tune(20, 1.4e-3)
epoch train_loss valid_loss accuracy time
0 4.394593 2.871440 0.229268 00:04
epoch train_loss valid_loss accuracy time
0 3.653049 2.588406 0.317073 00:05
1 3.530752 2.326544 0.365854 00:05
2 3.368824 2.011567 0.502439 00:05
3 3.158019 1.743854 0.590244 00:05
4 2.937979 1.583111 0.668293 00:05
5 2.703008 1.478423 0.707317 00:05
6 2.499692 1.461678 0.785366 00:05
7 2.328650 1.402460 0.770732 00:05
8 2.176972 1.380992 0.760976 00:05
9 2.045058 1.355799 0.751220 00:05
10 1.940953 1.329172 0.780488 00:05
11 1.850078 1.313228 0.809756 00:05
12 1.770922 1.311223 0.809756 00:05
13 1.706659 1.292058 0.819512 00:05
14 1.641106 1.296595 0.800000 00:05
15 1.590719 1.295866 0.814634 00:05
16 1.561213 1.287696 0.800000 00:05
17 1.523102 1.286888 0.809756 00:05
18 1.498262 1.279946 0.809756 00:05
19 1.474353 1.274973 0.814634 00:05

Sadly, label smoothing only improved the convergence. The final score is no better than the initial try with bigger images.

7 Modern model architectures

ResNet is quite dated. A newer architecture, ConvNeXt, is reported to deliver better results.

We will start with the base variant of the model. Due to its size, we need to limit the batch size to 16. So far I have found no better way than trial and error to determine a workable batch size.

While hunting for the right batch size, I hit the memory ceiling multiple times. My free_cuda_memory function came in handy.
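That trial and error can also be scripted. A rough sketch, where find_max_bs is a hypothetical helper and torch.cuda.OutOfMemoryError requires a recent PyTorch:

import gc
import torch
from fastai.vision.all import vision_learner, accuracy

def find_max_bs(datablock, path, arch, candidates=(64, 32, 16, 8)):
    "Return the largest candidate batch size that survives one epoch."
    for bs in candidates:
        learn = None
        try:
            dls = datablock.dataloaders(path, bs=bs)
            learn = vision_learner(dls, arch, metrics=accuracy)
            learn.fine_tune(1)  # one epoch as an out-of-memory smoke test
            return bs
        except torch.cuda.OutOfMemoryError:
            print(f"bs={bs} does not fit, trying smaller")
        finally:
            del learn
            gc.collect()
            torch.cuda.empty_cache()
    raise RuntimeError("no candidate batch size fits into GPU memory")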

from fastcore.all import *
from fastai.vision.all import *
cheese = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(256, min_scale=0.3),
    batch_tfms=aug_transforms(mult=2))

dls = cheese.dataloaders("working/which_cheese_cleaned", bs=16)
learn = vision_learner(dls, convnext_base, metrics=accuracy)
learn.lr_find()
SuggestedLRs(valley=0.0010000000474974513)

learn.fine_tune(20, 1.e-3)
epoch train_loss valid_loss accuracy time
0 4.031354 2.283592 0.321951 00:37
epoch train_loss valid_loss accuracy time
0 2.887947 2.005787 0.385366 01:28
1 2.703871 1.712844 0.492683 01:28
2 2.446322 1.513336 0.585366 01:27
3 2.149493 1.298572 0.614634 01:30
4 1.878358 0.989779 0.712195 01:29
5 1.694279 0.902841 0.731707 01:29
6 1.502565 0.791859 0.775610 01:30
7 1.361015 0.699819 0.795122 01:30
8 1.272284 0.713360 0.809756 01:30
9 1.163435 0.629532 0.804878 01:29
10 1.010350 0.623500 0.829268 01:30
11 0.907372 0.668074 0.785366 01:30
12 0.915707 0.625355 0.804878 01:29
13 0.833985 0.539842 0.829268 01:30
14 0.772828 0.531026 0.824390 01:30
15 0.735248 0.518410 0.824390 01:30
16 0.715735 0.510863 0.819512 01:30
17 0.726851 0.512946 0.829268 01:30
18 0.732415 0.508770 0.824390 01:30
19 0.737635 0.505515 0.829268 01:29

The convnext-base model reached 83% after only 10 epochs. Afterwards, the loss kept improving, but accuracy did not. However, the model is big: about 350 MB. We will save it for later.

learn.export('convnext_base.pkl')

7.1 Trying something smaller

There is also a convnext-tiny model, which should produce a smaller model file.

dls_better_tiny = cheese.dataloaders("working/which_cheese_cleaned", bs=32)
learn = vision_learner(dls_better_tiny, convnext_tiny, metrics=accuracy)
dls_better_tiny.bs
64
learn.lr_find()
SuggestedLRs(valley=0.0014454397605732083)

learn.fine_tune(20,1.44e-3)
epoch train_loss valid_loss accuracy time
0 4.244600 2.932380 0.165854 00:15
epoch train_loss valid_loss accuracy time
0 3.287429 2.501591 0.287805 00:53
1 3.098157 2.106716 0.395122 00:53
2 2.910831 1.777798 0.512195 00:53
3 2.688967 1.506041 0.590244 00:53
4 2.420423 1.393299 0.653659 00:53
5 2.151483 1.200701 0.653659 00:53
6 1.887403 1.033586 0.697561 00:52
7 1.715113 0.973459 0.702439 00:53
8 1.539306 0.942053 0.702439 00:54
9 1.360577 0.849999 0.697561 00:53
10 1.227610 0.811377 0.746341 00:54
11 1.121673 0.766563 0.760976 00:53
12 1.017336 0.724510 0.785366 00:52
13 0.960135 0.694766 0.790244 00:52
14 0.924193 0.685870 0.790244 00:52
15 0.853204 0.688712 0.785366 00:53
16 0.816196 0.688337 0.795122 00:53
17 0.769849 0.678705 0.790244 00:53
18 0.743718 0.687633 0.804878 00:52
19 0.744009 0.674999 0.809756 00:52
learn.export("tiny.pkl")

The tiny model is not as good as the base model. However, the exported model is only 114 MB. Still, compared to good old resnet (47 MB), that is more than twice the size.
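A quick way to check those sizes, assuming the exported .pkl files are still in the working directory:

from pathlib import Path

for f in ["resnet.pkl", "convnext_base.pkl", "tiny.pkl"]:
    print(f, f"{Path(f).stat().st_size / 1e6:.0f} MB")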

8 Inference and getting ready for deployment

Let’s check if our models work for inference.

We only test one image and do a visual inspection of the results. As mentioned before, I did not set aside a test set.

This is the biggest open TODO.

Another important aspect would be how certain the prediction is: how high is the probability of the second-best candidate? Many improvements are possible in problem definition and post-processing.

8.1 Comparison of three models

from fastcore.all import *
from fastai.vision.all import *
from fastai.learner import load_learner

# Load the FastAI Learner
learn_inf_tiny = load_learner("models/tiny.pkl")
learn_inf_base= load_learner("models/base.pkl")
learn_inf_resnet = load_learner("models/resnet.pkl")
learn_inf_tiny.predict("working/which_cheese_cleaned/which_cheese_first/which_cheese/Cantal/0c81aeec-c0a6-421e-844f-3e6e240885a8.jpg")
('Cantal',
 tensor(4),
 tensor([9.7780e-05, 9.0306e-06, 2.1395e-05, 1.0606e-05, 9.9840e-01, 1.2682e-07,
         4.7644e-04, 1.4753e-06, 9.9773e-06, 4.0509e-06, 2.3105e-05, 8.3267e-05,
         8.9159e-05, 2.2647e-06, 3.8224e-06, 4.5492e-07, 2.8718e-04, 2.2553e-06,
         7.6010e-07, 3.5029e-04, 2.3085e-07, 1.2567e-04]))
learn_inf_base.predict("working/which_cheese_cleaned/which_cheese_first/which_cheese/Cantal/0c81aeec-c0a6-421e-844f-3e6e240885a8.jpg")
('Cantal',
 tensor(4),
 tensor([1.5877e-06, 5.6175e-05, 1.3185e-06, 4.0135e-06, 9.9739e-01, 1.9972e-06,
         1.6469e-03, 1.1616e-05, 1.5650e-04, 2.0251e-05, 1.6810e-05, 3.3364e-04,
         1.2042e-05, 1.8571e-06, 7.5011e-06, 5.5109e-07, 1.8472e-04, 2.0955e-06,
         1.5077e-05, 8.5131e-05, 1.5477e-07, 4.9878e-05]))
learn_inf_resnet.predict("working/which_cheese_cleaned/which_cheese_first/which_cheese/Cantal/0c81aeec-c0a6-421e-844f-3e6e240885a8.jpg")
('Cantal',
 tensor(4),
 tensor([1.2273e-06, 1.0518e-04, 2.1162e-05, 9.5564e-07, 9.9687e-01, 5.3281e-06,
         2.2306e-03, 5.5073e-08, 9.4349e-05, 1.3493e-06, 2.2718e-04, 2.3397e-04,
         8.0347e-06, 5.4313e-06, 4.4896e-06, 8.3956e-07, 3.2568e-05, 1.5521e-05,
         2.5339e-06, 1.2194e-04, 1.5376e-05, 4.0610e-07]))

8.2 Comparison of ONNX and Pytorch

We will need an ONNX model later on. Let’s check whether the exported resnet model gives the same predictions.

!pip install onnx
import torch
model = learn_inf_resnet.model
dummy_input = torch.randn(1, 3, 256, 256)  # Use batch size 1 for export
torch.onnx.export(
    model, 
    dummy_input, 
    "model.onnx", 
    export_params=True, 
    opset_version=11, 
    do_constant_folding=True, 
    input_names=["input"], 
    output_names=["output"], 
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}}  # Allow variable batch size
)
!pip install onnxruntime numpy pillow torchvision
class_names = learn_inf_resnet.dls.vocab
print(class_names)  # List of class names
['Banon', 'Bleu d’Auvergne', 'Brie de Meaux', 'Camembert', 'Cantal', 'Chabichou du Poitou', 'Comté', 'Fourme d’Ambert', 'Gruyere', 'Livarot', 'Manchego', 'Mimolette', 'Munster', 'Neufchâtel', 'Pont-l’Évêque', 'Pélardon', 'Reblochon', 'Roquefort', 'Selles-sur-Cher', 'Tomme de Savoie', 'Valençay', 'Époisses de Bourgogne']
import onnxruntime as ort
import numpy as np
from PIL import Image
import torchvision.transforms as transforms
# Load ONNX model
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Preprocessing function
def preprocess_image(image_path):
    transform = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    image = Image.open(image_path).convert("RGB")
    image = transform(image).unsqueeze(0).numpy().astype(np.float32)  # Add batch dim and convert to NumPy
    return image

# Load and preprocess image
image_path = "working/which_cheese_cleaned/which_cheese_first/which_cheese/Cantal/0c81aeec-c0a6-421e-844f-3e6e240885a8.jpg"  # Replace with your image path
input_tensor = preprocess_image(image_path)

# Run inference and turn the raw logits into probabilities
import torch

outputs = session.run(None, {"input": input_tensor})[0]  # raw logits
probabilities = torch.nn.functional.softmax(torch.tensor(outputs), dim=1)

# Get predicted class index and label
predicted_idx = int(torch.argmax(probabilities, dim=1))
predicted_label = class_names[predicted_idx]

print(f"Predicted Class: {predicted_label} (Confidence: {probabilities[0][predicted_idx]:.6f})")
Predicted Class: Cantal (Confidence: 0.971699)

The prediction is correct, but the confidence is slightly different. We will try it anyway in deployment.
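To check whether the deviation comes from the preprocessing rather than the conversion itself, we can feed the same manually preprocessed tensor to the original PyTorch model. A quick sketch:

import torch

model = learn_inf_resnet.model.eval()
with torch.no_grad():
    torch_probs = torch.nn.functional.softmax(model(torch.from_numpy(input_tensor)), dim=1)
# If this deviation is tiny, the remaining gap vs fastai comes from preprocessing
print(f"max deviation vs ONNX: {float((torch_probs - probabilities).abs().max()):.2e}")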

9 Deployment: Delivering an experience

Of course we want to share our model, and not only by posting the source code on GitHub or Hugging Face. What we want is a live version of the model, something users can experience.

You can deploy via cloud computing or via on-device/edge computing. The technologies involved are different.

9.1 Cloud based deployment

You cannot run PyTorch models with plain JavaScript. Instead, a server runs a Python backend, which exposes an endpoint that does exactly what the code in the previous section does.

Here is a good tutorial on how to get a simple setup running on Hugging Face with Gradio.
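In the spirit of that tutorial, the core of such a backend fits in a few lines. A minimal sketch, not the exact code of my app; the model path is an assumption:

import gradio as gr
from fastai.vision.all import load_learner, PILImage

learn = load_learner("models/resnet.pkl")  # assumed path to an exported learner

def classify(img):
    # Gradio hands us a PIL.Image with type="pil"; fastai wraps it as PILImage
    pred, idx, probs = learn.predict(PILImage.create(img))
    return {str(learn.dls.vocab[i]): float(probs[i]) for i in range(len(probs))}

gr.Interface(fn=classify,
             inputs=gr.Image(type="pil"),
             outputs=gr.Label(num_top_classes=3)).launch()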

I developed a webcam-based app; the code is in the repo, and the app is live.

In the app you can select the convnext-base, the convnext-tiny, or the resnet model. All models were trained with 256px images. Just point the camera towards a cheese.

I used the Gradio framework, which is popular in ML and featured on Hugging Face. Despite being server-based, the processing takes only a few milliseconds. Gradio offers no frame dropping out of the box, so I try to include dynamic throttling to avoid frame congestion.

One thing I observed from this app is that the ConvNeXt models assign high probabilities to the second and third best candidates, and that the app tries to predict a cheese even when no cheese is present. Both points are worth examining.

9.2 Edge based deployment

The issue with edge-based deployment is Python. Python is not available on mobile by default, and because of security concerns it is becoming more and more complex to run a full-blown Linux with a Python installation.

The two remaining routes are mobile apps and browser-based inference. We limit ourselves to the browser, because it is accessible from both desktop and mobile.

For in-browser deployment, the model needs to be in ONNX format. We already evaluated above whether ONNX inference gives the same probabilities; due to differences between fast.ai's preprocessing and our manual pipeline, some variation may occur.

Using my knowledge of web development, I built a basic app that runs inference on the resnet model.

My impression is that the results are somewhat worse. But startup and inference times were acceptable and comparable to the Python app. This is worth examining once I have defined a proper test set.

10 The End

This was my first in-depth study of low-level ML. I had dabbled in pose recognition before and managed AI projects, and I’m very impressed with the progress these tools have made.

If you enjoyed the read, come back for the next project: we will revisit a recipe classification app I programmed in 2021 and improve it with AI.

10.1 Links

Github Repo

Javascript App

Gradio App

