Smoothie Scan

August 18, 2024

Background

I received a wonderful gift recently: a magazine full of smoothie recipes! Anyone who knows me knows that I have a smoothie with breakfast basically every morning of my life. I didn’t realize it until I received this gift, but I really needed to freshen up my perspective on my recipe! I was stuck in my smoothie ways, and this book has opened up some fun culinary doors for my mornings.

I am not great at planning ahead when it comes to buying groceries, so I wanted a way to easily search the recipe book for specific ingredients, either to cobble together a smoothie from what I had on hand or to target specific ingredients to buy on my next trip to the market!

The idea started to percolate…

Idea

Scan each recipe in the magazine
Use Google’s OCR (optical character recognition) API to scrape the text from each image
Use Python regular expressions to isolate useful information
Create a searchable CRUD database using Ruby on Rails and upload each recipe
Search recipes and edit database at my convenience

Step One

I started by scanning each page of the recipe book and separating each recipe into its own image, which would help with text parsing.

Google OCR API implementation

Next (with the help of an ever-present Large Language Model programming companion), I wrote some Python code that uses Google’s Cloud Vision API to scrape the text from the images, parse it with regular expressions, and write the result to a JSON file.

from google.cloud import vision

# Initialize the Google Cloud Vision client
client = vision.ImageAnnotatorClient()

def detect_text_from_image(image_path):
    """Detect text in an image file using Google Cloud Vision API."""
    # Read the raw image bytes
    with open(image_path, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)

    # Run OCR; per-image failures are reported in response.error
    response = client.text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)

    # text_annotations holds the detected text
    return response.text_annotations
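
As a small aside, here is a sketch of one way to pull the full text out of the returned annotations. As far as I understand the Vision API, the first annotation's description holds the whole detected block, and the later ones are individual words. The full_text helper and the stand-in objects are my own illustration (not part of the project code), so the example runs without API credentials:

```python
from types import SimpleNamespace

def full_text(annotations):
    """Return the text of the first annotation, or "" if nothing was detected."""
    if not annotations:
        return ""
    return annotations[0].description

# Stand-in annotations (hypothetical), shaped like the API's results:
# the first entry carries the whole text, the rest are single words.
fake_annotations = [
    SimpleNamespace(description="Berry Blast\nSERVES 2"),
    SimpleNamespace(description="Berry"),
    SimpleNamespace(description="Blast"),
]

# Flatten line breaks so later patterns can match across lines.
flattened = full_text(fake_annotations).replace("\n", " ")
print(flattened)
```

With real results, the same helper could be called on the list returned by detect_text_from_image.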

Regex Data Isolation

To capture meaningful parameters to add to my database, I needed to split out the information from each recipe which I found relevant (in this case title, the text/ingredients, and the nutritional info):

    for file_path in file_paths:
        with open(file_path, 'r') as file:
            # read content of each text file
            content = file.read()

            # Find all occurrences of the pattern - anything after "description:" and before "bounding_poly"
            pattern = re.compile(r'description:\s*"(.*?)"\s*bounding_poly', re.DOTALL)
            matches = pattern.findall(content)

            # replace escaped newline sequences (a literal backslash + n) with spaces
            matches2 = matches[0].replace('\\n', ' ')


            patterns = [re.compile(r'^(.*?)\sHANDS-ON'), re.compile(r'(HANDS-ON.*?TOTAL\s\d+\sMIN\.)'),
                        re.compile(r'SERVES\s(.*?)\sCALORIES'), re.compile(r'CALORIES\s(.*?)\sFAT'),
                        re.compile(r'FAT\s(.*?)\sPROTEIN'), re.compile(r'PROTEIN\s(.*?)\sCARB'),
                        re.compile(r'CARB\s(.*?)\sFIBER'), re.compile(r'FIBER\s(.*?)\sSUGARS'),
                        re.compile(r'SUGARS\s(.*?)\sSODIUM'), re.compile(r'SODIUM\s(.*?)\sCALC'),
                        re.compile(r'CALC\s(.*?)\sPOTASSIUM'), re.compile(r'POTASSIUM\s(.*?DV)'),
                        re.compile(r'TOTAL\s+\d+\s+MIN\.\s*(.*?)\s*SERVES')]
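
To show how these label-to-label patterns behave, here is a minimal, self-contained sketch. The field names, the extract_fields helper, and the sample text are all made up for illustration (the sample is not taken from the magazine), and only a few of the fields are included:

```python
import json
import re

# Hypothetical field names mapped to patterns that capture whatever sits
# between two known labels, in the same spirit as the list above.
FIELD_PATTERNS = {
    "title": re.compile(r"^(.*?)\sHANDS-ON"),
    "ingredients": re.compile(r"TOTAL\s+\d+\s+MIN\.\s*(.*?)\s*SERVES"),
    "serves": re.compile(r"SERVES\s(.*?)\sCALORIES"),
    "calories": re.compile(r"CALORIES\s(.*?)\sFAT"),
}

def extract_fields(text):
    """Return a dict of field name -> captured text (None when a label is missing)."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1).strip() if match else None
    return record

# Made-up OCR output, already flattened onto a single line.
sample = (
    "Berry Blast HANDS-ON 5 MIN. TOTAL 5 MIN. "
    "1 cup berries, 1 banana, 1 cup milk "
    "SERVES 2 CALORIES 180 FAT 2g PROTEIN 6g"
)

record = extract_fields(sample)
print(json.dumps(record, indent=2))
```

Returning None for a missing label keeps one badly scanned recipe from crashing the whole run, and the resulting dict is ready to write to a JSON file.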

Ruby on Rails Database

Next, I needed a place to host my database. I chose Ruby on Rails because it is super easy to set up. I incorporated Bootstrap to format the table, and I used the ransack gem to filter the recipes and make the database searchable.

Lastly, I used Active Storage, which ships with Rails, to create an image upload option, so as I made each smoothie I could include a photo of the end result in the database!


Check out my GitHub for the full code and more details on the project!


Brendan Inglis
