Using Data Science To Find Yummy Healthy Foods


I love food, but unfortunately most foods that are tasty are also unhealthy.

In this article, we will find yummy healthy foods using data sourced from TasteAtlas and MyFitnessPal.

TasteAtlas is an online database of thousands of popular dishes from around the world. We’ll use it to get each dish’s name and its popularity.

Our other data source, MyFitnessPal, is an online database of user-submitted nutrition data for foods. We’ll use it to get the nutritional data for our dishes: protein, fat, carbs, and calories.

Getting Access to the TasteAtlas Data

First we need to get a dataset containing all the dishes. We’ll use the TasteAtlas API to collect the data.

First we need the right headers to be allowed access. Websites without a formal public API often reject requests that don’t appear to come from a browser. To get around this, we will trick the server into thinking we are using a browser by sending the same headers a browser would.

You can find the headers for your computer’s browser like so:

1. Paste this into your browser:

2. Open your browser’s developer tools and go to the ‘Network’ tab.

3. Under Request Headers, look for user-agent and cookie.

[Screenshot: the user-agent and cookie values under Request Headers]

4. Hold onto these values; we will use them in our Python code.

Create a Jupyter Notebook

Now we’re ready to create our Jupyter Notebook and get our data.

First, import the packages we will need:
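The original package list isn’t reproduced here. The rest of the article calls `pd.DataFrame`, so pandas is certainly needed; for the HTTP requests, this sketch assumes the standard library’s `urllib.request` (the popular `requests` package would work just as well).

```python
# Assumed dependencies for this sketch: json and urllib.request for the
# HTTP calls, time for pausing between requests, and pandas for the tables.
import json
import time
import urllib.request

import pandas as pd
```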

Create a dictionary using the two header values we collected in the previous step. The resulting dictionary should look something like this:
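The exact dictionary isn’t shown in the text, so here is a sketch with placeholder strings. The keys are the two header names from the Network tab; HTTP header names are case-insensitive, so the lowercase form works.

```python
# Placeholder strings: replace them with the exact values you copied
# from the Request Headers section of your browser's Network tab.
headers = {
    "user-agent": "YOUR_USER_AGENT_STRING",
    "cookie": "YOUR_COOKIE_STRING",
}
```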

Now let’s create a function that will get data for one specific region:
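Here is a minimal sketch of such a function, assuming the standard library’s `urllib.request`. The TasteAtlas request URL isn’t given in the article, so the function takes it as a `url` parameter, and the name `get_region_data` is hypothetical. It also turns the `{'Message': 'Access Restricted'}` reply described next into an exception, so a blocked request can’t be mistaken for data.

```python
import json
import urllib.request


def get_region_data(url, headers):
    """Fetch and decode the JSON for one region (hypothetical helper).

    `url` is the request URL you found in the Network tab; it is not
    reproduced in this article.
    """
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.loads(response.read().decode("utf-8"))
    # A blocked request comes back as {'Message': 'Access Restricted'}.
    if isinstance(data, dict) and data.get("Message") == "Access Restricted":
        raise PermissionError("Access restricted: adjust your headers and retry")
    return data
```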

If you get a response that says:

{'Message': 'Access Restricted'}

This probably means you need to tweak your headers. Keep trying until you are granted access. Remember, we are essentially trying to outsmart the server, so think like a computer for a minute.

Now let’s create a loop that will get all of the popular dishes for all of the regions on the site:
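One way to write that loop, as a sketch under two assumptions: `region_urls` is a hypothetical mapping from region name to request URL, and each response is a list of dish objects with `"name"` and `"popularity"` keys. Inspect a real response and rename the keys to match. `fetch` is any function that takes a URL and headers and returns the parsed JSON, such as the one-region function described above.

```python
import time


def collect_dishes(region_urls, headers, fetch, pause=1.0):
    """Loop over regions and flatten their dishes into one list of dicts.

    region_urls: hypothetical mapping of region name -> request URL.
    fetch: a function taking (url, headers) and returning parsed JSON.
    """
    rows = []
    for region, url in region_urls.items():
        dishes = fetch(url, headers)
        # ASSUMPTION: each response is a list of dish objects with "name"
        # and "popularity" keys; adjust to the real response structure.
        for dish in dishes:
            rows.append({
                "region": region,
                "dish": dish["name"],
                "popularity": dish["popularity"],
            })
        time.sleep(pause)  # be polite: space out requests to the server
    return rows
```

Calling `collect_dishes` with a real `region_urls` mapping and the one-region function as `fetch` produces the `rows` list that the next step turns into a DataFrame.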

Turn our list of dictionaries into a Pandas DataFrame:

tasteatlas_df = pd.DataFrame(rows)

The resulting DataFrame will look like this:

[Image: the resulting TasteAtlas DataFrame]

Now we can save our data to a csv by calling:
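The call itself is pandas’ `DataFrame.to_csv`. The sketch below builds a one-row stand-in DataFrame with placeholder values so it runs on its own; the filename `tasteatlas_dishes.csv` is hypothetical.

```python
import pandas as pd

# Stand-in for the DataFrame built from `rows`; the values are placeholders.
tasteatlas_df = pd.DataFrame([{"region": "Region A", "dish": "Dish A", "popularity": 1.0}])

# index=False keeps pandas' integer row index out of the file.
tasteatlas_df.to_csv("tasteatlas_dishes.csv", index=False)
```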


Collecting Data from MyFitnessPal

Now that we have our TasteAtlas dish data nicely packaged, let’s do the same for MyFitnessPal.

To get data from MFP (MyFitnessPal), we need to run a search on the website from Python.

Normally, a search on the website would result in the following web page:

[Screenshot: MyFitnessPal search results]

Each row is a user-input nutrition profile for the dish. We want to take the average of all the nutritional profiles.

We can do that using the following two functions.

1. We need a function that will calculate the average nutritional data for each dish:

2. Now we need a function which makes a request to MFP specifying the name of the dish for which we want nutritional data:
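A minimal sketch of the averaging function from step 1. It assumes the search results have already been parsed into one dict per user entry, using the same keys as the output shown below; the name `average_nutrition` is hypothetical. The request function itself depends on MyFitnessPal’s page markup, which isn’t reproduced here, so it is not sketched.

```python
def average_nutrition(profiles):
    """Average user-submitted nutrition profiles for one dish.

    profiles: list of dicts such as {'protein': 12.0, 'calories': 230.0, ...}.
    Entries missing a value are skipped for that nutrient only.
    """
    nutrients = ("carbohydrates", "fat", "protein", "calories")
    averages = {}
    for nutrient in nutrients:
        values = [p[nutrient] for p in profiles if p.get(nutrient) is not None]
        # None signals "no data" rather than a misleading 0.
        averages[nutrient] = sum(values) / len(values) if values else None
    return averages
```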

This function results in a dictionary that looks like the following:

{'carbohydrates': 40.87842105263157,
'fat': 6.431578947368421,
'protein': 11.964736842105262,
'calories': 236.71458947368419}

Now it’s time to bring it all together and loop through each dish in our TasteAtlas DataFrame we created earlier:

1. First create new columns in our TasteAtlas DataFrame which will hold the nutritional data.

2. Now loop through the DataFrame and add the nutrition data to the rows.
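The two steps above might look like this in pandas. This is a sketch: `add_nutrition` is a hypothetical name, the dish names are assumed to live in a column called `dish`, and `get_nutrition` stands for any function that takes a dish name and returns a dict like the one above (for example, the MFP request combined with the averaging step).

```python
import pandas as pd

NUTRIENTS = ["carbohydrates", "fat", "protein", "calories"]


def add_nutrition(df, get_nutrition, dish_column="dish"):
    """Return a copy of df with one column per nutrient, filled per dish."""
    df = df.copy()
    # Step 1: create empty (NaN) columns to hold the nutrition data.
    for col in NUTRIENTS:
        df[col] = float("nan")
    # Step 2: look up each dish and write its values into its row.
    for idx, dish in df[dish_column].items():
        nutrition = get_nutrition(dish) or {}
        for col in NUTRIENTS:
            value = nutrition.get(col)
            if value is not None:
                df.at[idx, col] = value
    return df
```

Returning a copy leaves the original TasteAtlas DataFrame untouched, and dishes with no MFP results simply keep `NaN`.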

The resulting DataFrame will look something like this:

[Image: the DataFrame with nutrition columns added]

Save it to a csv and we’re done collecting our data!

Analyze the Data

You can do whatever you like with this data, but for this project we will look for the healthiest dishes in terms of the ratio of protein to total calories.

I try to eat as much protein as possible to build muscle after lifting weights, and as few calories as possible so I don’t gain body fat.

My rule of thumb is 10 grams of protein for every 100 calories, which works out to a protein:calorie ratio of 0.10 (grams of protein per calorie).

But we also want to know whether the food is yummy, so we’ll use the TasteAtlas popularity metric as a proxy for tastiness.

The analysis, then, is very simple:

First sort by the protein:calorie ratio, then keep only the dishes with a ratio above 0.10, then sort by popularity:
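A sketch of those three steps in pandas, assuming columns named `protein` (grams), `calories`, and `popularity` (higher meaning more popular); the column names and the function name `high_protein_favorites` are assumptions.

```python
import pandas as pd


def high_protein_favorites(df, threshold=0.10):
    """Keep dishes above `threshold` g protein per calorie, most popular first."""
    ranked = df.assign(protein_ratio=df["protein"] / df["calories"])
    # 1. Sort by protein:calorie ratio, best first.
    ranked = ranked.sort_values("protein_ratio", ascending=False)
    # 2. Keep only dishes strictly above the threshold.
    ranked = ranked[ranked["protein_ratio"] > threshold]
    # 3. Sort by popularity; a stable sort keeps the ratio order among ties.
    return ranked.sort_values("popularity", ascending=False, kind="mergesort")
```

Because the popularity sort comes last, it determines the final order; using a stable sort means the earlier ratio sort still acts as a tie-breaker between equally popular dishes.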

The final DataFrame should look like this:

[Image: the final DataFrame of high-protein dishes sorted by popularity]

We did it!!! Now we have a whole bunch of yummy healthy foods to try!

For fun, let’s look at photos of some of these foods.

Ceviche, from South America

[Photo: ceviche]

Carpaccio, from Italy

[Photo: carpaccio]

Sarma, from Asia

[Photo: sarma]

Asado, from South America

[Photo: asado]

Châteaubriand, from Europe

[Photo: Châteaubriand]


In this project we learned several useful skills.

First, we learned how to collect data without the need for an API key by using common browser headers and ‘outsmarting’ the server.

Second, we learned how to put our data into a Pandas DataFrame and do simple calculations with our data.

Third, we learned how easy it is to use data to uncover information that is hidden from the general public.

I hope you had fun and learned a lot about food, data science, and uncovering secret information hidden from the general public! Happy coding!

Ian Johnson is a Data Scientist with a passion for learning new insights from data.
