In a previous post, I described a dataset taken from menustat.org. I used the dataset to illustrate how some minor tweaks can get your analyses to run much more quickly.
Anyway, the data are interesting in their own right, so I thought I’d look at some of what’s in them here.
Menustat data
To refresh, the current dataset consists of over 180,000 observations: food items from 3 years (2014, 2013, and 2012). The variables indicate the restaurant that serves the food item, the category the item falls into (e.g. entree, appetizer), the year, and then nutrition information. In this post, I’m going to make some preliminary plots, focusing on calories plus the macronutrients: carbs, protein, and fat. For ease of examination, I’m going to plot these as a function of the food category each item belongs to.
I’ve also taken the liberty of ordering the categories on the x axis by their median, descending from left to right. Nothing especially surprising here. The biggest calorie bombs are Burgers, Entrees, and Sandwiches. One burger tips the scales at around 5,000 calories, and while that’s about 2 days’ worth of the recommended daily energy intake for the average man, I guess it isn’t too surprising. This is the land of the free and the home of the brave, after all. Next up is carbs:
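Ordering the x axis by median amounts to computing a per-category median and sorting on it. Here is a minimal Python sketch, assuming each row is a dict with hypothetical `category` and `calories` keys and that missing values are stored as `None`. The numbers are toy values, not menustat data:

```python
from collections import defaultdict
from statistics import median


def categories_by_median(rows, field="calories"):
    """Return category names sorted by the median of `field`, highest first.

    Missing values (None) are skipped, much as a plotting library drops NA.
    """
    values = defaultdict(list)
    for row in rows:
        if row[field] is not None:
            values[row["category"]].append(row[field])
    return sorted(values, key=lambda cat: median(values[cat]), reverse=True)


# Toy rows for illustration only; these are not menustat figures.
rows = [
    {"category": "Burgers", "calories": 900},
    {"category": "Burgers", "calories": 700},
    {"category": "Desserts", "calories": 400},
    {"category": "Beverages", "calories": 150},
    {"category": "Beverages", "calories": None},
]
print(categories_by_median(rows))  # ['Burgers', 'Desserts', 'Beverages']
```

The resulting list can then be handed to whatever plotting tool is in use as the category order for the x axis.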
This is a bit odd. There’s apparently a side or appetizer which has north of 800 grams of carbohydrates. I find this hard to believe. Let’s look a little more closely.
Ah! My old alma mater! I spent about 7 months employed by Red Robin in the year between undergrad and my master’s program at SFSU. It let me pay off the absurd costs of applying to graduate school on my first attempt. Anyway, this row says there are 838 grams of carbohydrates in 313 calories’ worth of Guac & Salsa with chips. I don’t know that I believe this: at roughly 4 calories per gram, 838 grams of carbohydrate alone would come to about 3,350 calories. Let’s look at the rows around it (where the same item for 2013 and 2012 should appear).
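One way to make this kind of doubt systematic is to compare the stated calories with the energy implied by the macronutrients, using the general Atwater factors (about 4 calories per gram of carbohydrate or protein and 9 per gram of fat). A minimal sketch, assuming rows are dicts with hypothetical `calories`, `carbs_g`, `protein_g`, and `fat_g` keys; the 1.5 tolerance is an arbitrary choice, not a standard:

```python
# General Atwater factors: approximate calories per gram of each macronutrient.
KCAL_PER_GRAM = {"carbs_g": 4, "protein_g": 4, "fat_g": 9}


def macro_calories(row):
    """Estimate the calories implied by whichever macronutrient grams are present."""
    total = 0
    for key, kcal in KCAL_PER_GRAM.items():
        grams = row.get(key)
        if grams is not None:  # skip missing values
            total += grams * kcal
    return total


def is_implausible(row, tolerance=1.5):
    """Flag a row whose macronutrients imply far more energy than its stated calories.

    `tolerance` is an arbitrary cushion for rounding and labelling slop.
    """
    calories = row.get("calories")
    if calories is None:
        return False
    return macro_calories(row) > tolerance * calories


# 313 calories and 838 g of carbohydrate are the figures quoted in the post;
# the other macronutrients are left as None because they are not quoted.
guac = {"calories": 313, "carbs_g": 838, "protein_g": None, "fat_g": None}
print(macro_calories(guac))  # 3352
print(is_implausible(guac))  # True
```

Published figures are rounded and the Atwater factors are approximations, so some mismatch is normal; the check is only meant to surface rows that are off by a wide margin.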
Well, 2013 looks about as expected. 2014, however, seems to be a lost cause. I even did a bit of poking around on the web to see if I could find better information, but there doesn’t seem to be anything on the first page or two of Google. We’ll replace this carb count with NA and replot.
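Pulling up the same item across years and then blanking a single value could look like the following. This is a sketch only: it assumes rows are dicts with hypothetical `restaurant`, `item`, `year`, and `carbs_g` keys, uses `None` as a stand-in for NA, and the restaurant, item, and numbers are all made up:

```python
def same_item(rows, restaurant, item):
    """Return every row for one menu item, ordered by year, for side-by-side comparison."""
    matches = [r for r in rows if r["restaurant"] == restaurant and r["item"] == item]
    return sorted(matches, key=lambda r: r["year"])


def set_missing(rows, restaurant, item, year, *fields):
    """Set `fields` to None (an NA stand-in) for one item-year; return how many rows changed."""
    changed = 0
    for r in rows:
        if (r["restaurant"], r["item"], r["year"]) == (restaurant, item, year):
            for field in fields:
                r[field] = None
            changed += 1
    return changed


# Toy data: hypothetical restaurant and item, made-up values.
rows = [
    {"restaurant": "Example Grill", "item": "Chips & Dip", "year": 2014, "carbs_g": 900},
    {"restaurant": "Example Grill", "item": "Chips & Dip", "year": 2013, "carbs_g": 40},
    {"restaurant": "Example Grill", "item": "Chips & Dip", "year": 2012, "carbs_g": 42},
]
for r in same_item(rows, "Example Grill", "Chips & Dip"):
    print(r["year"], r["carbs_g"])

# The 2014 value is the outlier, so mark it missing rather than guess a replacement.
set_missing(rows, "Example Grill", "Chips & Dip", 2014, "carbs_g")
```

Keying on restaurant, item, and year rather than on a row position keeps the fix valid even if the data are re-sorted.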
That’s much better. On to protein:
Okay, a few oddities. A couple of ‘Toppings & Ingredients’ items have quite a bit more protein than one would expect. Also, there’s a burger with over 300 grams of protein. I’ll bet it’s the one with 5,000 calories.
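A hunch like this one is easy to check directly: find the top row for each measure within the category and see whether they are the same row. A sketch with hypothetical `item`, `category`, `calories`, and `protein_g` keys and made-up numbers:

```python
def top_by(rows, category, field):
    """Return the row in `category` with the largest non-missing value of `field`.

    Raises ValueError if the category has no non-missing values.
    """
    candidates = [r for r in rows if r["category"] == category and r[field] is not None]
    return max(candidates, key=lambda r: r[field])


# Toy rows with made-up numbers.
rows = [
    {"item": "Burger A", "category": "Burgers", "calories": 1500, "protein_g": 90},
    {"item": "Burger B", "category": "Burgers", "calories": 900, "protein_g": 45},
    {"item": "Side C", "category": "Sides", "calories": 300, "protein_g": 5},
]
same_row = top_by(rows, "Burgers", "protein_g") is top_by(rows, "Burgers", "calories")
print(same_row)  # True
```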
First of all, behold the Landfill Burger. Yikes. That is the definition of an outlier. Still, I see no reason to remove it or anything. That’s a real thing. A real burger.
Moving on, we see the sides of blue cheese and ranch dressing. Popular among bodybuilders as a quick dose of protein immediately following a workout…
Except not at all. Let’s first try to correct these two observations by looking at the neighboring rows.
Okay, I think we can safely correct that value of 300 grams of protein to 3 grams. While we’re at it, we can also fix the sodium figure for the same year.
For ranch dressing:
Same problem! Someone switched the numbers somewhere.
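If “switched” means that protein and sodium were entered in each other’s columns (an assumption; the post only says the numbers were switched and that both figures need fixing), the correction is a simple swap. A minimal sketch with hypothetical `protein_g` and `sodium_mg` keys; the sodium value of 3 below is invented to mirror the 300-vs-3 pattern:

```python
def swap_fields(row, a, b):
    """Swap two fields in place, for values that were entered in each other's column."""
    row[a], row[b] = row[b], row[a]


# Hypothetical row: 300 in the protein column where 3 g is plausible.
# The sodium value of 3 is an assumption made for illustration.
dressing = {"item": "Side dressing", "protein_g": 300, "sodium_mg": 3}
swap_fields(dressing, "protein_g", "sodium_mg")
print(dressing)  # {'item': 'Side dressing', 'protein_g': 3, 'sodium_mg': 300}
```

A swap only makes sense when the other years of the same item confirm the transposition; otherwise setting the values to missing is the safer choice.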
Replot:
Okay, that’s much better. Last, let’s look at fat:
One more offender in Toppings & Ingredients.
Yeah, no way a yellow rice & veg bowl has 320 grams of fat. At about 9 calories per gram, that alone would be nearly 2,900 calories.
Looks like the calories and the fat here were both entered incorrectly. I’m tempted to set both to the values on row 122306 (i.e. 320 calories, 5 grams of fat), but row 122306 specifies in its item description that the item is 10 ounces, and there’s no such description in the offending row. So I think I’ll just remove these mis-entered numbers.
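The decision here (borrow values from a reference row only when the portion size is known to match, otherwise discard them) can be written as a small helper. A sketch with a hypothetical `description` key: the reference row’s 320 calories and 5 g of fat are the figures quoted for row 122306, the `"10 oz"` string is a stand-in for its description, and the offending row’s calorie value is a placeholder:

```python
def repair_from_reference(row, reference, fields, match_key="description"):
    """Copy `fields` from `reference` if both rows share a non-missing `match_key`;
    otherwise set them to None.

    Returns True if values were copied, False if they were discarded.
    """
    key = row.get(match_key)
    same_portion = key is not None and key == reference.get(match_key)
    for field in fields:
        row[field] = reference[field] if same_portion else None
    return same_portion


reference = {"description": "10 oz", "calories": 320, "fat_g": 5}
offending = {"description": None, "calories": 9999, "fat_g": 320}  # 9999 is a placeholder
copied = repair_from_reference(offending, reference, ["calories", "fat_g"])
print(copied, offending)  # False {'description': None, 'calories': None, 'fat_g': None}
```

Requiring a non-missing match keeps two undescribed rows from being treated as the same portion by accident.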
Replot:
Our variables of interest are now relatively clean, and we can proceed with some more interesting analyses. This will be the subject of a subsequent post.