Coffee Science for CoffeePreneurs by CoffeeMind

This podcast is our playground for discussing how Coffeepreneurs can leverage scientific methods to lead successful businesses that enrich the lives of everybody involved, inside and outside the business.

When running a business you have a committed purpose. You need to spend your time where it matters to you in order to lubricate your organization to deliver the best products to your audience. If you spend time on something that slows you down or misleads you, it is precious time wasted. Unfortunately the global coffee roasting education tradition is a big patchwork with more focus on storytelling than scientific simplicity. At CoffeeMind we live and breathe scientific simplicity, and the founder, Morten Münchow, has a master's degree in theory of science and more than 5 years of experience teaching research design and statistics at the University of Copenhagen. CoffeeMind's approach to coffee science and sensory science builds on this solid foundation of theory of science and research design in everything we do, and we focus on simple and actionable models for skills improvement in product development and quality control.

This podcast is for our audience who set aside the time to hang out with us to understand our scientific approach at a deeper level, and who intuitively understand that spending this extra time on understanding methodology is rewarded by better decisions, which make you a better servant for your audience with less time wasted on things that matter neither to you nor to your audience. We will take you behind the scenes on all of the whys and hows of our scientific projects and business practices so that you can implement our way of thinking in your own organization.

Coffee science methodology Episode 4: Empiricism and Critical Rationalism

February 04, 2022 • Morten • Episode 7

In this episode I zoom in on Empiricism and Critical Rationalism, which I consider the most useful foundation for the concept of 'objectivity'.

Empiricism

Empiricism is, broadly, a movement in theory of science that puts a lot of emphasis on experience of the outer world. Reluctant to speculate too much about the inner world, the empiricists wanted the concepts used in a theory to rest on the simplest, most intuitively correct concepts. This way they did not have to build theories about the inner dimensions of human experience and knowing. This seems aligned with Ockham's insistence on simplicity: a theory is superior if it makes a minimal number of assumptions and rests on a small, clear set of ‘first principles’ for the explanation.

Logical positivism is a branch within empiricism claiming explicitly that

“The meaning of a sentence consists in its method of verification”

True objective knowledge needs to be rooted in specific experiences of the outer world and not in already existing ideas (as Plato would claim). A lot of emphasis was therefore put on carefully explaining the circumstances under which the experience (or verification) was created, so that somebody else could easily re-create the experience independently of the first person who had it. Something is objectively true if it is described well enough that anybody doing the same can experience the same. I think this is the most important and useful definition of the word ‘objective’ in science: objective knowledge is grounded in the correct use of theory and method, regardless of whether the science in question collects data from the inner dimensions of human experience or from the physical outer world. More about that later. Here I just want to mention that it is in the spirit of empiricism that scientific articles are always extremely detailed in describing the whole technical setup of a research project and how the data was collected. The project needs to be described specifically enough for somebody else to repeat it and arrive at the same conclusion. Only then is it seen as ‘objectively’ true, because no personal bias can have affected the outcome and anybody doing the same would reach the same conclusion. The emphasis here is on the INPUT into the project, so that anybody can repeat the project if they want.

So, to conclude: “A sentence without a specific method of verification has no meaning”. I can’t help thinking about the Roasting Defect curriculum of the old SCAA roasting education system. I always thought that the circumstances for creating the individual defects were anecdotal rather than specific, and therefore I never thought they were good theories, because I would not really know how to set up a test to prove or disprove them. I tried to create a more specific, systematic and simple approach in our first roast profile studies, which led to my first scientific publications on coffee roasting, and I would say that my simpler and more specific approach is more useful. I created a sample space of 6 different roast profiles as a hypothesis to be tested, and I have been happy with the outcome for 5 of them; the ‘Underdeveloped’ profile did not turn out to be very useful. I think roasting defects is an area that might be developed further in the coming years, so that we will have described not only the specific circumstances but also the meanings of terms such as ‘underdeveloped’, which are used in many different ways. I remember a session I did with Nolan Dutton at a Roaster’s Guild Camp where we asked the audience for definitions and ended up with more than 6 different definitions of ‘baked’. I think that specific, simple and “based on root causes” should be the principles with which we search for new theories. Following Ockham, a good theory would be grounded in the root causes of the system and be expressed in flame settings, time and temperature observations of physically and chemically founded events, rather than in speculative, elaborate, mathematically derived calculations in roast logger software. I love roast logger software and it is highly needed for many reasons, but sometimes people care more about discussing the shape of curves and fluffy calculations online than about the actual coffee coming out of the roaster.

Critical rationalism

Where Plato wanted to distinguish opinion from knowledge, Karl Popper wanted to distinguish science from pseudoscience, and he developed an approach that focused on falsification rather than on verification, which was the focus of the empiricists. Not that he disagreed with the empiricists about the importance of experience as the foundation of scientific observation, but just like Plato he did not think that observation is enough if you don’t apply the right, critical method.

“A hypothesis is scientific if and only if it has the potential to be refuted by some possible observation”

Where empiricism is very specific about the INPUT of a theory, Karl Popper was very focused on the OUTPUT of a theory, and also on the hypothesis behind the predicted outcome.

Karl Popper was, like Plato, annoyed that some theories were presented as qualified knowledge without being so. Where Plato had mathematics and dialectics as the methodology to refine theories into qualified knowledge, Popper looked at the strictness with which a theory related to its predicted outcome. He really did not like theories that verified themselves again and again because they were not specific enough in their described outcomes to ever be refuted by any possible observation at all. He was particularly critical of some branches of psychology (Freud and Adler) and some social sciences (Marxism).

It is not enough to prove the theory or ‘verify’ it. A good theory is specific enough about input, hypothesis and output that you can, at least theoretically, imagine situations where the theory would turn out to be false. Another place a theory can hide without being refuted is where the outcome, even if it exists, is so small that nobody can perceive it anyway, in which case nobody would ever find an outcome that falsifies the theory.

A simple thought example shows how this works. Let’s look at the theory of gravity: gravity pulls objects toward the center of the earth (or any other big object). Let’s see if we can verify it. I hold a stone in my hand with my palm facing down, gripping the stone firmly enough with my fingers that it does not fall out. If I open my hand, the theory of gravity predicts that the stone will fall to the ground. I could test this, and I hope everybody agrees that this is indeed what would happen. So far we have verified the theory. Great. But according to Popper this is not enough. To test whether it is a good theory you have to come up with some situation that, if observed, would disprove the theory. If the stone stayed in mid-air after I let go, the theory of gravity would be falsified! The fact that such an outcome is conceivable is good for the theory: if a theory is specific enough in its predictions that, at least in principle, some outcomes are ruled out, then it is a good theory. A good theory takes chances when it comes to possible outcomes. If it is not narrow and specific in predicting some outcomes and ruling out others, it does not really say anything about anything: all outcomes would verify the theory, and no possible outcome would even lead to questioning it, because no outcome is ruled out.

If a theory never predicts something specific and also never rules out other specific outcomes, nobody will ever be able to catch the theory being wrong!
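The stone example can be turned into a small toy model. This is only an illustrative sketch and not a formalization of Popper's philosophy: the outcome labels and the function names `ruled_out`, `is_falsifiable` and `survives` are all invented for this example. A theory is represented simply as the set of outcomes it permits.

```python
# Toy model: a theory is the set of outcomes it permits.
# The outcome labels below are hypothetical, chosen for the stone example.
OUTCOMES = {"falls", "hovers", "rises"}

def ruled_out(permitted):
    """Outcomes the theory forbids, i.e. observations that would refute it."""
    return OUTCOMES - set(permitted)

def is_falsifiable(permitted):
    """In this toy model, a theory is falsifiable if it forbids at least one outcome."""
    return len(ruled_out(permitted)) > 0

def survives(permitted, observation):
    """A theory survives an observation only if it permitted that outcome."""
    return observation in permitted

gravity = {"falls"}            # takes a risk: only one outcome is allowed
vague_theory = set(OUTCOMES)   # allows everything, so nothing can ever refute it

print(is_falsifiable(gravity), sorted(ruled_out(gravity)))
print(is_falsifiable(vague_theory), sorted(ruled_out(vague_theory)))
print(survives(gravity, "hovers"), survives(vague_theory, "hovers"))
```

In this sketch the gravity theory forbids two of the three outcomes and would be refuted by a hovering stone, while the vague theory survives every observation precisely because it forbids nothing.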

I suspect that the focus on different aspects of the shape of the rate of rise (RoR) curve falls within this category. In the SCA Roasting Intermediate and Professional courses, students are asked to discriminate and identify different samples, and amongst these are samples with very different development times but exactly the same color: typically Agtron 75 with 1.5, 3 and 6 minutes of development time respectively. These are huge time differences compared to many of the differences talked about in the general community. But having had close to a thousand students through SCA Roasting exams, I must conclude that most students fail to clearly differentiate these otherwise big differences in development time between roasts. Even though most of the attendees at Intermediate level and almost all the attendees on a Professional course have massive practical experience, only extremely rarely does somebody get them all correct. Most get only some right, and surprisingly many really struggle. This observation was the reason I was not even sure we would get any good data at all when we started the sensory evaluation of roast profile modulations. Luckily it ended up being really good data, which we published in the article titled “The Effect of Roast Development Time Modulations on the Sensory Profile and Chemical Composition of the Coffee Brew”. I just feared that all my personal observations over the years were more like my own good stories than really systematic differences that would show up when a blinded panel tasted the samples in triplicate and in randomized order.

My point is that if experienced coffee roasters who attend the SCA Roasting Professional course struggle to taste the difference between 1.5, 3 and 6 minutes of development time, I doubt that anybody could tell the difference between a smooth and a slightly bumpy RoR. And if coffee professionals struggle, think about how much consumers would struggle.
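To make "can anybody tell the difference?" into a claim that can actually be refuted, a discrimination test needs a chance level to beat. The sketch below uses the triangle test, a common sensory discrimination format in which a taster picks the odd sample out of three and would be right one time in three by pure guessing. It is an assumption for illustration only: it is not the format of the SCA exams or of the published study, and the function name `chance_p_value` and the numbers in the example are invented here.

```python
from fractions import Fraction
from math import comb

def chance_p_value(correct, trials, guess_rate=Fraction(1, 3)):
    """Probability of at least `correct` right answers in `trials` by guessing alone.

    Exact upper tail of a binomial distribution; the default guess_rate of 1/3
    is the chance level of a triangle test.
    """
    return sum(
        comb(trials, k) * guess_rate**k * (1 - guess_rate) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical example: 12 triangle tests on smooth vs. bumpy RoR, 8 answered correctly.
p = chance_p_value(8, 12)
print(round(float(p), 3))  # 0.019
```

With these made-up numbers, guessing alone would produce 8 or more correct answers out of 12 in about 2% of cases, so such a result would count as evidence of a perceivable difference, while 7 out of 12 (about 7%) would not pass the usual 5% threshold. The useful part is that the claim now rules out specific outcomes in advance.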

The value of a theory is in its output, which here is the sensory differences. If these differences are small, they don’t have a lot of value, and the claims could survive in the community for a long time without ever being refuted, because nobody dares to argue with a very elaborate theory, even though it is only elaborate on the input parameters and calculations and not on the output parameters (which have to be the sensory properties to be relevant for the coffee community)! We really struggle to get people to take sensory science seriously, and also to accept how easy it is for everybody, and particularly for trainers who make a living coming up with good explanations, to talk themselves into a corner and convince themselves and hundreds of students that they have found a really important small aspect that makes a huge difference.

I have for 10 years or so told my students that there is an inverted-U-shaped relationship between development time and ‘body’. My claim was that short and long development times both give a low body, but that in between there is a maximum where the perceived body of the brew is higher. Having supervised a master's thesis project on the subject, I am now convinced that I might not be right, because our panel did not find any modulation of ‘body’ while finding a lot of other strong modulations of flavors when doing these experiments in a strictly scientific setup. With Rob Hoos we tested his claims about how ‘body’ increases with a longer time to first crack, and here too our panel failed to find anything. And here it gets interesting: I have tasted his modulations twice and I felt that I could pick up what he meant, so either I was biased by being convinced while tasting, or we failed to calibrate the panel for exactly what we pick up as ‘body’. If we did not calibrate the panel correctly, they might not find what we are looking for even if it is there.

So I’m stuck: we did not find anything, but I still personally believe that Rob is onto something that we have so far not been able to find. We did find extended aftertaste and higher bitterness, which might capture something in the flavour modulation that Rob and I would call ‘body’. I think this is a good example of how science is not too strict, and does not have to be: you can still have a personal hypothesis, and if you don’t find support for it you learn about panel calibration and whether you have done it correctly, but you also learn that the difference you are looking for is really small, because if it were big the panel would have found it. I think this leaves the theory of increased body with extended time to first crack in a situation where it is not necessarily rejected, as we can’t rule out that future studies will be able to zoom in on what we believe is there. The good aspect of this theory is that it is rooted in the first causes of the roasting process, which are time and temperature themselves, and that it is specific about what it tries to predict: elevated body for longer time to first crack. So as a theory it is correctly formulated. The next step is to test empirically whether what it predicts holds or not. Also notice how there has been a dynamic interplay between hypothesis, testing and refinement of the hypothesis, so perhaps the last word has not been said on this subject. But after the testing I think it is fair to say that if there is something, it is an extremely small difference that is difficult to pick up for anybody not skilled in tasting, and therefore, if it exists, it would be relevant to only a few consumer segments.

Talking about small differences and different control parameters, it is important to relate them to basic chemical concepts in order to understand and evaluate the expected effect of each parameter. Again, the focus on small bumps in the RoR curve seems a bit far out to me from a chemical point of view. Looking at the situation from a chemical perspective, we would ask: what are the temperature differences we are looking at? It turns out that these bumps might only be in the magnitude of a few degrees, such as 2-3 degrees. If small causes have small effects, the theory is difficult to ever prove, but even worse, it is difficult to refute, because nothing about the evaluation of the theory is clear: a clear sensory consequence would be needed for the theory to predict anything useful and to be specific enough to be refuted.

Another problem with this theory is that it connects technical aspects of the roast directly with a sensory preference without being really specific in the sensory descriptive part. What exactly does the theory predict about the flavour outcome of different shapes or behaviors of the small variations in RoR? You need to describe what happens from a sensory descriptive (not preference) point of view and how it is related to the technical aspects of the roast. Once that mapping is done, you can look for correlations between doing it one way or another and people's preferences, and you may find that people don’t agree on which is better. If you just say something very technical about the roasting conditions and what should be done, and then jump straight to whether this is good or bad, you fail to describe what is going on before judging one thing bad and another good. Mixing technicalities and preference without describing the flavor first makes for a weak theory, because you can’t describe why somebody would find the result bad. Technicalities of the roast should be linked to descriptive data as the first step.

Preference has to be taken completely out when doing this work, because it clouds the purpose of the theory. Once you are clear about how technicalities and descriptive flavors are related, you can start experimenting with different types of products and see that there are a lot of very different opinions among different types of coffee drinkers. Going straight from technicalities to preferences, you would miss out on this complexity and end up with a very one-dimensional model for your own product development, let alone for the global coffee community to navigate product development from.

I’m not saying that there is nothing to be found when looking at RoR around first crack. There might be some instant events around the crack, such as a sudden volume increase of the accumulated steam and gas leading to first crack, but we would still need a pretty specific theory of why such a small event could have a major chemical impact and consequently a sensory effect. As mentioned in the beginning, it is very rare that a small cause has a big effect! And even if it has, we should first describe how the intensities of different flavors are modulated before labeling some technical configurations good or bad from a sensory perspective.

Sticking to first principles such as flame setting, time to first crack and time after first crack seems like a well-grounded theory, because it is clear and well described how temperature and time have a major effect on a food substance. Formation and breakdown of molecules happen because of thermal energy. Thermal energy is the molecular vibration of a material, which in a roaster is initiated by the flame in the first place. These vibrating molecules meet by chance. As they vibrate quicker at higher temperatures, more meetings happen per unit of time; at lower temperatures, with slower vibration, we can simply wait longer for the same number of molecules to meet as at a higher temperature in a shorter time. All this happens during good old seconds. In a small experiment we measured the effect of time in itself by creating a series of isothermal (fancy scientific word for ‘keeping the temperature the same’) roasts in an IKAWA. We measured a steady darkening of the beans, taking them from Agtron 120 to 65 as the development time increased from 30 seconds to 4 minutes while kept at 210 degrees after first crack. In the beginning of the roast there was a drop of approximately 10 Agtron points per 30 seconds, and in the end only around 4 Agtron points per 30 seconds. It seems that time itself has a big enough effect to expect a sensory difference, at least when we talk about minutes of difference in development time.
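As a quick sanity check, the figures quoted for the isothermal series can be compared with each other. The sketch below uses only the numbers from the text and reads them as Agtron 120 at 30 seconds and Agtron 65 at 4 minutes. The linear fall-off in the darkening rate is an assumption made for illustration and not something measured in the experiment.

```python
# Figures quoted in the text for the isothermal series at 210 degrees.
start_agtron = 120    # color at 30 seconds of development time
end_agtron = 65       # color at 4 minutes of development time
start_seconds = 30
end_seconds = 240
step_seconds = 30

steps = (end_seconds - start_seconds) // step_seconds   # number of 30-second steps
total_drop = start_agtron - end_agtron                  # Agtron points lost overall
average_drop_per_step = total_drop / steps

early_rate = 10   # approximate drop per 30 s early on (from the text)
late_rate = 4     # approximate drop per 30 s towards the end (from the text)

# Assumption (not from the experiment): the rate falls linearly from early to late.
rates = [early_rate + (late_rate - early_rate) * i / (steps - 1) for i in range(steps)]
modelled_drop = sum(rates)

print(steps, total_drop, round(average_drop_per_step, 1))  # 7 55 7.9
print(modelled_drop)                                       # 49.0
```

The average drop of about 7.9 Agtron points per 30 seconds sits between the quoted early and late rates, and the crude linear model gives a total drop of 49 points against the 55 reported, so the approximate figures are consistent with each other.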

Another type of vagueness that would fall to Popper’s criticism arises from the nature of the output parameters of the theory. Here the types of descriptors play a role. Just to explain thoroughly: descriptors specific enough to be included in a good theory start on solid ground with basic tastes, such as the level of acidity or bitterness, or with descriptors that have a simple physical or chemical object as a reference, such as ‘Hazelnut’, ‘Orange’ or ‘Tomato’. Descriptors such as ‘Balanced’, ‘Clean’, ‘Structured’ or ‘Complex’, by contrast, are too vague and opinion driven. The only place for such opinion-driven descriptors is in a theory that tries to map out the opinions of a specific consumer segment, not in a descriptive sensory analysis.

Another theory that would fall to Popper's falsificationism, in extension of the sensory descriptors mentioned above, is again the use of Development Time Ratio and any optimum related to it, because it fails to predict anything specific. What is it specifically that it tries to predict, such that even in theory you can imagine an observation that could refute it? I have only ever seen ‘Balance’ associated with development time ratio, and ‘balance’ is so opinion driven that it has to be considered a really unspecific claim that nobody could ever falsify. If a theory is not specific it can’t predict anything and therefore can’t be falsified, and then Popper would call it a bad theory. I have to agree. On top of that, if it can’t predict anything specific, it can’t help you design a preferred outcome for anybody. A theory without a prediction of a specific outcome is difficult to use for anything, but Popper’s biggest problem is this: since it does not predict anything specific, and therefore does not rule out any particular outcome either, it will survive as a good story for a long time. It can’t really be caught being wrong, because no outcome would ever expose the theory as wrong if it does not exclude any outcome, even a hypothetical one.