This is becoming less of the mini literature review I originally intended and more of an aggregation of interesting articles (similar to Chemjobber’s thing posted each Friday). While I was hoping to focus more on literature, even a tangentially related article might reveal too much about my work projects. Instead, I’ll cover other articles.

The first is Chemical and Engineering News’ article “Is Machine Learning Overhyped?” It asks how useful machine learning has been and where it is expected to go, with examples from drug discovery, materials science, and reaction discovery. A chemist at the University of Glasgow suggests that machine learning has not been the answer to everyone’s problems in pharma, as originally hoped, because pharma asks a difficult question with no good answer: will this drug do what I want it to? So it’s good as a guide, but it’s really difficult to get a solid answer, unlike materials research, where there are many answers to the question “How will this material behave?” Reaction discovery’s question of “How can I make this?” is as open-ended as they come. It too has had successes, but as a younger endeavor, not as many as pharma or materials.

Machine learning is simply building models; the difference (in my experience) between machine learning and regression is the complexity of the models. Mind you, I’m no data guru or modelling master; this comes from the practical experience I gained interning at IFF. But when you build a linear regression or logistic regression model, you can go to the equation and see what effect each variable has on the outcome. Is it a strong signal? Is it a weak signal? How powerful is this variable on its own, or does it work better in conjunction with another variable?
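To make the "go to that equation" point concrete, here is a minimal sketch of an ordinary least squares fit in plain Python. Everything in it is an assumption for illustration: the `fit_linear` helper, the two generic variables `x1` and `x2`, and the made-up, noise-free data. It is not a model from the article or from any real project.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    # Prepend a constant 1.0 to each row so the first coefficient is the intercept.
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # The matrix is now diagonal, so each coefficient is a simple ratio.
    return [A[i][n] / A[i][i] for i in range(n)]


# Made-up data generated from y = 2.0 + 3.0*x1 - 0.5*x2 (no noise).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 2)]
targets = [2.0 + 3.0 * x1 - 0.5 * x2 for x1, x2 in rows]

intercept, coef_x1, coef_x2 = fit_linear(rows, targets)
print(f"intercept={intercept:.2f}, x1={coef_x1:.2f}, x2={coef_x2:.2f}")
```

Because the data were generated from a known equation, the fitted coefficients should come back very close to 2.0, 3.0, and -0.5. Here `x1` has a larger effect per unit than `x2`, and in the opposite direction. Comparing raw coefficients like this only makes sense when the variables are on similar scales.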

A machine learning model such as a random forest or neural network is too complicated to internalize like a linear regression. You can tell which variables seem most important based on how often they were used for decisions, and you can use other metrics such as confusion matrices and ROC curves to evaluate the model, but at the end of the day there are so many decisions being made that it’s impractical to go through and understand each tree. It’s the reason why these are often called black boxes. You have to trust that the decisions are sound. On the flip side, machine learning doesn’t mind sifting through hundreds of variables to find the next best classifier. You can feed it as much data as you conceivably can and it’ll happily plod along. Do that to your local data scientist and they’ll gawk at you. Machine learning is also willing to work off of the tiniest signal, one someone might not have noticed. Then again, a computer doesn’t have the intuition to realize that correlation does not mean causation, and that you can’t use the number of Japanese cars sold in the US to predict the number of suicides by car crash. Although a better question would be: why didn’t the modeler know better than to include that as a variable?
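The correlation-versus-causation trap is easy to reproduce. The sketch below computes a Pearson correlation coefficient for two series of invented numbers that have nothing to do with each other except that both rise over time. The `pearson_r` helper and both series are assumptions made up for this illustration; they are not real car sales or mortality figures.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented numbers for two unrelated quantities that both happen to rise
# over six consecutive years.
series_a = [10, 12, 13, 15, 18, 20]
series_b = [3, 4, 4, 5, 6, 7]

r = pearson_r(series_a, series_b)
print(f"r = {r:.3f}")
```

The coefficient comes out above 0.9 even though nothing connects the two series beyond a shared upward trend. A model given `series_a` as a predictor for `series_b` would use it without hesitation, which is why choosing the variables stays the modeler's job.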

In the end, these are all models. They’re tools. Powerful tools, but it’s up to people to think critically. What was in the training/validation database? How distant is the data I’m feeding in from that training set? Does this result make sense? What are the known limitations of this model? Yes, they’re powerful, and yes, they are revolutionizing the way we interpret data, and we should always strive to use the best tool we have available for our problems. But it is always up to the scientists to think critically and evaluate the results and applicability domain of these models.
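One rough way to ask "how distant is my data from the training set?" is a nearest-neighbor distance check. The sketch below is only one simple heuristic among many for judging applicability domain. The function names, the two-number descriptors, and the 0.5 cutoff are all hypothetical choices for illustration, not an established standard.

```python
import math


def nearest_distance(sample, training_set):
    """Euclidean distance from a sample to its closest training point."""
    return min(math.dist(sample, point) for point in training_set)


def in_domain(sample, training_set, threshold):
    """Treat a sample as 'in domain' if it lies within threshold of any training point."""
    return nearest_distance(sample, training_set) <= threshold


# Hypothetical two-descriptor training set and an arbitrary cutoff.
training_set = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
threshold = 0.5

near_sample = (1.1, 2.1)  # sits among the training points
far_sample = (5.0, 0.5)   # well outside the region the model has seen

print(in_domain(near_sample, training_set, threshold))
print(in_domain(far_sample, training_set, threshold))
```

A prediction for `far_sample` might still be correct, but the model has no nearby examples to support it, so it deserves more scrutiny. In practice the descriptors would need sensible scaling, and the cutoff would have to be justified from the training data itself.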

Another interesting article is “The Ugly History of Beautiful Things: Perfumes,” a piece on the not-so-clean history of some of our most desirable fragrances: ambergris from whales, musks from musk deer, and civetone from civets. Something about them is so alluring that their amber and musk notes are fundamental to fragrances today. Fortunately, we don’t rely on animal-based sources anymore and instead have a large variety of molecules to choose from, thanks to the work of many chemists and perfumers. However, it’s important to look back on where they originally came from and the impact they had on people. These odors, once reserved only for the height of luxury, can now be found pervading nearly every form of fragrance because of how well we respond to them and how well they perform. While sometimes crass, the article has many interesting historical anecdotes of how these fragrances were used (and abused) and offers insight from a variety of people in perfumery.

Edit: republished 9/28/18 because the mobile app reverted many changes made