Avoiding the Everyday Pitfalls of Chemometrics

An interview with Dr. Barry M. Wise, President of Eigenvector Research, discussing the importance of producing a robust chemometric model and how chemometrics can add value to spectroscopic datasets. Interview conducted at Pittcon 2019.

What is chemometrics?

Chemometrics is the chemical discipline that uses mathematical and statistical methods to do two jobs. The first is to develop measurement procedures and experiments so experimentalists can get the maximum amount of information about the system they are investigating with the least amount of cost and effort. This is the experimental design part of chemometrics.

The second job involves developing mathematical/statistical models that can be used to relate measurements made on a chemical system to a state or property of the system. This is typically a chemical concentration or other property of a system, like the tensile strength of a polymer or the octane number of gasoline.

This part is all about relating the measurements that we make, which are typically spectroscopic for the Pittcon crowd, to what's going on chemically.

Why is it important to build an optimal model in chemometrics?

I like that question because, for one, I don't know if I believe in optimal models. Any time I hear somebody talking about optimality I’m inclined to ask, ‘With respect to what?’ In fact, one of the biggest problems in chemometrics is that people believe they need optimal models, and they are usually thinking about prediction error. They don't need optimal models, but rather, adequate models that have been designed for the long term.

One of the mistakes that people tend to make is trying to squeeze every last bit of performance out of their models. They are inclined to overfit their data, and then often find that their models don't perform very well in the long term or over expanded datasets.

Instead, we want to encourage people to develop models that are going to be robust and reliable. They might not be optimal at any one point, but they will last longer.

A model doesn't have to be better than your spec requires. For instance, if you're running a refinery and you're controlling the octane number of gasoline, the process control engineer will tell you how accurate the octane estimate needs to be in order to meet the control objectives. Your goal is to make a model that meets the control objective, and to make it as maintenance-free as possible.
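As an editorial illustration (not part of the interview), the "adequate rather than optimal" idea can be sketched as a selection rule: pick the simplest model whose validation error already meets the spec, instead of the one with the lowest error. The function name `simplest_adequate`, the error values and the spec of 0.30 octane units are all hypothetical, made up purely for this sketch.

```python
def simplest_adequate(rmsep_by_components, spec):
    """Return the smallest model size whose validation error meets the spec.

    rmsep_by_components maps a complexity setting (e.g. number of latent
    variables) to its error on independent validation data.
    Returns None if no candidate meets the spec.
    """
    for n_components in sorted(rmsep_by_components):
        if rmsep_by_components[n_components] <= spec:
            return n_components
    return None


# Hypothetical validation errors (octane units), invented for illustration.
errors = {1: 0.62, 2: 0.31, 3: 0.24, 4: 0.22, 5: 0.21}

chosen = simplest_adequate(errors, spec=0.30)
lowest_error = min(errors, key=errors.get)
print("simplest model meeting spec:", chosen)
print("model with lowest error:", lowest_error)
```

With these made-up numbers the rule stops at three components, even though five components give a marginally lower error. The extra components buy accuracy the control objective does not need, at the cost of a model that is more likely to need maintenance.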

Which scientific fields have incorporated chemometric tools?

Many of the methods we now think of as chemometric methods were originally put to work in the social sciences. In chemistry, they were adopted first by scientists, including spectroscopists, working in the lab, and then by engineers for their online applications.

It turned out that near-infrared spectroscopy was the perfect application for chemometrics; it's a technique that produces very high-quality data with good signal-to-noise but rather complex relationships between the measurements and chemistry.

Initially, these scientists had to create models from a theoretical perspective, which was difficult and sometimes intractable because of the complexity of the systems they were working with. However, calibration techniques like partial least squares (PLS) worked well with that kind of data. That was where chemometrics got its real foothold in analytical chemistry. Since then, it has spread into many more analytical fields and into engineering as well.
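For readers who have not seen PLS in action, here is a minimal editorial sketch (not from the interview, and not how PLS_Toolbox implements it) of single-response PLS with NIPALS-style deflation in plain Python. The "pure spectra" `S1` and `S2`, the concentrations and the function names `pls1_fit` and `pls1_predict` are assumptions invented for the example; the spectra are noise-free mixtures of an analyte and one interferent.

```python
def pls1_fit(X, y, n_components):
    """Fit PLS1 (one response) by successive deflation. X: list of rows."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximum covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(t[i] * f[i] for i in range(n)) / tt
        # Deflate: remove what this component explains from X and y.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict y for one new spectrum by repeating the deflation steps."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat


# Hypothetical pure-component "spectra" at four wavelengths.
S1 = [1.0, 0.5, 0.2, 0.0]   # analyte of interest
S2 = [0.0, 0.3, 0.6, 1.0]   # interferent

def mix(c, d):
    return [c * a + d * b for a, b in zip(S1, S2)]

conc = [(1, 2), (2, 1), (3, 4), (4, 2), (5, 5), (6, 1)]  # (analyte, interferent)
X = [mix(c, d) for c, d in conc]
y = [float(c) for c, _ in conc]

model = pls1_fit(X, y, n_components=2)
prediction = pls1_predict(model, mix(2.5, 3.0))
print(f"predicted analyte concentration: {prediction:.4f} (true value 2.5)")
```

Because these synthetic spectra contain exactly two sources of variation and no noise, two latent variables are enough to recover the analyte concentration without ever specifying the pure spectra to the model. Real spectra are noisy, and choosing the number of components is where the overfitting risk discussed earlier comes in.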

What are the main challenges for scientists working with chemometric tools?

The biggest challenge is to understand what the methods are doing and what the results actually mean. The good news is that you can teach people this in a relatively short amount of time with a few days of classes and reading. This is likely a lot less time than it took to learn all the domain-specific knowledge that those scientists bring to work every day.

If you already know generally what's going on within the chemistry and physics of your own data, chemometric tools can help you get even more out of your data.

At Pittcon 2019, you gave a talk on Common Chemometric Mistakes and How to Avoid Them. Why did you choose to present on this topic?

I wanted something that was fun. Everybody likes to hear about it when other people mess up! But besides being entertaining, it is a real issue. There are a lot of misconceptions and bad habits that people have that could be fixed for more successful analysis.

What advice would you give to scientists working with chemometric tools?

One is to make sure that you understand what you're doing. That will probably require some study and/or taking a class. Beyond that, the single biggest piece of advice would be to validate your models on independent data sets. You’ll stand a lot less chance of going wrong if you do this.
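To make the point about independent validation concrete, here is a small editorial sketch (not from the interview) in plain Python. It compares a straight-line calibration with a "lookup" model that simply memorizes the calibration samples. All data values are invented: the calibration responses follow y = 2x with an alternating error of 0.3, and the validation responses follow y = 2x exactly. RMSEC and RMSEP denote the root-mean-square error on the calibration and validation sets respectively.

```python
from math import sqrt


def rmse(predicted, reference):
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference))
                / len(reference))


def fit_line(x, y):
    """Ordinary least-squares straight line."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    slope = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
             / sum((a - x_mean) ** 2 for a in x))
    return lambda xi: y_mean + slope * (xi - x_mean)


def fit_lookup(x, y):
    """Overfit on purpose: return the y of the nearest calibration sample."""
    pairs = list(zip(x, y))
    return lambda xi: min(pairs, key=lambda pair: abs(pair[0] - xi))[1]


# Invented calibration set: y = 2x with alternating +/-0.3 measurement error.
x_cal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_cal = [2.3, 3.7, 6.3, 7.7, 10.3, 11.7]
# Invented independent validation set: different samples, y = 2x.
x_val = [1.4, 2.4, 3.4, 4.4, 5.4]
y_val = [2.8, 4.8, 6.8, 8.8, 10.8]

report = {}
for name, fit in {"line": fit_line, "lookup": fit_lookup}.items():
    model = fit(x_cal, y_cal)
    rmsec = rmse([model(v) for v in x_cal], y_cal)
    rmsep = rmse([model(v) for v in x_val], y_val)
    report[name] = (rmsec, rmsep)
    print(f"{name}: RMSEC={rmsec:.3f}  RMSEP={rmsep:.3f}")
```

Judged on the calibration data alone, the lookup model looks perfect because it reproduces every calibration sample exactly. Only the independent set reveals that it has fitted the measurement error, while the plain straight line predicts the new samples far better.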

How is Eigenvector working to make chemometric tools more accessible and analysis easier to carry out?

We've worked to make the tools more accessible to people by moving to much more user-friendly graphical data editing. We’ve worked hard to make it easier to add additional information to plots in order to really bring everything together and allow scientists to get an overall picture of their data.

Another aspect that we've been improving is integration, where we've been taking modeling methods that aren't necessarily our own and incorporating them into our software. Putting them in the same framework enables direct comparisons to be made.

This is important because different developers have different ways of testing their models and different ways of pre-processing the data that go into them. However, you can only really compare models if you put them on the same footing.

We've worked to put everything in the same framework, so when you want to compare a PLS model to a neural network model, a support vector model, a locally weighted regression or a regression tree model, they're all in the same framework and you know you’re actually comparing apples with apples.
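The "same footing" idea can be sketched in a few lines. This is an editorial illustration in plain Python, not Eigenvector's software: every candidate model receives identically pre-processed data (here, mean-centering using the calibration mean only) and is scored on the same validation samples with the same metric. The function `compare_on_same_footing`, the two toy models and all data values are hypothetical.

```python
from math import sqrt


def rmse(predicted, reference):
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference))
                / len(reference))


def fit_mean(x, y):
    """Baseline model: always predict the calibration mean of y."""
    y_mean = sum(y) / len(y)
    return lambda xi: y_mean


def fit_line(x, y):
    """Ordinary least-squares straight line."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    slope = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
             / sum((a - x_mean) ** 2 for a in x))
    return lambda xi: y_mean + slope * (xi - x_mean)


def compare_on_same_footing(models, x_cal, y_cal, x_val, y_val):
    """Score every model with identical pre-processing, split and metric."""
    # Shared pre-processing: center x with the calibration mean only,
    # so no information from the validation set leaks into any model.
    x_mean = sum(x_cal) / len(x_cal)
    xc = [v - x_mean for v in x_cal]
    xv = [v - x_mean for v in x_val]
    results = {}
    for name, fit in models.items():
        predict = fit(xc, y_cal)
        results[name] = rmse([predict(v) for v in xv], y_val)
    return results


# Invented data for illustration.
x_cal = [1.0, 2.0, 3.0, 4.0, 5.0]
y_cal = [2.1, 3.9, 6.2, 7.8, 10.0]
x_val = [1.5, 3.5, 5.5]
y_val = [3.0, 7.0, 11.0]

results = compare_on_same_footing(
    {"mean": fit_mean, "line": fit_line}, x_cal, y_cal, x_val, y_val)
for name, error in results.items():
    print(f"{name}: RMSEP={error:.3f}")
```

Because the pre-processing, the data split and the error metric live in one place, any difference in the reported errors can be attributed to the models themselves rather than to how each one happened to be tested.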

What’s next for the company?

We want to expand our tool set and continue to refine it. We do more and more online applications as we go. From our perspective, we want to reach more instrument makers and be the chemometrics end of their hardware platform. We think that's a smart move for them, too, because anyone can write their own chemometrics package, but it won’t be as refined and extensive as the one we’ve worked on for over 25 years now.

So we want to continue to expand on our collaborations. We're also trying to take our classes to more places worldwide so that more scientists and engineers can learn about chemometric methods and how to apply them well.

About Dr. Barry M. Wise

Dr. Barry M. Wise is President and co-founder of Eigenvector Research. He received his doctorate in Chemical Engineering from the University of Washington, where he studied under professors N. Lawrence Ricker and Bruce R. Kowalski.

Wise is the creator of PLS_Toolbox, a comprehensive chemometrics software package with several thousand users worldwide. He has presented over 100 chemometrics short courses and has authored more than 50 peer-reviewed articles, book chapters and patents.

About Eigenvector Research Inc.

Eigenvector Research, Inc. (EVRI) is a full-service Chemometrics company, offering software, training and consulting. EVRI provides advanced chemometrics support for a wide variety of industries and academia. Our chemometric software products include our flagship MATLAB-based PLS_Toolbox and stand-alone Solo.

Disclaimer: The views expressed here are those of the interviewee and do not necessarily represent the views of AZoM.com Limited T/A AZoNetwork the owner and operator of this website. This disclaimer forms part of the Terms and conditions of use of this website.
