In this interview, Fabian Feidl, Co-Founder of DataHow, talks to AZoM about the many applications of Raman technology, from upstream processes to the first downstream applications, and from monitoring classical process parameters to quality-related attributes.
Could you please tell us about your company DataHow?
We founded the company in 2017 as a spin-off from the group of Professor Morbidelli, who is well known for his activities in continuous biomanufacturing. Our main goal is to support customers from manufacturing industries, with a focus on biopharma and chemistry, in enabling process digitalization and optimization in line with the goals of Industry 4.0.
Across these versatile digitalization activities, our key competence is supporting decision-making in process development and production through advanced, customized data analytics and process models. In other fields, such as data management, sensors, and lab automation, we partner with other companies, like Siemens or Kaiser Optical Systems.
Our key rationale is a hybrid approach that combines the expertise and intuition of engineers with the computational power and non-intuitive learning ability of computers to solve problems. The challenge, especially in biopharma, is that we are working with very complex, incompletely understood processes, with many tens to hundreds of process variables to consider along with several tens of quality attributes across process development, all subject to very high experimental costs and increasing time-to-market pressure. Hence, in our opinion, a hybrid approach combining data-driven and process-knowledge-driven algorithms is crucial for efficiently providing trustworthy decision support.
What are the kinds of services DataHow provides, and how are they used?
The products and services we offer can be divided into three pillars. Our main product is the Smart Software for process development and production (DataHow® Lab), which includes multivariate data analysis and machine learning tools, with the possibility of integrating process knowledge for decision-making, monitoring, optimization, and control. Our second pillar is Smart Sensors, for which we collaborate closely with Kaiser Optical Systems to combine our bioprocess and data analytics competence with their expertise in spectroscopy products.
Together, we are developing advanced spectral data analytics toolboxes for USP and DSP biopharma processes. We offer model development and implementation services and have also created special solutions that broaden the use of our tools, such as our patented FlowCell for application in downstream processing.
While the first two pillars are being commercialized in 2019, the third, ambitious pillar is the integration of our solutions into a smart platform to enable machine learning-guided, robotic high-throughput process development and, eventually, automated and self-optimizing biomanufacturing. Here we are collaborating closely with the group of Prof. Neubauer from TU Berlin.
Could you give us a short explanation of Raman spectroscopy, and why DataHow is so interested in it?
Raman spectroscopy is a light-scattering technique in which a laser at a certain wavelength interacts with the sample. This interaction shifts the wavelength of the scattered light captured by the detector, resulting in a Raman spectrum that is highly specific for each chemical species.
For a simple molecule in a very pure sample, you can see distinct peaks that form a fingerprint, and the peak intensity correlates with the concentration of the chemical compound. However, for a macromolecule such as an antibody, with several impurities present, the fingerprints overlap and there are no distinct peaks, which makes interpretation by eye impossible.
This effect is further amplified when measuring in a bioreactor, where thousands of components are present and detected at the same time. Therefore, powerful data analytics tools are needed to find the relevant information that explains the offline-analyzed concentrations of the reference measurements.
Once those models are calibrated, they can be used online to predict, for example, the current glucose concentration in the bioreactor from newly incoming spectra. One of DataHow's key competences lies in building those advanced predictive models.
How is data analytics typically carried out, and what makes your approach different?
The classical modeling approach consists of three steps. The first is a data preparation step, in which the reference measurements need to be collected from different offline analyzers. The spectra need to be selected, aligned with the reference measurements, and read into the analysis software.
In the second part, the spectra pre-processing or pre-treatment step, the main goal is to reduce the significant noise content of the spectra and enhance the relevant signal. Many combinations of different techniques exist for this, but they need to be carefully selected and tuned, since they have a huge impact on the model development, which is the next step.
In this step, the dataset is split so that a model is first trained and then applied to and evaluated on a test set.
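The three steps can be sketched as follows, using two common pre-treatment choices (a Savitzky–Golay derivative and standard normal variate scaling) on synthetic spectra; these particular techniques and settings are illustrative assumptions, not the actual pipeline of any specific tool.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Step 1: data preparation -- here faked as 30 smooth-ish spectra of
# 101 channels each, already aligned with offline reference values.
spectra = rng.normal(0, 1, (30, 101)).cumsum(axis=1)
reference = rng.uniform(0, 5, 30)

# Step 2a: smooth and take the 1st derivative to suppress baseline drift.
pretreated = savgol_filter(spectra, window_length=11, polyorder=2, deriv=1, axis=1)

# Step 2b: standard normal variate (SNV) to remove multiplicative scatter effects.
pretreated = (pretreated - pretreated.mean(axis=1, keepdims=True)) \
             / pretreated.std(axis=1, keepdims=True)

# Step 3: split into a training set and a held-out test set.
X_train, X_test, y_train, y_test = train_test_split(
    pretreated, reference, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)
```

Each pre-treatment choice (window length, derivative order, scaling method) would normally be tuned against the downstream model's performance, which is exactly why these steps interact so strongly with model development.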
In classical, commercial tool-assisted modeling approaches, many manual handling steps are usually needed, especially in the pre-treatment part, while the actual model development is typically limited in its flexibility and customization potential by the design of the commercial tool.
In DataHow's modeling approach, we aim to start exploiting Raman spectroscopy immediately and to monitor the very first run performed, using a technique called transfer learning supported by historical datasets. Once we have gained data on the new process, we customize and retrain those models, which allows our advanced modeling to significantly decrease the model error and, in particular, to increase predictive robustness.
As even more process data is gained, we can further enhance the predictive ability and the scope of Raman-enabled applications through the toolset of our smart software and the process understanding it generates. The aim of our spectral modeling toolbox is to provide end-users with a powerful, user-friendly tool that lets them exploit the potential of Raman technology in USP and DSP applications as broadly and efficiently as possible.
Are advanced techniques really needed?
We generally believe that, in order to create automated, robust, generic, and self-learning solutions, advanced tools have to be combined with standard techniques. While in some cases advanced non-linear techniques achieve results similar to those of standard modeling techniques, in other cases it is vital to use advanced techniques.
Additionally, it is very important to compare models not only on their root mean squared error but also, and especially, on their robustness with respect to transferability across process variants and scales.
We trained our own models on a very large dataset for several process parameters and compared them to commercial modeling tools. In this case, the advanced techniques always outperformed the commercial standard tool, mainly due to differences in their characteristics: some can capture non-linearities, some can extrapolate, and some are more robust to outliers or less dependent on the pre-treatment settings. However, it should not be a decision between advanced and non-advanced techniques; what matters is selecting the technique that is optimal for the case at hand.
With so many models, how do you know which one in particular to use?
Here, again, we provide an automated framework that explores thousands of possible combinations of assumptions and model parameters to eventually identify the most promising combination of models. We never use a single model but always a fusion of several, to benefit synergistically from the advantages of different modeling strategies.
As a simple example, imagine you have two models, model one and model two, and each of them predicts slightly differently. According to the root mean squared error, model two is better than model one.
But using a machine learning-based model fusion approach, which learns which model is better in which cases, we can decrease the root mean squared error from 0.74 and 0.63, respectively, to 0.53.
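One common way to realize this kind of fusion is a stacking ensemble, where a meta-learner weighs the out-of-fold predictions of a linear and a non-linear base model. The sketch below uses scikit-learn's `StackingRegressor` on invented data; the specific models and figures are illustrative assumptions, not DataHow's actual fusion method.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, (300, 5))
# Target with a linear part (favors the linear model) and a
# non-linear part (favors the forest).
y = (X @ np.array([1.0, -0.5, 0.3, 0.0, 0.2])
     + np.sin(3 * X[:, 0]) + rng.normal(0, 0.1, 300))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = Ridge().fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Fusion: a meta-learner weighs the base models' out-of-fold predictions.
fused = StackingRegressor(
    estimators=[("ridge", Ridge()),
                ("rf", RandomForestRegressor(n_estimators=100, random_state=0))],
    final_estimator=Ridge(),
).fit(X_tr, y_tr)

def rmse(model):
    return mean_squared_error(y_te, model.predict(X_te)) ** 0.5

rmse_linear, rmse_forest, rmse_fused = rmse(linear), rmse(forest), rmse(fused)
print(rmse_linear, rmse_forest, rmse_fused)
```

The point of the construction is that the fused model inherits the linear model's extrapolation on the linear part and the forest's flexibility on the non-linear part.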
Moreover, a very interesting and important approach is model localization, which means building some models on a subset of the data. The selection of such local regions is again automated, and localization can be used to further improve predictive accuracy and robustness.
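A toy sketch of localization: partition the input space and fit one simple model per region, so that each local model only has to explain its own regime. Here the regions come from k-means clustering, which is an illustrative assumption rather than DataHow's actual region-selection procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, (200, 1))
# A regime change at x = 5: one global linear model fits poorly.
y = np.where(X[:, 0] < 5, 2 * X[:, 0], 20 - 1.5 * X[:, 0]) + rng.normal(0, 0.2, 200)

# Localization: partition the input space, then fit one model per region.
regions = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
local_models = {
    label: LinearRegression().fit(X[regions.labels_ == label],
                                  y[regions.labels_ == label])
    for label in np.unique(regions.labels_)
}

def predict_local(x):
    """Route each sample to the model of the region it falls into."""
    labels = regions.predict(x)
    return np.array([local_models[lab].predict(x[i:i + 1])[0]
                     for i, lab in enumerate(labels)])

global_rmse = np.sqrt(np.mean((LinearRegression().fit(X, y).predict(X) - y) ** 2))
local_rmse = np.sqrt(np.mean((predict_local(X) - y) ** 2))
print(global_rmse, local_rmse)
```

On this piecewise-linear toy problem, the localized models recover the two regimes that a single global line cannot represent.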
What happens if you start a new process, and don't want to wait until you have enough spectra generated to build a robust model?
We are currently developing a concept called transfer learning to start utilizing Raman technology efficiently right after installation. Over time, you might collect a huge amount of Raman data from other processes. This data can be used to train our advanced models, which can then be applied to a completely new process, meaning that you can already monitor the first run of the new process. Once you have generated data from this process, you take that new data and retrain your advanced model.
We could show for several use cases that this concept already works. We used an old dataset (for training) and a new dataset (for prediction). When we used only the new dataset and sequentially added more data from it (the standard approach when starting a new process), we obtained very high model errors with very large standard deviations. When we instead used only the old dataset to train the model and applied it directly to the new dataset (transferred models), we clearly outperformed those standard models.
Once we retrained those models with only a part of the new data (retrained transferred models), we could further significantly decrease the model error and increase robustness. This shows the huge potential of the transfer learning concept. It also motivates exploring ways to integrate all possible data sources, ideally from different companies, e.g. in cloud solutions, for the joint, efficient, and effective utilization of Raman technology.
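One simple way to sketch this old-process/new-process comparison is to treat the historical model as a prior and learn only a small, regularized correction from the few new samples. The data generation and the particular correction scheme below are illustrative assumptions, not DataHow's published method.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(4)

def make_process(shift, n):
    """Toy 'spectral features' whose response drifts slightly between processes."""
    X = rng.normal(0, 1, (n, 20))
    y = X @ (np.ones(20) + shift) + rng.normal(0, 0.1, n)
    return X, y

X_old, y_old = make_process(shift=0.0, n=500)   # large historical dataset
X_new, y_new = make_process(shift=0.3, n=40)    # a few runs of the new process
X_test, y_test = make_process(shift=0.3, n=200)

# 1) Standard approach: train only on the scarce new-process data.
naive = Ridge().fit(X_new[:10], y_new[:10])

# 2) Transferred model: train on the old process, apply directly to the new one.
transferred = Ridge().fit(X_old, y_old)

# 3) Retrained transferred model: keep the historical model and learn only a
#    regularized correction from the new samples.
residual = y_new - transferred.predict(X_new)
correction = Ridge(alpha=10.0).fit(X_new, residual)

rmse_naive = mean_squared_error(y_test, naive.predict(X_test)) ** 0.5
rmse_transferred = mean_squared_error(y_test, transferred.predict(X_test)) ** 0.5
rmse_retrained = mean_squared_error(
    y_test, transferred.predict(X_test) + correction.predict(X_test)) ** 0.5
print(rmse_naive, rmse_transferred, rmse_retrained)
```

On this toy setup the errors reproduce the qualitative ordering described above: the naive model trained on scarce new data is worst, the transferred model is much better, and the retrained transferred model is better still.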
Could you tell us about how Raman spectroscopy has been implemented into your products?
Following the first success stories in USP, we were very interested in extending the exploitation of Raman technology towards DSP, which today accounts for about 50% of the cost in biomanufacturing.
Having an online measurement of the target protein or even some quality attributes would be a huge advantage for downstream processing, and therefore we developed a patented FlowCell.
We designed the flow path to reduce peak broadening effects, which also makes the cell suitable for chromatography. Although it has a very limited volume of only 140 µl, we can increase the signal by using a reflector, and thanks to the non-contact design, the optical components are separated from the process streams.
This results in a high pressure tolerance and also opens the door to single-use capabilities. You can connect the cell to a chromatographic system or other process units and use it in online mode, but of course you can also use it offline.
How are you developing this technology, and where do you see it being implemented in the future?
We believe that Raman technology will have broad potential in biopharma, with many more, and more advanced, applications than exist today.
With our advanced spectral modeling toolbox, we want to unveil process and product properties that could not yet be reliably quantified. Building powerful and reliable models already during process development should then strengthen the scope of applications during the later tech transfer to manufacturing. Here, based on our Smart Software and Smart Sensor toolboxes, we want to enable Raman-based real-time process optimization and control.
Of course, through the increased use of our flow cell across different industrial use cases, we want to fully exploit the potential of Raman applications in DSP. Since we have detailed process understanding, the corresponding toolbox will also offer the option of fusion with deterministic model variants.
Besides targeting a broader scope of applications and improved accuracy and efficiency of the model-enabled use cases, our key goal is to provide a user-friendly solution to our customers, who should benefit from the versatile toolset while only having to provide their data to the solution.
Are there any other areas where you have used Raman spectroscopy?
We have implemented Raman technology in upstream as well as downstream applications. With Siemens, we developed a supervisory control and data acquisition system for an end-to-end integrated continuous bioprocess, collecting all data and centralizing it in one database. Within the platform, we developed advanced process-wide monitoring and even predictive maintenance tools.
Additionally, we developed a supervisory control which adapts, for example, the first chromatographic step, based on the titer concentration in the perfusion bioreactor. Moreover, we want to implement this technology within small scale robotic platforms, which are generally used in process development. The additional process information could be coupled to advanced process modeling approaches and could further accelerate the process development.
How easy is the implementation of the FlowCell?
It's quite simple actually. The only thing that you need to do is set up the different layers, and the parts.
Then, you need to adjust the non-contact objective to maximize the signal intensity. On the other side, you need to adjust the reflector, and once this is set up, and the signal intensity is increased, you can start your measurements. You can clean it by using solutions which are common in chromatography, and the system is ready for use.
When, and how, will we be able to order FlowCells from Kaiser?
It is already possible to order FlowCells. Currently, we are providing them to a limited number of customers and gathering feedback in order to commercialize a new generation of improved FlowCells.
How much data do we really need to train a model?
This is very difficult to answer. If you take the classical process, you usually need three to four different bioreactor runs measured around once per day. This however strongly depends on the variability of the process. But when you have an old, reasonably comparable historical dataset, and transfer the model, you can already start monitoring during the first run.
Have you been publishing alongside the development of this software?
So far, our focus has not been on publishing; we are currently concentrating on developing and commercializing our software. Together with customers and partners interested in making their success stories more visible, we are indeed interested in publishing some of the results in the future.
Does the training of these models now require the use of supercomputers?
It is true that these advanced modeling techniques need more computational time, but we perform our calculations on normal personal computers. Of course, the better your computer, the faster you receive the result. The initial training can take several hours on a personal computer, but the retraining, which needs to be done more often, takes only seconds or minutes.
Where can our readers go to find out more?
To find out more, please visit:
About Fabian Feidl
Fabian Feidl started his studies in pharmaceutical biotechnology at the University of Applied Sciences Biberach. In the course of his B.Sc. studies, he carried out a practical semester and Bachelor’s thesis at Rentschler Biotechnology.
While his Bachelor's program was wide-ranging and oriented toward the requirements of the biopharmaceutical industry, he focused on biomolecules and their engineering and manufacturing during his Master's studies in Molecular Biotechnology at the Technical University of Munich.
After several research internships, he started a research project at University College London. Subsequently, he began his PhD in the Morbidelli group at ETH Zurich, during which he co-founded the spin-off DataHow AG. Fabian Feidl received scholarships from Roche, the Hans-Rudolf Foundation, and the Karl-Schlecht Foundation, and was elected to join the Bayerische EliteAkademie.
Disclaimer: The views expressed here are those of the interviewee and do not necessarily represent the views of AZoM.com Limited (T/A) AZoNetwork, the owner and operator of this website. This disclaimer forms part of the Terms and Conditions of use of this website.