With data from all the partners now received, work has begun on the analysis and creation of predictive models. Dario Greco explains what has been done so far and how collaboration with external groups will help to validate the models' effectiveness.

We received most of the data from the partners by the end of 2016. There were some delays, but the data was finally in by December, and so we have now started modelling it.

We already know a few things. The data is of good quality, and the problem is modellable. Our algorithms seem to be working well. We estimate that it will take around one week’s worth of analysis on our computer to create reasonable models from the NANOSOLUTIONS dataset.

At the moment we are building a number of classifiers, i.e. multiple predictive models. One of them gives us an overall idea of what is dangerous and what is not, taking into account all of the data and all of the ways in which these materials can cause harm. But then we have gone further, asking specific questions such as: what features of nanomaterials will kill cells? What features of nanomaterials will cause genotoxicity? Which nanomaterials are capable of causing inflammatory diseases in mice? So we are working on a number of classifiers that can each tell us something different.
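
To make the multi-classifier idea concrete, here is a minimal sketch of what training one model per endpoint could look like. It assumes the data sits in a single table with one binary hazard label per endpoint; the file name, column names, and the random-forest choice are illustrative assumptions, not the project's actual pipeline.

```python
# Minimal sketch: one classifier per toxicity endpoint.
# File name, column names, and model choice are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical table: rows = nanomaterials, columns = material
# descriptors and omics features, plus one binary label per endpoint.
data = pd.read_csv("nanomaterials.csv")
endpoints = ["overall_hazard", "cytotoxicity", "genotoxicity", "inflammation"]
features = data.drop(columns=endpoints)

classifiers = {}
for endpoint in endpoints:
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    clf.fit(features, data[endpoint])
    classifiers[endpoint] = clf  # each model answers one specific question
```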

From what we have seen so far, the models are performing very well on our data. Using 10-15 biomarkers, we are able to predict whether a nanomaterial is dangerous or not with over 90 per cent accuracy. If confirmed, these results would be a significant advance on the currently available predictive models. Having said that, these are still preliminary models, and we are still receiving small corrections to the data, so we are continuing to make adjustments.
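
As an illustration of how a small biomarker panel can be selected and its accuracy estimated, the following sketch continues from the table above; the univariate F-test selector and logistic regression are stand-ins, not the methods the project reports.

```python
# Sketch: reduce the features to a small biomarker panel, then estimate
# classification accuracy by cross-validation. The selector and the
# classifier are stand-ins for the project's actual methods.
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

panel_model = make_pipeline(
    SelectKBest(f_classif, k=15),        # keep ~10-15 biomarkers
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(panel_model, features, data["overall_hazard"], cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```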

As well as this, we are working in collaboration with WP12 to collect a number of datasets from outside NANOSOLUTIONS that can be used for external validation. When you build a model, you test its accuracy on a certain dataset, for example the NANOSOLUTIONS dataset; this is internal validation. However, no matter how carefully the experimental data has been collected, you can never be sure that the model isn't picking up patterns that are specific to that dataset and do not appear in others. To avoid this kind of situation, you take your models and apply them to external datasets.
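
The difference between the two validation modes can be sketched in a few lines, continuing from the model above; the external file name is a placeholder, and the sketch assumes the external table carries the same feature columns and hazard label.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score

# Internal validation: accuracy estimated within the training dataset
# (here via cross-validation, as in the sketch above).
internal_scores = cross_val_score(panel_model, features,
                                  data["overall_hazard"], cv=5)

# External validation: fit on the NANOSOLUTIONS data, then score the
# unchanged model on an independent dataset it has never seen.
# Assumes matching feature columns; the file name is a placeholder.
external = pd.read_csv("external_dataset.csv")
panel_model.fit(features, data["overall_hazard"])
external_accuracy = panel_model.score(
    external.drop(columns=["overall_hazard"]),
    external["overall_hazard"],
)
```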

We are validating our models on independent datasets, such as the one from the EU FP7 project MARINA. They have produced several omics data layers, so we are taking data from there. We have also started collaborating with some scientists from Canada with whom we are exchanging data and models. So in the coming months, possibly beyond the end of the project, we will have a better idea of how valid our models are.