Work by Gradiant colleagues Nora M. Villanueva and Marta Sestelo received an award at the third edition of this international conference
The European R Users Meeting (eRum) is the largest European meeting of professional users of R, the free software and programming language
A few months ago we told you that Budapest was hosting the third edition of the European R Users Meeting (eRum 2018), an international conference aimed at knowledge transfer and networking among professional users of R, the free software and programming language focused on statistical analysis. Our colleagues Nora M. Villanueva and Marta Sestelo went there to learn about the latest developments in the field, as well as to present a joint project that has now been recognised by the event’s organisers.
The project, titled ‘Finding groups in time-to-event data by means of the clustcurv package’ and implemented as an R package, automatically groups survival curves using clustering techniques. The aim is to provide a useful decision-making tool for organizations that work with huge volumes of data, such as those in Industry 4.0.
“The idea behind this work was to define a new algorithm that would allow us to cluster similar curves in order to make decisions based on these clusters”, said Nora M. Villanueva, researcher in the Services and Applications department at Gradiant, who illustrates the point with an example: “every organization works with a huge number of curves for different elements, such as the operation of a machine, customers who leave a particular service or the working life of parts produced in a factory. Our algorithm allows us to group these curves by resemblance, showing which elements perform in a similar way”.
An innovative project in clustering techniques
Nowadays, clustering techniques group curves according to a number of groups that has been defined in advance. “The innovation of our algorithm is that, in addition to performing this clustering, we can know, with statistical significance, how many different groups there are”, said Marta Sestelo, researcher in the INetS department at Gradiant. In fact, this is the tool’s most important and distinguishing feature. Until now, the number of groups was chosen according to the subjective and non-automatic criteria of each researcher. In addition, this methodology is implemented as a library for R, an open source programming language, so it is available to everyone who needs it, such as the scientific community or other organizations.
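To make the idea of grouping curves by resemblance more concrete, here is a minimal Python sketch. It is not the clustcurv algorithm and does not use its API: it estimates each survival curve with the standard Kaplan-Meier estimator and then groups curves greedily using a hand-picked distance threshold. The function names, the example data and the `threshold` value are all hypothetical, and a fixed threshold is precisely the kind of subjective criterion that the package replaces with a statistical test.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) steps.

    times: observed durations; events: 1 if the event was observed,
    0 if the observation was censored.
    """
    observed = Counter(t for t, e in zip(times, events) if e)
    curve, survival = [], 1.0
    for t in sorted(observed):
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - observed[t] / at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve, t):
    """Evaluate the step function S(t) defined by a Kaplan-Meier curve."""
    value = 1.0
    for step_time, step_value in curve:
        if step_time > t:
            break
        value = step_value
    return value


def curve_distance(a, b, grid):
    """Mean absolute gap between two curves on a shared time grid."""
    gaps = (abs(survival_at(a, t) - survival_at(b, t)) for t in grid)
    return sum(gaps) / len(grid)


def group_curves(curves, grid, threshold):
    """Greedy grouping (an illustrative assumption, not the package's method):
    a curve joins the first group whose first member lies within `threshold`."""
    groups = []
    for name, curve in curves.items():
        for group in groups:
            if curve_distance(curves[group[0]], curve, grid) <= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical failure times for three machines (0 marks a censored observation).
curves = {
    "machine_a": kaplan_meier([2, 4, 6, 8], [1, 1, 1, 1]),
    "machine_b": kaplan_meier([2, 4, 6, 9], [1, 1, 1, 1]),
    "machine_c": kaplan_meier([10, 20, 30, 40], [1, 1, 0, 1]),
}
groups = group_curves(curves, grid=[2, 4, 6, 8, 10], threshold=0.1)
print(groups)  # [['machine_a', 'machine_b'], ['machine_c']]
```

In this toy example the first two machines fail on almost the same schedule and end up in one group, while the third lasts much longer and forms its own. Changing the threshold changes the number of groups, which is why a statistically grounded choice matters.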
A cross-sector management tool
The project’s results apply directly to sectors where it is necessary to estimate the probability of an event occurring within a specific period of time. Banking, insurance or any company operating in Industry 4.0 can benefit from this project, as it can group time-to-event curves, where the event may be the failure of a part, customer delays or mortality in a fish farm, for example.
In addition, this project also has significant applications in other areas such as medicine and education. “We can apply it in our daily work with the different technologies in which we are experts at Gradiant, such as eLearning projects applied to classrooms where we want to investigate student dropout in a particular course,” said Nora M. Villanueva.
As a result of the project’s versatility, other international institutions have also shown interest, such as the prestigious journal ‘Statistics in Medicine’, which specializes in statistics and probability. For now, the work has already been recognised by the entire eRum 2018 team. The international event was attended last May by more than 500 professionals from 19 different countries, who followed the talks and presentations of more than thirty speakers from universities and internationally renowned companies such as RStudio, Microsoft, H2O.ai or Mango Solutions.