The University of Huddersfield is working on software that can recognise patterns in the free text of railway workers' hazard reports.
Rail workers are trackside at all hours and in all conditions. Whilst performing their work they can use hand-held devices to send on-the-spot reports when they encounter anything untoward that raises safety concerns. The result is a huge, expanding and potentially very valuable database of these “close calls”, which could make a big contribution to improving railway safety.
But the data can only be put to use if software is developed that can accurately analyse the often specialised, and sometimes misspelled or abbreviated, language of the text messages, and that can identify the information that flags up safety hazards so that remedies and preventative measures can be put in place.
At the University of Huddersfield’s Institute of Railway Research, Principal Enterprise Fellow Peter Hughes – an established expert on rail safety – is heading research that is making major breakthroughs in the text analysis of this branch of “Big Data”.
The project, commissioned by the IRR’s collaborators, the Rail Safety and Standards Board (RSSB), still has two years to run. But a preliminary version of the software has already been incorporated into Network Rail’s monthly analysis of its ‘Close Call’ records, and Peter Hughes and his team have demonstrated its exceptional accuracy in text searches.
Reports of Close Call safety issues are filed at a rate of some 200 per day on Great Britain’s railways. The database compiled over several years by Network Rail has close to a million entries.
“So we have got a lot of information about safety hazards,” said Mr Hughes. “But we’re struggling to understand how to use it. How can we untangle all this data so that we can actually make people’s jobs better and safer?
“If we can grab all of the data from the railways and bring it together, we will have better information about how accidents happen, and how to stop accidents occurring.”
Free text reporting
Rail workers report safety concerns and potential hazards by entering free text on their mobile devices. This gives them the descriptive scope they need. But because they are working in difficult outdoor conditions and wearing safety gear, the reports inevitably often contain misspellings and unusual abbreviations.
Mr Hughes and his IRR team have developed their software so that it can interpret these errors and technical terms and convert them into standard language.
“We can leave the railway workers to write everything they want to write, because they will be teaching us things we don’t know,” said Mr Hughes. Trends will emerge from the huge numbers of records, leading to new levels of understanding of the factors that can lead to hazards and accidents on the rail network.
Software that learns
A key element of the project is its narrow focus. Unlike some internet giants, Mr Hughes is not pursuing the distant goal of enabling computers to understand all text.
“We are not trying to make software that can read anything, but to get software that learns exactly what we need it to learn,” he said.
“We are starting from zero and training the computer as it goes along. We are building up a library of the things we want to know about, looking at the patterns we are finding in the database. That’s the approach I am taking that’s different to other main researchers in the area.”
The software under development at the IRR could be adapted to other sectors, and it has also been shown to achieve high accuracy in languages other than English. As part of the project, the team analysed 6,000 records from the Federal Office of Transport in Switzerland. The records were written in German, French and Italian, and Mr Hughes’s team achieved exceptionally high accuracy rates in all three languages.