By Frank Wang, PhD
In April 2017, the Office of the National Coordinator (ONC) announced its Patient Matching Algorithm Challenge. Although Vynca did not have a generally available standalone patient matching product, we had developed and deployed a matching system within our Advance Care Planning solution. That production system was managing record linkage for over 2.5M unique patient identities. When we learned of the challenge, we debated whether to enter. We were achieving ‘pretty good’ results with our clients, and the challenge would provide the external validation we needed. It would pit us against the best of the best, and the results would show how we performed relative to the rest of the industry.
Once we received the data, we decided the best approach would be to tweak our existing machine learning algorithm. Our system analyzes the properties of the input data and, from there, selects the best algorithm or combination of algorithms for that data. In effect, we get the strengths of each algorithm while limiting its weaknesses. And because this is a machine learning system, its performance improves as the number of labeled datasets used in training grows.
Vynca’s patient matching solution achieved the highest score among more than 140 teams from both industry and academia, with an F-score of 0.975, a metric that balances precision and recall. Of the top three teams, Vynca was the only one to use machine learning, and of the 60,000 duplicate pairs identified, only 300 were labeled as potential matches requiring human review.
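For readers less familiar with the metric, here is a minimal sketch of how an F-score is computed, assuming the balanced form (the harmonic mean of precision and recall). The precision and recall values below are purely hypothetical, chosen for illustration; they are not the challenge results. The review share uses the two counts quoted above.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical inputs for illustration only (not the challenge figures).
example = f_score(precision=0.90, recall=0.80)

# Share of identified pairs that needed human review, from the counts above.
review_share = 300 / 60_000
```

Because the harmonic mean is pulled toward the lower of its two inputs, a high F-score requires both precision and recall to be high at the same time.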
The unique characteristics that aided in our victory:
– Machine learning model. Vynca’s matching algorithm is not based on fixed rules. Instead, it is a stacked model trained by machine learning: it analyzes the profile of the input demographics to determine the weight given to each individual matching algorithm, then combines their outputs into an overall probabilistic confidence level of a match.
– Minimal manual review required. Once a model has been trained, its output is a probabilistic confidence level of a match, which typically has a bimodal distribution. Instead of picking a threshold directly to decide whether a match is a true positive, the user selects the desired precision and recall rates from an ROC plot. Because few pairs fall between the two modes, manual review of potential matches is minimized.
– Continual improvement with additional datasets. The addition of a new client into our system will improve the match rate for all customers. Our system learns the properties of the data, and that learning is applicable to all customers. The system will ensure that each client’s selected precision and recall rate is met or exceeded.
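The first two points can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Vynca's implementation: the base matchers, the field names (`name`, `dob`, `zip`), and the hand-set `WEIGHTS` and `BIAS` are all hypothetical, and a real stacked model would learn its weights from labeled pairs rather than hard-code them. The second function shows one way a desired precision could be translated into a score cutoff on labeled data.

```python
import math
from difflib import SequenceMatcher


# Hypothetical base matchers: each returns a similarity in [0, 1].
def name_similarity(a, b):
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def dob_match(a, b):
    return 1.0 if a["dob"] == b["dob"] else 0.0


def zip_match(a, b):
    return 1.0 if a["zip"] == b["zip"] else 0.0


BASE_MATCHERS = [name_similarity, dob_match, zip_match]

# Illustrative hand-set values; a trained stacked model would learn these.
WEIGHTS = [4.0, 3.0, 1.5]
BIAS = -5.0


def match_confidence(a, b):
    """Combine base matcher outputs into one confidence via a logistic function."""
    z = BIAS + sum(w * m(a, b) for w, m in zip(WEIGHTS, BASE_MATCHERS))
    return 1.0 / (1.0 + math.exp(-z))


def threshold_for_precision(scores, labels, target_precision):
    """Lowest score cutoff whose precision on labeled pairs meets the target.

    Returns None if no cutoff reaches the target precision.
    """
    best = None
    for cutoff in sorted(set(scores), reverse=True):
        accepted = [label for s, label in zip(scores, labels) if s >= cutoff]
        precision = sum(accepted) / len(accepted)
        if precision >= target_precision:
            best = cutoff  # keep lowering the cutoff while the target holds
    return best


# Made-up records for illustration.
a = {"name": "Jon Smith", "dob": "1980-02-01", "zip": "94301"}
b = {"name": "John Smith", "dob": "1980-02-01", "zip": "94301"}
c = {"name": "Maria Lopez", "dob": "1975-11-23", "zip": "10001"}

likely_match = match_confidence(a, b)
likely_non_match = match_confidence(a, c)
```

Choosing the lowest cutoff that still meets the precision target keeps recall as high as possible for that target. When the scores are strongly bimodal, few pairs land near the cutoff, which is what keeps the manual review queue small.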
Although we ultimately won the competition, we did face some unexpected challenges. Since the data from the ONC was synthetically generated, we couldn’t understand why some pairs were labeled as matches, and others weren’t. This does not happen with real patient data, because we have access to additional information to determine a match. Initially, our existing algorithm didn’t perform well with this synthetic data, as our algorithm was designed and trained with real-world patient data.
Future Plans for Our Patient Matching Solution
When we compared our results with those of the second and third place teams, we realized that each team had found a substantial number of record linkages that no other team had found. That indicated there was still room to improve recall. We are currently looking to implement the algorithms used by the second and third place teams, so that our system has additional algorithms to choose from when analyzing the properties of the input demographics, thereby increasing our recall rate.
Since our win, we have received strong interest in our patient matching solution, and we are currently exploring ways to commercialize it that address the needs of the industry.
To learn more about Vynca’s patient matching solution, or for a copy of the white paper, please contact email@example.com.
Frank Wang, PhD is the co-founder and Chief Technology Officer of Vynca.