A new CORE white paper shares strategies for matching cross-sector data using the technique known as fuzzy matching. For our latest CORE insights blog post, our data experts share five key practices to help researchers succeed on their fuzzy matching journey.
Cross-sector data can play a valuable role in advancing health and equity. By looking across sectors, researchers can explore the complex relationships between areas like housing, healthcare, education, and criminal justice, and shape cross-sector strategies that improve outcomes for all. Yet the administrative data required for assessing cross-sector interventions and outcomes are often siloed within those sectors. And as anyone who works with administrative data sets can attest, those data can be dirty, inconsistent, and incomplete, and they can change over time. When working across multiple data sets, as projects focused on understanding and addressing social determinants of health often require, these problems only multiply.
As we explain in a new CORE white paper, fuzzy matching helps address these challenges by making it possible to link disparate data sets and sources, even when the original data are less than perfect. This technique allows researchers to determine whether multiple entries within or across data sets refer to the same individual, and then assign unique identifiers to individuals appearing across rows in those data sets. At CORE, we have found this particularly helpful for cross-sector data projects related to the social determinants of health. With these unique identifiers, you can start to answer questions like:
- Which door do most folks walk through to access support?
- How does getting support in one system translate to utilization in other systems?
- How can programs strategically align their efforts to best support the clients that they share?
Fuzzy matching allows you to explore these questions and much more. Read on for five key practices for fuzzy matching success. And for more insights, download the full white paper, Getting Clear About Fuzzy Matching.
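To make the idea of assigning unique identifiers concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method described in the white paper: the records, the `name` and `dob` field names, and the 0.75 threshold are all hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever similarity measure a real project would choose.

```python
from difflib import SequenceMatcher

# Hypothetical records from two systems; field names are assumptions.
records = [
    {"source": "housing", "name": "Jonathan Smith", "dob": "1980-03-14"},
    {"source": "clinic", "name": "Jonathon Smith", "dob": "1980-03-14"},
    {"source": "clinic", "name": "Maria Garcia", "dob": "1975-11-02"},
    {"source": "housing", "name": "Maria Garcia-Lopez", "dob": "1975-11-02"},
    {"source": "housing", "name": "Dana Lee", "dob": "1990-07-30"},
]


def name_similarity(a, b):
    """Return a similarity score between 0 and 1 for two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_person(r1, r2, threshold):
    """Illustrative rule: identical birth date and similar enough name."""
    return (
        r1["dob"] == r2["dob"]
        and name_similarity(r1["name"], r2["name"]) >= threshold
    )


def assign_person_ids(rows, threshold=0.75):
    """Group matching rows and give each group one sequential ID."""
    parent = list(range(len(rows)))  # union-find: each row starts alone

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Compare every pair of rows; fine for a sketch, slow for large data.
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if same_person(rows[i], rows[j], threshold):
                parent[find(j)] = find(i)

    # Map each group to a sequential ID in order of first appearance.
    ids = {}
    result = []
    for i in range(len(rows)):
        root = find(i)
        ids.setdefault(root, len(ids) + 1)
        result.append(ids[root])
    return result


print(assign_person_ids(records))  # -> [1, 1, 2, 2, 3]
```

Here the two spellings of Jonathan Smith share one ID, as do the two versions of Maria Garcia, even though neither pair would link on an exact comparison of names.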
1. Be super clear about the goals of your project
The first thing you need to consider is how the data will be used. Knowing how you’ll use the output of your fuzzy matching project will help you decide which fuzzy matching strategies you need.
2. Be patient and take your time
Establishing a new ID field that identifies unique people within or across different data sets is a complicated task. Still, the confidence you gain in the newly created field is worth your time!
3. Profile your data thoroughly
Knowing your data sources’ idiosyncrasies and developing strategies to address them before you start drafting your fuzzy matching solution will save you time later in the process. For example, some systems routinely contain values that were entered as dummy placeholders. Make sure these are changed to NULL values so they don’t create false matches.
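As a rough sketch of the placeholder cleanup described above, the Python below replaces dummy values with `None` (Python's equivalent of NULL) and compares fields so that two missing values never count as agreement. The placeholder list and field names are hypothetical; profiling your own sources is what tells you which values belong on the list.

```python
# Hypothetical placeholder values; build the real list from data profiling.
PLACEHOLDER_VALUES = {"", "unknown", "n/a", "none", "test", "999-99-9999"}


def null_placeholders(record, placeholders=PLACEHOLDER_VALUES):
    """Return a copy of the record with placeholder strings set to None."""
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str) and value.strip().lower() in placeholders:
            cleaned[field] = None
        else:
            cleaned[field] = value
    return cleaned


def fields_agree(a, b):
    """Two values agree only if both are present and equal."""
    return a is not None and b is not None and a == b


row_a = null_placeholders({"name": "Pat Jones", "ssn": "999-99-9999"})
row_b = null_placeholders({"name": "Sam Rivera", "ssn": "999-99-9999"})

# Without cleanup the shared dummy SSN would look like a match.
print(fields_agree(row_a["ssn"], row_b["ssn"]))  # -> False
```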
4. Build your matching criteria with a few complicated use cases
Start by finding the records associated with a few people across your data sources and working with just those rows; this keeps your data manageable at the outset. Complicated cases, where the records differ across sources, are ideal and can give you more confidence in your model. Algorithms that match the complicated cases will likely match the easy ones too.
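One way to work with a handful of complicated cases is to write them down as labeled pairs and check a candidate rule against them. The sketch below assumes hypothetical records, a made-up rule (exact birth date plus a name similarity of at least 0.85 after normalization), and `difflib.SequenceMatcher` as a stand-in similarity measure; none of these come from the white paper.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, strip punctuation, and sort name parts alphabetically."""
    text = name.lower().replace(",", " ").replace("-", " ")
    kept = "".join(ch for ch in text if ch.isalnum() or ch.isspace())
    return " ".join(sorted(kept.split()))


def is_match(a, b, threshold=0.85):
    """Candidate rule: same birth date and similar normalized name."""
    if a["dob"] != b["dob"]:
        return False
    score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return score >= threshold


# Hypothetical hard cases: (label, record a, record b, should they match?)
HARD_CASES = [
    ("reordered name with typo",
     {"name": "Smith, Jonathan", "dob": "1980-03-14"},
     {"name": "Jonathon Smith", "dob": "1980-03-14"}, True),
    ("apostrophe dropped",
     {"name": "O'Brien, Kate", "dob": "1992-01-20"},
     {"name": "Kate OBrien", "dob": "1992-01-20"}, True),
    ("hyphenated surname",
     {"name": "Maria Garcia-Lopez", "dob": "1975-11-02"},
     {"name": "Maria Garcia Lopez", "dob": "1975-11-02"}, True),
    ("same name, different birth date",
     {"name": "John Smith", "dob": "1975-05-01"},
     {"name": "John Smith", "dob": "2001-09-12"}, False),
    ("similar names, same birth date",
     {"name": "Maria Garcia", "dob": "1988-06-15"},
     {"name": "Mario Garza", "dob": "1988-06-15"}, False),
]


def failing_cases(rule, cases):
    """Return the labels of cases where the rule disagrees with the label."""
    return [label for label, a, b, expected in cases if rule(a, b) != expected]


print(failing_cases(is_match, HARD_CASES))  # -> []
```

An empty list means the rule handles every case in the set. When a label does show up, that case points to the part of the matching criteria that needs another look.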
5. Manually review your results
It can be tedious to examine groups of rows that did or did not match and judge whether they should have. However, if you skip this step, you may miss opportunities to improve your fuzzy matching strategies, and you may have less confidence in your results.
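Manual review goes faster when the pairs most likely to be wrong come first. The Python sketch below is one possible approach, offered as an assumption rather than CORE's process: it scores every pair of hypothetical names with `difflib.SequenceMatcher`, lists the pairs closest to an arbitrary 0.85 threshold for review, and adds a small random spot check of the clear decisions. The `band`, `sample_size`, and `seed` parameters are illustrative.

```python
import random
from difflib import SequenceMatcher
from itertools import combinations


def score(a, b):
    """Return a similarity score between 0 and 1 for two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def review_queue(names, threshold=0.85, band=0.10, sample_size=2, seed=0):
    """Return (borderline pairs, spot-check pairs) as (score, a, b) tuples."""
    scored = [(score(a, b), a, b) for a, b in combinations(names, 2)]

    # Pairs near the threshold are the ones a reviewer should see first.
    borderline = sorted(
        (p for p in scored if abs(p[0] - threshold) <= band),
        key=lambda p: abs(p[0] - threshold),
    )

    # Spot-check a few confident decisions on each side as well.
    clear_matches = [p for p in scored if p[0] > threshold + band]
    clear_nonmatches = [p for p in scored if p[0] < threshold - band]
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    spot_check = rng.sample(clear_matches, min(sample_size, len(clear_matches)))
    spot_check += rng.sample(clear_nonmatches, min(sample_size, len(clear_nonmatches)))
    return borderline, spot_check


# Hypothetical names for illustration only.
names = ["Jonathan Smith", "Jonathon Smith", "Maria Garcia",
         "Maria Garcia-Lopez", "Dana Lee"]

borderline, spot_check = review_queue(names)
for s, a, b in borderline:
    print(f"{s:.2f}  {a}  |  {b}")
```

Recording the reviewer's decision for each pair gives you labeled examples you can feed back into your matching criteria.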
With these five practices, you’re that much closer to fuzzy matching success. Are you ready to dive into fuzzy matching or interested in learning more? Get in touch to discuss how CORE can help, and be sure to download the free white paper for more ideas and insights.