
TransUnion Fraudcast Ep 7: Data and Identity Resolution

Episode 7

In this episode of the TransUnion Fraudcast, identity data expert Sean Naismith joins Jason to discuss the difference between “more” data and “meaningful” data, and what organizations can do to make the best use of the data they have.

Jason Lord:
Welcome to the TransUnion Fraudcast, your essential go-to for the absolute linkages between the day's emerging fraud and authentication topics, trends, tropes and travails delivered with all the straight talk and none of the false positives.

I'm your host, Jason Lord, VP of Global Fraud Solutions.

Now, normally this is the part of the intro where I say you don't need to be an expert in data analysis to listen to this podcast, but today we have with us an honest-to-god expert in data analysis, Sean Naismith, SVP of Global Data and Analytics at TransUnion.

Sean has previously worked with Naismith Wealth Management and Enova International, and has a diverse and robust skillset, including financial markets, portfolio management, statistical modeling and much more.

Sean, thank you so much for taking the time to come on the Fraudcast.

Sean Naismith:
It's great to be here, Jason. Thank you for the invite.

Jason Lord:
Sean, it's one thing to have lots of data. It's another thing to have good and useful data.

You can back up the proverbial truck and dump a bunch of data on your front step. But what should we be thinking about when we're thinking about the genesis and the processes around that data that will actually make that data useful?

Sean Naismith:
I always think about data as a representation of something that's happened or is happening in real life.

There's always that data-generating process, and not understanding the data-generating process many times leads to challenges later on in the data science journey, as we build models and decision policies.

So when I think about fraud in particular (my focus is broader than fraud; it's credit, it's marketing and far more than that), the question is: what is the data-generating process that's producing this data?

And I think with fraud in particular, that's a really important thing to understand.

So as an example here, when I think of something like Device Risk, we're looking at the type of signals and data that's being produced.

What are these activities?

Jason Lord:
And just for the listeners, Device Risk is a product at TransUnion that determines the riskiness of a device based on that device's reputation.

Sean Naismith:
That's great clarification, Jason.

When I think about how that data is being collected, that's meaningful. It's meaningful to how I interpret the values, how I sample the data, and ultimately how I work it into the models I'm developing offline and into the production pipelines driving this.

Another thought around that data-generating process is: what are some of the nuances, and what domain expertise could come into play?

Machine learning has the capability to extract and discover a lot of great features, but even in today's day and age I don't discount the ability of experts in a domain, whether it's fraud broadly or specifically credit card fraud or account opening, to understand what's actually taking place in that data-generating process and engineer new features that could be highly predictive, features a machine may not have picked up on.

And that's, in my estimation, one of the beauties of how we've structured our teams here at TransUnion.

It's a combination of both human and machine intelligence, but it's really grounded in how do you pair that domain expertise together with the machines to produce the best signals?

And I loved your tagline at the beginning, right? None of the false positives. The ability to minimize that throughout what we do.
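
A quick, hypothetical sketch of the pairing Sean describes: a hand-engineered, domain-informed feature sitting alongside raw signals in one training set. The column names, values, thresholds and model choice are illustrative assumptions, not any TransUnion data set or product.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical account-opening events; everything here is illustrative.
events = pd.DataFrame({
    "applications_from_device_24h": [1, 9, 2, 14, 1, 11],
    "minutes_since_email_created": [52000, 3, 41000, 7, 98000, 12],
    "device_reputation_score": [0.91, 0.22, 0.88, 0.15, 0.95, 0.30],
    "is_fraud": [0, 1, 0, 1, 0, 1],
})

# A domain-informed feature a fraud expert might engineer by hand: a brand-new
# email address paired with a burst of applications from one device is a classic
# account-opening fraud pattern a model may not assemble on its own.
events["new_email_application_burst"] = (
    (events["minutes_since_email_created"] < 60)
    & (events["applications_from_device_24h"] > 5)
).astype(int)

# The machine then learns over both the raw signals and the expert feature.
features = events.drop(columns="is_fraud")
model = GradientBoostingClassifier(random_state=0).fit(features, events["is_fraud"])
print(dict(zip(features.columns, model.feature_importances_.round(3))))
```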

Jason Lord:
Now, you brought up machine learning –– and I don't want to discount the point that you're making, which is that the human element is really important, perhaps, to deriving those nonintuitive insights.

But tell me a little bit more about machine learning and the way that it's changed the game in terms of data collection and data analysis.

Sean Naismith:

So when I think of the quality of data, machine learning can be applied across multiple facets of the data pipeline and into the in-market solutions that you may put together yourself or that TransUnion may put together: everything from understanding data quality as it comes in the front door, right as it's ingested into systems, to monitoring all the way through.

Of course, predicting whether or not an interaction is fraudulent, you know, machine learning can be applied in all these various areas.

Now, the data that's powering these machine-learning models is obviously a key component. And what I find valuable is a platform that allows you to scale out the ability to effectively manage, govern and monitor the data assets, in addition to the ability to manage, govern and monitor the machine-learning models and the analytic models.

And in fact, managing the entire pipeline that way is extremely valuable, not only for being efficient and effective internally, but for continuously having the right type of signal produced out of the systems.
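
As one small illustration of the "front door" checks Sean mentions, here is the rule-based end of that idea: a data-quality report a pipeline might compute at ingestion, before anything feeds a model. The column names, thresholds and checks are assumptions for the sketch.

```python
import pandas as pd

# Hypothetical front-door checks on an incoming batch of device events.
EXPECTED_COLUMNS = {"device_id", "ip_address", "risk_score", "event_time"}

def quality_report(batch: pd.DataFrame) -> dict:
    report = {
        "missing_columns": sorted(EXPECTED_COLUMNS - set(batch.columns)),
        "null_rate_by_column": batch.isna().mean().round(3).to_dict(),
        "duplicate_rows": int(batch.duplicated().sum()),
    }
    if "risk_score" in batch.columns:
        out_of_range = (batch["risk_score"] < 0) | (batch["risk_score"] > 1)
        report["risk_score_out_of_range"] = int(out_of_range.sum())
    return report

batch = pd.DataFrame({
    "device_id": ["a1", "a1", None],
    "ip_address": ["10.0.0.1", "10.0.0.1", "10.0.0.2"],
    "risk_score": [0.2, 0.2, 1.7],
    "event_time": ["2024-01-01", "2024-01-01", "2024-01-02"],
})
print(quality_report(batch))
```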

Jason Lord:
So we're a little biased; we think of this data analysis in terms of TransUnion and TransUnion products. But I want to be broad-minded for our listeners, who may not currently have TransUnion products, and think about best practices for bringing data together in a way that's meaningful.

So what do organizations need to consider when they're analyzing and applying the data that they do have on hand?

Sean Naismith:
Absolutely. Many organizations we work with operate across states and across countries, and we're in a global situation where regulations and privacy laws are rapidly evolving.

Now, just because you have data doesn't mean you can use it. Is it applicable, and is it allowed for this use case?

Did we collect it the right way?

Did we have the right permissions in place?

And those foundational data governance capabilities are vital to being able to scale out and move quickly. Because if you do it right, if you have data management platforms that allow for effective cataloging, curation and transparency into things like data lineage to enable effective data governance, this has what I'd call foundational, multiplicative effects on productivity. You might know you have some data assets within your walls, but it might take you a long time to figure out whether you could actually bring those together.

And then from there, if you could actually use them.

So what I find is really solid data foundations –– I feel like I'm going back 15, 20 years with master data management and some of those hot topics back then…

I think we've kind of come full circle, with a key focus on those foundations given the rapidly evolving nature of our regulatory environments.

And I think that's a good thing, that focus on getting back to the basics.

What I see in the industry, as we adapt to those changes, is that it sets us up for even more quality in the models being produced, and for how quickly we can move at scale.
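
To make the "just because you have data doesn't mean you can use it" point concrete, here's a deliberately tiny, hypothetical sketch of a catalog that ties each asset to its source, permitted purposes and retention period, so a purpose check happens before an asset is ever pulled into a model. Real catalogs, lineage records and permitted purposes would be defined with privacy and legal teams.

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    name: str
    source_system: str              # a crude stand-in for data lineage
    permitted_purposes: frozenset
    retention_days: int

# Illustrative assets and purposes only.
CATALOG = {
    "device_signals": CatalogEntry("device_signals", "device_sdk",
                                   frozenset({"fraud_prevention"}), 730),
    "marketing_prefs": CatalogEntry("marketing_prefs", "web_forms",
                                    frozenset({"marketing"}), 365),
}

def can_use(asset: str, purpose: str) -> bool:
    """Just because you have the data doesn't mean you can use it for this purpose."""
    entry = CATALOG.get(asset)
    return entry is not None and purpose in entry.permitted_purposes

print(can_use("device_signals", "fraud_prevention"))   # True
print(can_use("marketing_prefs", "fraud_prevention"))  # False: not permitted for this use
```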

Jason Lord:
You touch on a topic that I think is really important, which is that when it comes to analyzing and managing data, the analysts are by no means the only stakeholders involved.

There's also privacy, legal, compliance and governance.

Do you find that when you're talking to organizations, they underestimate the complexity of just the human and departmental aspect of data management?

Sean Naismith:
I do see that manifest in one of two ways.

Maybe it's a position of, frankly, ignorance of what needs to be true.

What checks do we need to have in order to move fast?

I've seen this in other organizations. And on the flip side, it's: do we have the processes internally, and the technology to support those processes, to get it through the pipelines and move fast?

So I see organizations many times getting caught up in one of those two, and obviously you don't want to be too late, and you want to do things right. So a strong focus on data management and governance, but also on technologies to facilitate workflows between teams, is a way to accelerate your ability to build better and faster by getting those foundations in place.

But I think you're spot on, Jason. It's that ability to have the right technology to allow for that collaborative approach.

If you're going to have privacy by design, you need to design with privacy, and what better way to do that than to have those teams involved in the right way, using the right technologies, to do it right from the foundations?

Jason Lord:
And when you're working with these teams, digging in deep on the nature of what's required of the data –– is there a translation layer that needs to take place with these groups? And if so, how do you get them on board early on so that there are no misses or surprises later in the process?

Sean Naismith:
You know, everyone's busy, especially today. So it really, really helps to align on how we want to partner on this and what's helpful.

What does it mean to draw out the expertise of subject matter experts?

Because a view from someone that's an expert in privacy might be different than someone that's focused on commercial law, that might be different than information security.

So the ability to efficiently work with subject matter experts across the board who are key stakeholders in an effective data management, governance and machine-learning platform is kind of what I'm referring to there.

So I think overcommunicating, spending time together and really getting to know what great looks like for each other is a good foundation for that.

Jason Lord:
Now, we've touched on transparency and we've touched on machine learning.

I know a topic that is very important to financial institutions is transparency in decisioning, because biases that exist within society might be reinforced within a machine-learning process.

And so having a black box system in which decisions are made, and then you're not able to understand how this became discriminatory, is really dangerous to a lot of FIs, but other organizations as well.

Are there any best practices that you would recommend that FIs and others keep in mind, thinking about decisioning when it comes to machine learning?

Sean Naismith:
Yeah, absolutely.

So when I think of making a decision, if it's an automated decision, it's typically a combination of input data, scoring models (which could be machine-learning–driven or statistical models) and rules, and those come together in either a stateless or stateful way in order to make a decision about a transaction, a customer, whatever that decision is.

So when I think about transparency there, there's kind of almost three layers, right?

The first layer is transparency into the data that's being used.

The second is transparency into the scoring models, again whether machine-learning or statistical as broad categories, as to how that model created a score…and the third is transparency into the business rules, or the surrounding rules, that helped make that decision.

Now, what I find is that, depending on how you've architected that decision flow, more and more of what I'd consider to be the key decision-making goes either into the model itself or into human-authored rules.

And I think that's what we see in orchestration platforms and decision management platforms.

It's that area where you bring together your data with your analytics and your scoring models, along with the rules, to try to find that right balance.

And I think transparency into each one of those levels is important to understand how you're achieving or not achieving the outcomes and what that means.

So monitor that over time; the methodologies you use can vary across what you're monitoring at those levels.

And I think a thoughtful approach…again, we talked about partnering with the rest of the stakeholders in privacy, compliance and others; alignment here is critical, in my estimation.

There are various ways to do this, and being able to stack hands on the right way and implement it in the right way is what's needed for success.
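
A minimal, hypothetical sketch of those three layers in code: each decision carries a record of the data it used, the score it received and the rules that fired, which is the kind of trace that later monitoring and fairness reviews depend on. The rules, thresholds and scoring function are placeholders, not any particular orchestration or decision management product.

```python
from typing import Callable, Dict, List

def decide(event: Dict, score_model: Callable[[Dict], float]) -> Dict:
    score = score_model(event)                       # layer 2: scoring model
    rules_fired: List[str] = []                      # layer 3: human-authored rules
    if score > 0.8:
        rules_fired.append("score_above_0.8 -> manual_review")
    if event.get("device_seen_before") is False:
        rules_fired.append("new_device -> step_up_authentication")
    return {
        "decision": "review" if rules_fired else "approve",
        "inputs_used": sorted(event.keys()),         # layer 1: data transparency
        "model_score": score,
        "rules_fired": rules_fired,
    }

# Usage with a stand-in scoring function
print(decide({"device_seen_before": False, "amount": 120.0}, score_model=lambda e: 0.35))
```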

Jason Lord:
Now, as a fraud analyst, you're always searching for anomalies of some sort, something that is different from what's expected in the data.

If you put your analyst hat on for a second, how do you distinguish between anomalies that represent something wrong with the process or the technology and anomalies that represent a change in behavior that you need to take action on or make a decision based on?

Sean Naismith:
Yeah, I love that, because let's say you have the transparency and you're tracking your data, your analytics and how you make decisions, and all of a sudden you see a change in distributions.

Hey, maybe that's a new signal. Maybe that's now fraudsters changing their behavior, and we can pick up on something.

Or maybe it's an artifact of the data-generating process and some error somewhere in the pipelines.

Something's changed that isn't indicative of anything real, so the ability to quickly adjudicate that is key.

Now, this is where I think about subject matter expertise in the domains of fraud: understanding those data-generating processes and being able to quickly see if something's changed there, expertise in the data pipelining, and then the statistical methods to say whether this is predictive of anything.

I think these things come together in a way that helps you quickly answer that question. That's what the team is there to do: adjudicate that. But again, I love that point.

You're looking for anomalies, you find an anomaly…how do you know whether that anomaly is an artifact of the system itself or actually a change in fraudster behavior or consumer behavior?
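
One statistical slice of that adjudication can be sketched with a simple two-sample test on synthetic scores: it flags that a distribution has moved, after which the human adjudication Sean describes takes over. The data and thresholds here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)
baseline_scores = rng.beta(2, 8, size=5000)   # last month's risk scores (synthetic)
todays_scores = rng.beta(2, 5, size=5000)     # today's risk scores (synthetic, shifted)

# Kolmogorov-Smirnov test: has the score distribution changed?
stat, p_value = ks_2samp(baseline_scores, todays_scores)
if p_value < 0.01:
    print(f"Distribution shift detected (KS statistic {stat:.3f}); route to analysts to adjudicate.")
else:
    print("No significant shift detected.")
```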

Jason Lord:
And this is just a Jason question more than anything else: Do you consider that more of an art or a science, in understanding whether it's one or the other?

Sean Naismith:
No, it's a combination. And I think there's definitely a science to putting in place the pipelines and the processes to track this and understand it.

I definitely think there's an art component as well, and that's what separates some companies from others, or users from others.

Jason Lord:
Interesting. So the only thing I feel like we haven't really touched on yet is the role of data consistency and retention.

One question that's often asked in the fraud world is whether data is available to be retroed.

So tell me the role that data consistency and data retention play in the ability to be effective in fraud prevention, and also to find more of the good customers and transactions and let them through with less friction.

Sean Naismith:

Yeah, that's a great point. When you think about the data-generating process and the data assets that are produced in order to bring data together to predict fraud, some of this data you might have a robust history on.

You might also have a robust history on what the implications were, what the outcomes were that you're modeling against; other data sets you might not have that luxury.

The question then is, how do you bring this together? How do you actually enable a single fraud detection layer that leverages both the things you can already learn from and the things you have to learn from and improve over time?

So again, when I think about the data governance layer, it's the ability to collect data, for example, pull data together to create a data set if you think something's valuable, and store that.

Make sure you have the right retention rights and data-usage rights as you're doing that. Then it's the ability to thoughtfully bring together both the signals you have a longer history on and the ones you're just generating, even things you can only capture in real time, with the right data, analytics and orchestration technology stack to give you the flexibility to learn and apply in a rapid manner. That, I think, is what's needed to be successful there.
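
A small, hypothetical example of the kind of retro analysis that retention makes possible: replaying a signal that was captured and stored at decision time against fraud outcomes that were only confirmed later, to see whether the signal would have been predictive. The tables and column names are assumptions.

```python
import pandas as pd

signals = pd.DataFrame({              # retained at decision time
    "txn_id": [1, 2, 3, 4, 5, 6],
    "new_signal": [0.10, 0.85, 0.20, 0.90, 0.15, 0.75],
})
outcomes = pd.DataFrame({             # fraud labels confirmed weeks later
    "txn_id": [1, 2, 3, 4, 5, 6],
    "confirmed_fraud": [0, 1, 0, 1, 0, 0],
})

retro = signals.merge(outcomes, on="txn_id")
fraud_rate_by_band = retro.groupby(retro["new_signal"] > 0.5)["confirmed_fraud"].mean()
print(fraud_rate_by_band)  # fraud rate when the signal fires vs. when it doesn't
```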

Jason Lord:
So one thing I'm always interested in whenever I talk to somebody who has a (some might say niche, some might say geeky) view of the world like data analysis is: what gets you excited?

So what is it about data analysis that specifically gets you excited?

Sean Naismith:
You know, I go back to why I got into this industry in the first place, and what always captured my imagination was the ability to take all these disparate, seemingly unrelated phenomena in the world and be able to explain the vast majority of what's happening through the same mathematical form…it just blows my mind, still blows my mind today.

You know, what gets me the most excited is seeing that just grow, exponentially.

So I think of everything from the hot topic of today, large language models, to compute and storage costs getting lower and lower and network transfer speeds getting higher and higher like we're seeing now…machines are coming online that are just mind-blowing in their capabilities, so that gets me excited. It also gets me a little nervous, as I think it does a lot of us: the art of the possible, along with the unintended effects. But that's me.

It comes back to these simple mathematical primitives and what they allow us to do when constructed and applied in this way. And when I think of TransUnion and what we do day-to-day here, making trust possible…if we can leverage this to make trust possible in global commerce, man, that's a beautiful thing.

Jason Lord:
It's such an ambitious goal, and the idea that it could be underpinned by mathematical formulas –– it's just such an interesting concept, even philosophically.

Sean Naismith:
Yeah, exactly. Totally agree, Jason.

Jason Lord:
Sean, it's such a pleasure talking with you. We know how busy you are, and we really appreciate you taking time out to join us on the Fraudcast.

Sean Naismith:
Thanks for having me, Jason.

Jason Lord:

Thank you all for tuning in.

We hope you join us for some upcoming Fraudcast episodes. In the meantime, stay smart and stay safe.

TransUnion Fraudcast

Your essential go-to for all the absolute linkages between the day’s emerging fraud and identity trends, tropes and travails — delivered with straight talk and none of the false positives. Hosted by Jason Lord, VP of Global Fraud Solutions. 

For questions or to suggest an episode topic, please email TruValidate@transunion.com.

The information discussed in this podcast constitutes the opinion of TransUnion, and TransUnion shall have no liability for any actions taken based upon the content of this podcast.
