Collecting Personally Identifiable Information (PII) has been a staple of data collection for a while. Different platforms have different requirements and use the data in different ways. Today I’ll focus on Google’s User Provided Data (UPD), which is a big part of their new Data Strength programme, designed to enable and incentivise you to collect data in the most robust and compliant way possible.
Why Collect PII
When a user provides their first-party data on your website, by subscribing to a newsletter, submitting a form, booking an appointment, or making a purchase, that data can be collected, formatted, securely hashed and sent to Google platforms. There, Google matches it against users who are logged in to their Google profile. This is known as Customer Match, and it allows advertisers to re-engage existing customers, prospect customers that triggered a soft conversion, or find similar audiences across Google’s ecosystem, which includes Search, YouTube, Gmail, and Display, all without relying on third-party cookies. This is vitally important now that browsers, ad blockers, and regulations are restricting the usefulness of third-party cookies more and more.

Benefits
There are many success stories from implementing Enhanced Conversions (Google’s name for sending UPD to Google Ads). One of Google’s clients, ASOS, saw improved return on ad spend (ROAS), with a recorded sales uplift of 8.6% in Search and 31% in YouTube. Improvements like these can be seen on the Conversions diagnosis in Google Ads, where you can start analysing its effect:

How to
There are several ways of collecting UPD from websites. Google offers three distinct ways for data collection when using gtag or GTM.
User-Provided Data variable
In GTM, the User-Provided Data variable is what formats the data into the JSON structure that Google platforms can parse. Knowing how to use it is key, since a misconfigured variable will stop UPD collection from working properly. Knowing where to use it matters just as much. All Google tags accept this variable, but for it to be transmitted correctly you must insert it as the value of the user_data parameter. In fact, when you try to insert any value in that parameter, this variable type is the only one accepted by the UI:
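For orientation, the object that ends up in user_data looks roughly like the sketch below. The field names (email, phone_number, address and its sub-fields) follow Google’s published format for user-provided data as I understand it; the type names and sample values are made up for illustration.

```typescript
// Illustrative shape of the user_data object; every field is optional.
interface UserAddress {
  first_name?: string;
  last_name?: string;
  street?: string;
  city?: string;
  region?: string;
  postal_code?: string;
  country?: string;
}

interface UserData {
  email?: string;
  phone_number?: string; // E.164 format, e.g. +441234567890
  address?: UserAddress;
}

// Sample values only; in practice these come from what the user typed in.
const userData: UserData = {
  email: "jane.doe@example.com",
  phone_number: "+441234567890",
  address: {
    first_name: "Jane",
    last_name: "Doe",
    postal_code: "SW1A 1AA",
    country: "GB",
  },
};

// What a tag would carry in its user_data parameter.
const payload = JSON.stringify({ user_data: userData }, null, 2);
console.log(payload);
```

In GTM you would not write this by hand: the User-Provided Data variable builds the equivalent object from the fields you map into it.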

Automatic
This is the simplest and least resource-intensive method. It is also the least accurate. When it is active, Google scans your website for what it thinks is the user’s PII. This means every page of your website is scanned automatically and, other than choosing whether it looks for email only or also for phone, name and address, you have no control over what is sent to Google. So if the automatic collection finds an email address in the website footer, it may assume that address belongs to the user and send it to Google. If this happens, it could be very damaging.
Manual
This is the most common way of collecting UPD, since you don’t necessarily need developer work and it can still be very accurate. With manual collection you extract the values the user has provided by selecting the page element they were entered in. This has to happen while the value is present on the page. If the conversion happens on a page where the UPD is not present, there’s no way of collecting it.
Code
This method adds a little more work to the previous one. Here, you still have to correctly extract UPD from your website, but you also have to hash it and configure it in your back-end in a specific manner before sending it to Google. You do this if you want complete control over the data, being in charge of the hashing instead of relying on Google, and you don’t want to surface user data in the browser. This is the most secure and accurate way of using UPD, but it requires developer work in addition to the tracking implementation, and for a lot of companies that is too expensive or time-consuming, or they lack the know-how.
Common Issues
Unreliable variables
There are several ways of identifying an element on a website. One of the easiest is a CSS selector. It is basically a map of the page, pointing to a specific destination, like an email address field. It is very fragile though: if anything changes on the website (the page is updated with a new button, a different form or field is added, content is removed or updated, or even a font style is changed!), the path you were using may no longer lead to the same place, and you’ll no longer be collecting that email address. Relying on this approach means constantly updating it, which is a terrible use of resources.
Another (better) option is to use an id to identify the element. Just like your ID identifies you even if you move house, an element id will, if unique, continue to identify that element even if the website changes.
There are more methods for achieving this, like form listeners that push the value to the dataLayer depending on the form, but the concept is the same: extract the data from the website front-end.
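As a rough illustration of the id-based approach, here is a minimal sketch. The helper name readFieldById and the id "email" are hypothetical, and the small DocumentLike type only exists so the snippet can run outside a browser; on a real page you would pass in document.

```typescript
// Minimal structural type so the sketch also runs outside a browser.
// The browser's own `document` satisfies it.
interface DocumentLike {
  getElementById(id: string): unknown;
}

// Returns the trimmed value of the field with the given id, or undefined
// when the element is missing or empty, so nothing bogus gets collected.
function readFieldById(doc: DocumentLike, id: string): string | undefined {
  const el = doc.getElementById(id) as { value?: unknown } | null;
  if (!el || typeof el.value !== "string") return undefined;
  const value = el.value.trim();
  return value.length > 0 ? value : undefined;
}

// Stand-in for a page containing <input id="email" value=" jane.doe@example.com ">.
const fakePage: DocumentLike = {
  getElementById: (id: string) =>
    id === "email" ? { value: " jane.doe@example.com " } : null,
};

console.log(readFieldById(fakePage, "email"));
```

Because the lookup depends only on the id, reordering the form or restyling the page does not break it, which is the whole point over a long CSS selector path.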


Conversion location
Another common issue is that, even though you’re able to extract the data from the webpage, it is no longer present when the conversion happens. Think of scenarios like filling out a form where the conversion only fires on the Thank You page. For Google Ads there is a straightforward way of making this work with the User Provided Data Event tag, but for other third-party platforms, and even for GA, there are no built-in solutions. Although there are workarounds (like using back-end dataLayer pushes on the conversion page, or adding the data to the front-end of the Thank You page), it would be amazing if the platforms came up with the solutions instead of relying on our ingenuity.
It should also go without saying that if the user hasn’t provided their data, for instance when they add a product to a cart, there’s no way to collect it at that point. We literally need them to type it in.
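The back-end dataLayer push workaround mentioned above could look something like this sketch. The event name form_submit_confirmed, the buildConfirmationEvent helper and the submission shape are all hypothetical, and a local array stands in for window.dataLayer so the snippet is self-contained.

```typescript
type DataLayerEvent = Record<string, unknown>;

// Stand-in for window.dataLayer; on a real Thank You page the back-end
// would render a push to window.dataLayer instead.
const dataLayer: DataLayerEvent[] = [];

// Builds the event for the Thank You page from details the back-end already
// holds for the submission. Only fields that were provided are included.
function buildConfirmationEvent(submission: { email?: string; phone?: string }): DataLayerEvent {
  const userData: Record<string, string> = {};
  if (submission.email) userData.email = submission.email;
  if (submission.phone) userData.phone_number = submission.phone;
  return { event: "form_submit_confirmed", user_data: userData };
}

dataLayer.push(buildConfirmationEvent({ email: "jane.doe@example.com" }));
console.log(JSON.stringify(dataLayer));
```

In GTM, a Custom Event trigger on that event name plus dataLayer variables for the user_data fields would then make the values available to the conversion tag, even though the form itself is long gone.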
Formatting and Hashing
Another technically challenging issue with the Code option is that you have to format the data before hashing it. thisemail@gmail.com, this.email@gmail.com, and THIS.email@gmail.com produce different hashes, but for Google they’re the same identifier. With the Manual option you can send those addresses as they are and Google will format them automatically before hashing and sending them on to its platforms. With the Code option, you have to normalise them yourself before hashing.
A note about hashing: doing it in the front-end of the website is not really the best idea. Not only does it fail to solve the issue of surfacing the user’s details on the page, it also adds unnecessary computational load to the browser.
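To make the formatting step concrete, here is a minimal back-end sketch for email addresses, assuming a Node.js environment. The function names are my own. The rules applied (trim, lowercase, and for gmail.com and googlemail.com drop the dots before the @) are my reading of Google’s published normalisation guidance, followed by a hex-encoded SHA-256 hash. Phone numbers, names and addresses have their own rules and are not covered here.

```typescript
import { createHash } from "node:crypto";

// Trim and lowercase the address; for gmail.com / googlemail.com also remove
// the dots in the part before the @, so equivalent addresses end up identical.
function normalizeEmail(raw: string): string {
  const email = raw.trim().toLowerCase();
  const at = email.lastIndexOf("@");
  if (at < 0) return email; // not a valid address; leave validation to the caller
  const local = email.slice(0, at);
  const domain = email.slice(at + 1);
  if (domain === "gmail.com" || domain === "googlemail.com") {
    return `${local.replace(/\./g, "")}@${domain}`;
  }
  return email;
}

// SHA-256 of the normalised address, hex-encoded.
function hashEmail(raw: string): string {
  return createHash("sha256").update(normalizeEmail(raw), "utf8").digest("hex");
}

// The three variants from the text now produce one and the same hash.
const variants = ["thisemail@gmail.com", "this.email@gmail.com", "THIS.email@gmail.com"];
console.log(new Set(variants.map(hashEmail)).size);
```

Skipping the normalisation step would not raise any error; the hashes would simply fail to match on Google’s side, which is what makes this issue easy to miss.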
Consent
I’m not even going to waste your time explaining consent again. You ask for it, and you respect it. It’s easy.
Now, what does it entail with regard to User Provided Data? In the UK, which follows the UK GDPR, even hashed PII is considered personal data, which means you need to ask for consent to collect it. Google Consent Mode (GCM) dictates how, and if, UPD is used. In more technical terms, ad_user_data is the signal in charge of allowing Google platforms to receive and use UPD. So no consent means no Enhanced Conversions. It’s as simple as that.
In GTM this is really easy to set up. Well, all you have to do is set up GCM. If ad_user_data is set to ‘denied’, and a Google Ads conversion tag fires, it won’t send UPD. If the Google Ads User-provided Data Event tag fires in that scenario, it doesn’t resolve. Easy as pie.
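Most setups handle this through a consent banner template, but under the hood the consent signals are set with calls along these lines. This is a sketch only: the local dataLayer array stands in for window.dataLayer, and onMarketingConsentGranted is a hypothetical callback you would wire to your own banner.

```typescript
// Stand-in for window.dataLayer so the sketch runs on its own.
const dataLayer: unknown[] = [];

// Same shape as the standard gtag helper: it pushes the `arguments`
// object itself onto the dataLayer rather than an array copy.
function gtag(..._args: unknown[]): void {
  dataLayer.push(arguments);
}

// Default everything to denied before any tag fires.
gtag("consent", "default", {
  ad_storage: "denied",
  ad_user_data: "denied",
  ad_personalization: "denied",
  analytics_storage: "denied",
});

// Hypothetical callback: run this once the user accepts marketing consent.
function onMarketingConsentGranted(): void {
  gtag("consent", "update", {
    ad_storage: "granted",
    ad_user_data: "granted",
    ad_personalization: "granted",
  });
}
```

Until that update arrives, ad_user_data stays denied and UPD is withheld, which is exactly the behaviour described above.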
Server-Side
Enhanced Conversions via Server-Side works exactly the way it works client side.
- No consent means no Enhanced Conversions
- Data is hashed before being sent to Google
Implementing it is even easier, since the legwork was done client-side:
1 – For Google Ads, all you have to do is add the Google Ads – User-provided Data tag and fire it whenever a request arrives with the user_data parameter populated. This will send UPD to Google Ads.
2 – For GA, all you have to do is…. Sit back and relax. If a request has the user_data parameter, it’ll be automatically added to the GA event sent from the Server-Side.
The important bit here is that user_data has to be sent to the server-side correctly, using the User-Provided Data variable. If it isn’t, the GA tag’s automatic mapping of user data won’t work, and you’ll miss out on sending UPD.
The fun bit, though, is that the server-side allows you to enrich the data you send to third-party platforms. If you have user data in your CRM or somewhere else, for instance, and the server-side receives a request with partial user data that matches your records, you can enrich the request to the third-party platform with your previously collected data. This is a massive advantage of server-side tracking.
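The enrichment idea can be sketched as a simple merge. Everything here is hypothetical: the in-memory crm map stands in for whatever lookup your own CRM or database offers, and enrichUserData is not a built-in server-side GTM feature, just an illustration of the logic.

```typescript
interface UserData {
  email?: string;
  phone_number?: string;
  address?: { first_name?: string; last_name?: string; postal_code?: string; country?: string };
}

// Hypothetical CRM lookup keyed by lowercase email; a real one would query
// your own database or CRM API.
const crm = new Map<string, UserData>([
  [
    "jane.doe@example.com",
    { phone_number: "+441234567890", address: { first_name: "Jane", last_name: "Doe" } },
  ],
]);

// Fills in fields missing from the incoming request with stored values.
// Anything the user has just provided wins over what is on record.
function enrichUserData(incoming: UserData): UserData {
  const key = incoming.email?.trim().toLowerCase();
  const stored = key ? crm.get(key) : undefined;
  if (!stored) return incoming;
  const merged: UserData = { ...stored, ...incoming };
  const address = { ...stored.address, ...incoming.address };
  if (Object.keys(address).length > 0) merged.address = address;
  return merged;
}

console.log(JSON.stringify(enrichUserData({ email: "Jane.Doe@example.com" })));
```

The same consent rules apply to the enriched fields as to the ones collected on the page, so a lookup like this should only run for requests where ad_user_data was granted.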
One last point here: When analysing the user_data object that goes to the server-side container, you’ll notice that the user data is not hashed. This is by design. If you send the object from the client-side container to GA/Google Ads/DV360, you’ll see that it is sent hashed, since those platforms don’t receive that data unhashed. When you see it in your server-side container though, you’ll see it unhashed, since it is yet to go to those platforms. If you analyse the outgoing requests from the Server, then you’ll be able to see the data properly hashed.
Platform Actions
You thought this was over? Well, almost. There are still three places where you have to configure your UPD collection:
Google Tag
The Google tag is accessible in GA, Google Ads, CM360, GTM and, when Google’s new GTM container rolls out, the GTM container itself. Once you create a Google Tag, it’ll be automatically configured like this:

Allow UPD capabilities is turned on by default, as is Automatic detection. If you have manually configured collection, or you’re using the code option, disable automatic detection.
If you’re using the updated GTM container, you’ll see the bottom part of the image above, which lists all the Google tags affected by this configuration, a very welcome addition.
Remember, the Google Tag dictates what the other platforms can do. If the Automatic collection is turned off here, but turned on in GA, GA will not automatically collect data.
Google Analytics
In GA, there are a few toggles to…. toggle:

Once you activate UPD collection, automatically collected UPD will be turned on by default, so remember to turn it off if you don’t need it. Other than that, you also have to accept the User Data Collection Acknowledgment, then you’re good to go.
Google Ads
In Google Ads, all you have to do is activate Enhanced Conversions (not the one for Leads, that’s a different thing), and select where you’ll be receiving UPD from:

Conclusion
I hope this brought some clarity on why collecting UPD is important, how it works, and how to do it. This is a core part of Google Data Strength and will only increase in importance in the future.
This post focused only on Google platforms, but lots of other platforms, like Meta, LinkedIn, Bing, TikTok, etc., have their own ways of collecting and using UPD, and can improve your ad performance if implemented correctly. I’ll write about them in a later post.
If this still sounds like a little bit too much, and you want someone to set it up for you, validate it, and report on its value back to you, give us a shout at Duga and we’ll take it from there.
