Data is at the center of every HubSpot Hub, yet most organizations still struggle to align information across tools and teams. Different apps store different definitions of the same metric, creating friction between reporting, automation, and strategy.
HubSpot Data Hub helps fix that by giving companies one place to connect, clean, and structure data across their systems. Within Data Hub, Data Studio lets users build Datasets, reusable data layers that define how information is organized and used in reports, workflows, and Segments.
This article explains how Datasets work, how they connect with third-party sources, and why they’re essential for creating consistent, connected data in HubSpot.
What Is a HubSpot Dataset?
In HubSpot, a dataset is a reusable collection of structured data. It allows teams to combine data from multiple HubSpot objects or connected sources into a single, organized view that can be used across the platform.
Rather than duplicating or rebuilding data every time you create a report, segment, or workflow, a dataset acts as a defined data layer that references your CRM records and any connected external sources. It also lets you combine data that can’t be joined elsewhere in HubSpot, such as metrics from multiple objects or external systems, giving you a more complete view of performance across tools.
Datasets are designed to make working with data easier for non-technical users. They eliminate the need to manually export, merge, and clean spreadsheets, offering a centralized, AI-assisted interface for preparing and analyzing your data inside HubSpot.
How HubSpot Datasets Work
When you build a dataset in Data Studio, you combine key data elements from HubSpot and external sources into one unified structure that can be reused across your portal.
Each dataset includes these components:
- Fields: The properties that store information on your CRM records.
- Filters: The rules that determine which records are included in the dataset.
- Calculated Columns: New data points created using formulas or conditional logic.
- Joins: Relationships that connect data from multiple HubSpot objects or external sources based on shared fields.
These elements let you combine HubSpot data with external systems such as Snowflake, Shopify, or QuickBooks, giving you a complete, connected picture of your customers and operations.
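To make the four components concrete, here is a minimal sketch in plain Python. It does not call any HubSpot API; the sample rows, the column names, and the 150.0 "high value" threshold are all assumptions chosen for illustration. It only mirrors the idea of fields, a filter, a join on a shared field, and a calculated column.

```python
# Hypothetical sample rows standing in for CRM contacts and an external orders table.
contacts = [
    {"email": "ana@example.com", "lifecycle_stage": "customer"},
    {"email": "bo@example.com", "lifecycle_stage": "lead"},
]
orders = [
    {"email": "ana@example.com", "total": 120.0},
    {"email": "ana@example.com", "total": 80.0},
    {"email": "bo@example.com", "total": 40.0},
]


def build_dataset(contacts, orders, high_value_threshold=150.0):
    """Combine contacts and orders into one list of dataset-style rows."""
    # Join: sum order totals per email, the field both tables share.
    spend_by_email = {}
    for order in orders:
        email = order["email"]
        spend_by_email[email] = spend_by_email.get(email, 0.0) + order["total"]

    rows = []
    for contact in contacts:
        # Filter: keep only records whose lifecycle stage is "customer".
        if contact["lifecycle_stage"] != "customer":
            continue
        total_spend = spend_by_email.get(contact["email"], 0.0)
        rows.append({
            "email": contact["email"],          # Field taken from the contact
            "total_spend": total_spend,         # Value brought in by the join
            # Calculated column: conditional logic on a joined value.
            "is_high_value": total_spend >= high_value_threshold,
        })
    return rows


dataset = build_dataset(contacts, orders)
```

In this sketch only Ana's row survives the filter, and her two orders are summed before the calculated column is evaluated.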
Once published, a dataset becomes available throughout HubSpot for building reports, workflows, and Segments. For example, you can build a workflow that automatically alerts account managers when high-value customers log multiple support tickets or when a subscription value changes in your subscription management platform.
By standardizing data definitions and calculations, datasets help GTM teams align on the same metrics. Rather than relying on inconsistent spreadsheets or disconnected reports, every team works from the same trusted data source inside HubSpot.
Joining Data Across HubSpot and External Sources
Most organizations store data across multiple platforms, including CRMs and accounting tools, as well as data warehouses like Snowflake and spreadsheets in Google Sheets. Data Studio makes it simple to bring all of this information together.
When you create a Dataset, you can join data from different sources using shared identifiers. For instance, you might join contact records in HubSpot with customer usage data from Snowflake or with purchase history from Shopify.
To ensure clean and accurate joins:
- Use a reliable join key such as an email address or company domain.
- Keep field names and formats consistent across systems.
- Review how data is mapped between platforms before connecting.
- When joining tables, choose which table’s data to prioritize to control how HubSpot combines matching and non-matching records.
Clean joins allow you to bring your most important data into HubSpot without duplication or data loss, ensuring you have a single, reliable view of each customer.
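The join guidance above can be sketched in plain Python. This is an illustration of the general technique (normalize the join key, then keep every row from the prioritized table), not a description of how Data Studio implements joins. The function names, sample rows, and the `logins_30d` column are hypothetical.

```python
def normalize_email(value):
    """Trim whitespace and lowercase so the same address matches across systems."""
    return value.strip().lower()


def left_join(primary, secondary, key):
    """Keep every primary row; attach the first matching secondary row, if any."""
    secondary_columns = {col for row in secondary for col in row}
    lookup = {}
    for row in secondary:
        # setdefault keeps the first row seen for each normalized key.
        lookup.setdefault(normalize_email(row[key]), row)

    joined = []
    for row in primary:
        normalized = normalize_email(row[key])
        # Start with empty values so non-matching rows still have every column.
        merged = {col: None for col in secondary_columns}
        merged.update(lookup.get(normalized, {}))
        merged.update(row)          # primary values win on any conflict
        merged[key] = normalized    # store the cleaned join key
        joined.append(merged)
    return joined


# Hypothetical data: CRM contacts are prioritized, usage data is attached.
crm_contacts = [
    {"email": " Ana@Example.com ", "name": "Ana"},
    {"email": "cy@example.com", "name": "Cy"},
]
usage = [{"email": "ana@example.com", "logins_30d": 14}]

joined = left_join(crm_contacts, usage, key="email")
```

Because the contact table is prioritized, Cy stays in the result with an empty usage value instead of being dropped, which is the kind of trade-off the "which table to prioritize" choice controls.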
Which Third-Party Sources Connect to Data Studio
HubSpot’s Data Sync framework allows more than 100 third-party apps to connect directly with Data Studio. This helps teams blend data from marketing, sales, finance, and support tools into a single Dataset.
Common integration categories include:
- CRMs: Salesforce, Microsoft Dynamics 365, Zoho CRM, Pipedrive
- Commerce and Payments: Shopify, Stripe, Square, QuickBooks Online, Xero, BigCommerce
- Data Warehouses and Spreadsheets: Snowflake, BigQuery, Google Sheets, Airtable, Smartsheet
- Marketing and Support: Mailchimp, Marketo, ActiveCampaign, Zendesk, Intercom
- Finance and ERP: NetSuite, Microsoft Business Central, FreshBooks, Zoho Books
Certain integrations, such as Shopify, QuickBooks Online, Stripe, Xero, and Snowflake, are marked as compatible with Data Studio in Data Hub. These enable near-real-time syncing, so your datasets reflect up-to-date information across your business.
Using AI in Data Studio
Data Studio includes built-in AI functionality to help you create and maintain datasets more easily. These AI tools can:
- Suggest joins or calculated fields based on your existing data
- Generate new “smart columns” using the Data Agent
- Work with Breeze Assistant to help interpret and explore your data conversationally
These tools remove much of the manual effort from data management and help ensure your datasets remain clean and usable across reports, workflows, and segments.
Building a Dataset in Data Studio
Creating a Dataset in HubSpot is a straightforward process designed to help teams blend and clean their data without needing technical expertise.
Here’s how to build a Dataset in HubSpot:
- Open Data Studio under Data Management in your HubSpot portal.
- Select your data sources, including HubSpot objects, spreadsheets, or integrated apps.
- Add the fields, joins, and filters you want to include.
- Create calculated fields or rename columns to make data more readable.
- Format your data and remove null or incomplete values for accuracy.
- Save your Dataset so it can be used across other HubSpot tools.
By turning complex data blending into a guided, low-code experience, HubSpot makes data operations more accessible.
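For readers who think in code, the renaming and cleanup steps above correspond roughly to the following plain-Python sketch. Data Studio handles this through its interface, so nothing here reflects a HubSpot API; the column names and the `clean_rows` helper are hypothetical.

```python
def clean_rows(rows, rename, required):
    """Rename columns, then drop rows that are missing any required value."""
    cleaned = []
    for row in rows:
        # Apply readable names; columns not listed in `rename` keep their name.
        renamed = {rename.get(col, col): value for col, value in row.items()}
        # Treat None and empty strings as incomplete values.
        if any(renamed.get(col) in (None, "") for col in required):
            continue
        cleaned.append(renamed)
    return cleaned


# Hypothetical raw rows with terse source column names.
raw_rows = [
    {"email_addr": "ana@example.com", "amt": 120.0},
    {"email_addr": "bo@example.com", "amt": None},
    {"email_addr": "", "amt": 55.0},
]

cleaned_rows = clean_rows(
    raw_rows,
    rename={"email_addr": "Email", "amt": "Order Total"},
    required=["Email", "Order Total"],
)
```

Only the first row is kept here, since the other two each lack a required value.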
Common Use Cases for HubSpot Datasets
- Revenue Reporting: Combine deal, product, and subscription data to calculate accurate ARR and MRR metrics across customer segments.
- Customer Segmentation: Create Segments based on unified data from multiple systems, allowing marketing and sales teams to target the right customers with the right message.
- Automation Triggers: Build workflows that use Datasets to trigger actions such as upsell outreach, renewal reminders, or customer onboarding.
- Attribution Reporting: Join marketing and deal data to understand which channels drive the most revenue or retention.
- Churn Prevention: Use combined support, product, and engagement data to identify customers who may be at risk and automatically alert account teams.
Each of these use cases highlights how Datasets create consistency and context across multiple business processes, not just analytics.
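As a worked example of the revenue reporting use case, the sketch below computes MRR and ARR per segment in plain Python. It assumes the common convention that an annual subscription contributes one twelfth of its amount to MRR and that ARR is MRR multiplied by 12; your finance team's definitions may differ. The sample subscriptions and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical subscription rows, as they might look after joining deal and billing data.
subscriptions = [
    {"segment": "SMB", "amount": 100.0, "billing_period": "monthly", "active": True},
    {"segment": "SMB", "amount": 1200.0, "billing_period": "annual", "active": True},
    {"segment": "Enterprise", "amount": 5000.0, "billing_period": "monthly", "active": True},
    {"segment": "Enterprise", "amount": 900.0, "billing_period": "monthly", "active": False},
]


def mrr_by_segment(subs):
    """Sum the monthly value of active subscriptions for each segment."""
    totals = defaultdict(float)
    for sub in subs:
        if not sub["active"]:
            continue  # cancelled subscriptions do not count toward MRR
        if sub["billing_period"] == "annual":
            monthly_value = sub["amount"] / 12  # spread annual billing over 12 months
        else:
            monthly_value = sub["amount"]
        totals[sub["segment"]] += monthly_value
    return dict(totals)


def arr_by_segment(subs):
    """Annualize MRR under the assumption ARR = MRR * 12."""
    return {segment: mrr * 12 for segment, mrr in mrr_by_segment(subs).items()}


mrr = mrr_by_segment(subscriptions)
arr = arr_by_segment(subscriptions)
```

Defining the calculation once, in one place, is what lets every report and segment built on the Dataset agree on the same MRR and ARR figures.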
Best Practices for Managing Datasets
Here are some best practices to keep your Datasets clean, efficient, and reliable:
- Standardize field names and data formats across systems.
- Limit joins to only the data that is relevant to your analysis or workflows.
- Regularly audit your Datasets for accuracy and performance.
- Assign ownership of Dataset management to a RevOps or DataOps role.
- Monitor sync activity to identify issues early.
- Review and update Datasets quarterly to ensure they continue to meet your team’s reporting and automation needs.
By maintaining clear governance and structure, your Datasets remain dependable across every team that uses them.
HubSpot Credits in Data Studio
When using HubSpot Data Studio, it’s essential to know that building and syncing external data connections may require HubSpot credits. Credits are part of HubSpot’s usage-based billing system and are consumed when you import, process, or sync data from third-party applications into your CRM.
Current credit tiers (Data Studio Beta):
- Small source (under 500k rows per destination): 25 credits per run
- Medium source (500k to 5M rows per destination): 75 credits per run
- Large source (over 5M rows per destination): 200 credits per run
You can monitor your organization’s credit usage in your HubSpot account settings under Billing & Usage. Keeping an eye on this helps you plan how often your data syncs and avoid unnecessary overages.
Understanding how credits work helps your team use Data Studio efficiently and avoid billing surprises. It's especially valuable for companies connecting large data warehouses or running high-frequency syncs from multiple apps.
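To plan sync frequency against the tiers listed above, a rough estimate can be sketched in Python. The tier values come from the beta figures in this article and may change. The function names are hypothetical, and treating exactly 500k and exactly 5M rows as "medium" is an assumption about the tier boundaries.

```python
def credits_per_run(rows):
    """Return the credits one run consumes for a source with `rows` rows."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    if rows < 500_000:
        return 25    # small source: under 500k rows
    if rows <= 5_000_000:
        return 75    # medium source: 500k to 5M rows
    return 200       # large source: over 5M rows


def estimate_monthly_credits(rows, runs_per_month):
    """Estimate monthly credit usage for one source and one destination."""
    return credits_per_run(rows) * runs_per_month


# Example: a 1.2M-row source synced once a day for 30 days.
daily_sync_estimate = estimate_monthly_credits(1_200_000, runs_per_month=30)
```

In this example the source falls in the medium tier, so 30 runs come to 75 × 30 = 2,250 credits for the month. Syncing the same source weekly instead would cut that figure substantially.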
Key Takeaways: HubSpot Datasets at a Glance
What is a HubSpot Dataset?
A Dataset in HubSpot is a reusable data layer built in Data Studio, part of Data Hub. It combines information from HubSpot and external systems to create accurate, consistent data for use across the platform.
How it works:
Datasets organize information through joins, filters, and calculated fields. They help standardize metrics and simplify how teams work with shared data.
Where they’re used:
- Reports: Build dashboards that use consistent definitions for key metrics.
- Workflows: Trigger automations using reliable, unified data.
- Segments: Group contacts and companies using insights from multiple systems.
Connected tools:
Data Studio integrates with more than 100 third-party apps, including Snowflake, Shopify, QuickBooks Online, Stripe, and Xero, giving teams broader visibility into customer and revenue data.
Why they matter:
Datasets break down data silos, reduce manual data management, and help teams work confidently with trusted information inside HubSpot.
The Future of Connected Data in HubSpot
Datasets give data-focused teams new ways to organize, combine, and analyze information from across systems. While not every HubSpot user will need them, Datasets are especially valuable for organizations that depend on precise reporting, multi-source data, or advanced automation.
New features like Data Agent and Breeze Assistant make it easier to manage and interact with that data directly inside HubSpot. Breeze Assistant helps users query and interpret their CRM data through natural language, while Data Agent improves data quality by automatically enriching and validating records. Together, these tools help teams get more value from the information they already have.
As HubSpot continues to expand its AI and data capabilities, Datasets will serve as an important bridge between HubSpot’s Smart CRM and the broader data landscape, allowing GTM teams to work smarter with cleaner, connected data.
Need help implementing HubSpot Datasets or connecting your data in Data Studio? Contact the Pros.