If we haven’t set up our product catalog consistently, can we still use Google Vertex AI to power product recommendations?
Yes—you can use Google Vertex AI to power product recommendations even if your product catalog isn’t consistently structured yet. But you’ll want to be realistic about what that means in practice, because inconsistent product data will constrain both the model’s effectiveness and your ability to operationalize recommendations. Let’s walk through this pragmatically.
Why it’s technically possible
Vertex AI is flexible and built to handle a wide variety of data scenarios:
- You can ingest semi-structured or unstructured product records, including records with partial schemas, missing fields, or inconsistent category labels.
- You can design preprocessing pipelines (using Dataflow, Dataprep, or custom notebooks) to normalize, enrich, and clean the data before training.
- You can even train models that rely more on user behavioral signals (clicks, purchases, views) than on perfect product attributes.
In other words, there’s no hard dependency on having a pristine, normalized catalog—Vertex AI’s tooling is designed precisely to help teams work incrementally toward clean data.
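To make that concrete, here's a minimal sketch of the kind of normalization pass you might run before training. It's shown in plain pandas for readability; in practice the same logic would typically live in a Dataflow or BigQuery pipeline. The column names, alias table, and sample rows are illustrative assumptions, not a required Vertex AI schema:

```python
import pandas as pd

# Hypothetical raw catalog rows -- column names are assumptions, not a fixed schema.
raw = pd.DataFrame([
    {"sku": "A-100", "category": "Mens Shoes",  "title": "Trail Runner", "price": 89.0},
    {"sku": "a-100", "category": "men's shoes", "title": "Trail Runner", "price": 89.0},
    {"sku": "B-200", "category": "Electronics", "title": None,           "price": 19.5},
])

# Map known label variants onto one canonical taxonomy entry.
CATEGORY_ALIASES = {
    "mens shoes": "apparel/shoes/mens",
    "men's shoes": "apparel/shoes/mens",
    "electronics": "electronics",
}

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Standardize casing and whitespace before any lookups.
    out["sku"] = out["sku"].str.upper().str.strip()
    key = out["category"].str.lower().str.strip()
    out["category"] = key.map(CATEGORY_ALIASES).fillna("uncategorized")
    # Flag records missing fields the model would otherwise rely on.
    out["incomplete"] = out[["title", "price"]].isna().any(axis=1)
    # Drop duplicate SKUs introduced by inconsistent casing.
    return out.drop_duplicates(subset="sku", keep="first")

clean = normalize(raw)
print(clean)
```

The point isn't this exact code; it's that a small, explicit mapping of known label variants often recovers much of the semantic signal a messy taxonomy loses, and it can grow incrementally as you discover new inconsistencies.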
The trade-offs to expect
That said, there are some predictable costs and risks you’ll want to factor in:
- Feature Quality Limits: If your catalog has an inconsistent taxonomy (e.g., product types, categories, or descriptions), the model loses valuable semantic signals that improve recommendation relevance. You may end up relying heavily on behavioral co-occurrence (customers who bought X also bought Y) rather than content-based signals; there's a minimal sketch of that fallback after this list.
- Cold Start Challenges: For new or low-traffic products, you typically rely on metadata to power recommendations. Inconsistent attributes make this much harder: new products won't have strong similarity signals and will perform worse.
- Operational Maintenance: Every inconsistency in the catalog requires extra data engineering: mapping fields, standardizing formats, resolving duplicates. These processes often balloon in complexity if you skip foundational cleanup.
- Explainability Issues: If the product data is messy, it can be harder to trace why certain recommendations were generated. That can hurt stakeholder trust and create friction with merchandising teams.
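One way to see why behavioral signals can carry the load: even a crude item-to-item co-occurrence count over purchase sessions yields usable "customers who bought X also bought Y" recommendations with no catalog metadata at all. The sessions, SKUs, and scoring below are illustrative assumptions; a real deployment would use your event logs and more likely a learned model (matrix factorization or a two-tower network) than raw counts:

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase sessions (SKUs bought together) -- illustrative data only.
sessions = [
    ["A-100", "B-200", "C-300"],
    ["A-100", "B-200"],
    ["B-200", "C-300"],
    ["A-100", "C-300", "D-400"],
]

# Count how often each unordered pair of items co-occurs in a session.
pair_counts = Counter()
for items in sessions:
    for a, b in combinations(sorted(set(items)), 2):
        pair_counts[(a, b)] += 1

def recommend(sku: str, top_k: int = 3) -> list[str]:
    """Rank co-purchased items for a SKU by raw co-occurrence count."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == sku:
            scores[b] += n
        elif b == sku:
            scores[a] += n
    return [item for item, _ in scores.most_common(top_k)]

print(recommend("A-100"))  # e.g. ['B-200', 'C-300', 'D-400']
```

Note what this buys you and what it doesn't: it works with zero product attributes, but it inherits the cold-start problem above, since an item with no purchase history never enters the counts.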
How to approach this pragmatically
If you’re committed to moving forward before your catalog is cleaned up, here’s how to improve your odds of success:
- Segment your data. Identify which parts of the catalog are in better shape; perhaps certain product lines or categories have consistent attributes. Start modeling there to demonstrate value before tackling harder areas (see the completeness-audit sketch after this list).
- Invest in preprocessing pipelines. Use Dataflow or BigQuery pipelines to normalize categories, deduplicate SKUs, and flag missing fields. Even simple cleaning, like unifying units or standardizing labels, will materially improve model quality.
- Leverage behavioral signals. Where catalog metadata is unreliable, lean into collaborative filtering approaches that exploit customer behavior logs, like the co-occurrence sketch above. This can offset some of the gaps in product content.
- Document known limitations. Make sure stakeholders understand that recommendations may have lower precision for certain products or categories, and that improving catalog consistency is a parallel priority.
- Iterate and improve. Treat early deployments as proof-of-concept, with a clear plan to mature your product data foundations over time.
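For the segmentation step above, a quick completeness audit usually suffices to pick a pilot. This sketch scores each category by the share of key attributes that are filled in; the field list, sample rows, and the 0.8 threshold are assumptions you'd tune to your own catalog:

```python
import pandas as pd

# Hypothetical catalog extract -- columns and values are illustrative only.
catalog = pd.DataFrame([
    {"category": "shoes", "title": "Runner",  "brand": "Acme", "description": "Light trail shoe"},
    {"category": "shoes", "title": "Loafer",  "brand": "Acme", "description": None},
    {"category": "audio", "title": None,      "brand": None,   "description": None},
    {"category": "audio", "title": "Earbuds", "brand": None,   "description": "Wireless"},
])

KEY_FIELDS = ["title", "brand", "description"]  # attributes the model would lean on

# Share of non-null key fields per category: a rough "readiness" score.
completeness = (
    catalog.groupby("category")[KEY_FIELDS]
    .apply(lambda g: g.notna().mean().mean())
    .sort_values(ascending=False)
)
print(completeness)

# Start modeling where the data is already in reasonable shape (threshold is a guess).
ready = completeness[completeness >= 0.8].index.tolist()
print("pilot categories:", ready)
```

The same query is easy to express over your warehouse tables, and rerunning it over time gives you a simple progress metric for the parallel cleanup workstream.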
Bottom line
You can use Vertex AI effectively even if your catalog isn’t consistent—plenty of companies start this way. But you’ll be trading off precision, explainability, and scalability.
If you frame your recommendation system as an iterative capability—starting with partial data and steadily improving your catalog and pipelines—you can capture early wins without locking yourself into a brittle foundation.
Put simply: yes, you can do it—but treat data cleanup as a critical parallel workstream, not an optional afterthought.