Not everything needs to be a data product. Some data lives in a pipeline and that's enough. But when something matters — when other teams depend on it, when it feeds a decision, when someone will notice if it breaks — a label isn't sufficient. That's when a warranty becomes the point.
Stop at the noun and you get a tidy museum: labels, glass cases, and a gift shop full of really_final.csv snow globes. The useful part appears when the noun does something on purpose, inside a boundary, with guarantees you can take to the bank.
A data product is a bounded-context capability that owns its state and events, exposes intentional interfaces, and upholds explicit guarantees for quality, timeliness, security, and change.
The Noun Is a Start, Not a Finish
Circle the nouns — that's the standard advice, and it's right as far as it goes. Then look around and notice how the same noun means something different depending on who's in the room. "Order" in Commerce is a promise to ship. That same word in Finance is a promise to recognize revenue without going to jail. Without boundaries, nouns wander off and start freelancing — picking up extra fields, losing their meaning, returning home only for quarter close with a suspicious new status code.
That's the gap most data product conversations fall into. They get the naming right and assume the rest follows. It doesn't. A snapshot tells you what is. A product tells you what it does — and holds itself accountable for doing it consistently. Without that accountability — across freshness, completeness, accuracy, privacy, compatibility — you're asking your colleagues to believe in vibes. Vibes are for playlists; enterprises need warranties.
One way I've found useful to think about it: a data product is a sentence you can run in production. The noun is the durable state you steward — Payment Dispute, Customer Consent, Inventory Position. The verb is the capability — reconcile, decide, compute, notify — done intentionally, not accidentally. The guarantees are the adjectives and adverbs: how fresh, how complete, how fast, how often. The interfaces are the prepositions — from where and to whom: SQL views, APIs, event streams. And the punctuation is change itself — versions, deprecations, migration notes — because every clean sentence eventually needs a semicolon.
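The sentence metaphor can be made concrete as a registry record. This is a hypothetical sketch, not a standard schema: the field names and the `DataProductDescriptor` type are illustrative, and the Customer Consent values below simply echo the examples in this article.

```python
from dataclasses import dataclass

# Hypothetical descriptor: one record per data product, mapping the
# "sentence" metaphor onto fields a registry could store.
@dataclass(frozen=True)
class DataProductDescriptor:
    noun: str                    # the durable state stewarded
    verbs: tuple                 # the intentional capabilities
    guarantees: dict             # adjectives/adverbs: freshness, latency, replay
    interfaces: tuple            # prepositions: views, APIs, streams
    version: str                 # punctuation: change is explicit and versioned

consent = DataProductDescriptor(
    noun="Customer Consent",
    verbs=("decide", "audit"),
    guarantees={"decision_latency": "<100ms", "replay_window": "24h"},
    interfaces=("GET /consent", "warehouse view"),
    version="2.0",
)
```

If a product can't fill in every field honestly, that gap is usually where the warranty is missing.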
What This Looks Like in Practice
Let me make it concrete. Marketing wants to personalize an offer. Legal would like you not to personalize your way into a headline. Somewhere between those two forces sits a data product called Customer Consent — and it illustrates exactly why the noun alone isn't enough.
Customer Consent doesn't dump a raw table into a lake and wish you luck. It answers a specific question: May we use X for Y under Z policy? It owns the state of consent decisions, listens for the events that change that state — ConsentGranted, ConsentRevoked, PolicyUpdated — and exposes two clean handles: a fast read (GET /consent?customer_id&purpose) and a warehouse view for audits. Its guarantees are explicit: decisions under 100ms, masks applied by policy CONSENT-42, replayable history within 24 hours. When policies change, it announces a version bump and gives consumers a migration window measured in weeks, not just hope.
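The decision core of such a product can be sketched in a few lines. This is a minimal in-memory model under stated assumptions: event shapes are simplified to dicts, policy evaluation is omitted, and a real service would back this state with owned storage and a replayable event log.

```python
from dataclasses import dataclass

@dataclass
class ConsentState:
    granted: set  # (customer_id, purpose) pairs currently granted

    def apply(self, event: dict) -> None:
        # The product owns its state and updates it only via named events.
        key = (event["customer_id"], event["purpose"])
        if event["type"] == "ConsentGranted":
            self.granted.add(key)
        elif event["type"] == "ConsentRevoked":
            self.granted.discard(key)

    def may_use(self, customer_id: str, purpose: str) -> bool:
        # The one question the product answers: may we use X for Y?
        return (customer_id, purpose) in self.granted

state = ConsentState(granted=set())
state.apply({"type": "ConsentGranted", "customer_id": "c1", "purpose": "marketing"})
state.apply({"type": "ConsentRevoked", "customer_id": "c1", "purpose": "marketing"})
```

The interface stays a yes/no answer; consumers never see raw consent records, which is what keeps the boundary honest.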
That's not "some data." That's a capability with a warranty — and the difference is felt by every team that depends on it.
A Moment from the Trenches
I saw this play out firsthand. The requirement sounded simple: "stand up a streaming feed we can trust." Which really meant: make something moving behave like something owned.
GRC set the ground rules. Anything inventoriable — including data products and event streams — had to be registered in our centralized CMDB with an owner, a classification, a lifecycle state, and controls. So we gave the stream a real identity: a CMDB record, not a wiki page. SLOs that someone would actually measure. A schema fingerprint that changed on purpose, not by accident. We used Confluent Schema Registry for compatibility and evolution, and RBAC for authorization — but the key move was integrating both with our compliance-approved asset-management standard so registry subjects and topics mapped back to CMDB assets and policy. The governed path had to be the only path.
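A schema fingerprint that "changes on purpose" can be as simple as hashing the canonicalized schema, so any change to shape is visible and deliberate. The field layout below is illustrative; in practice the fingerprint would come from, or be reconciled with, the schema registry.

```python
import hashlib
import json

def schema_fingerprint(schema: dict) -> str:
    # Canonicalize (sorted keys, no whitespace) so the hash depends only
    # on the schema's content, not on formatting or key order.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

v1 = schema_fingerprint({"fields": [{"name": "order_id", "type": "string"}]})
v2 = schema_fingerprint({"fields": [{"name": "order_id", "type": "string"},
                                    {"name": "status", "type": "string"}]})
```

A fingerprint drift that nobody announced is exactly the "by accident" change the governed path exists to prevent.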
Access wasn't a polite suggestion. We tied rights to roles, not favors. A redaction gate — code, not wishful thinking — scrubbed PII unless the caller was the audit service following its narrow, logged path.
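The redaction gate is worth sketching because "code, not wishful thinking" is the whole point. The field names and the `audit-service` role below are assumptions for illustration, but the shape matches the rule: PII is masked by default, and the one unredacted path is narrow and logged.

```python
PII_FIELDS = {"email", "phone", "ssn"}  # illustrative classification

def redact(record: dict, caller_role: str, audit_log: list) -> dict:
    if caller_role == "audit-service":
        # The narrow path: unredacted, but every read is logged.
        audit_log.append(f"unredacted read by {caller_role}")
        return dict(record)
    # Default path: PII fields are masked before anything leaves the boundary.
    return {k: ("***" if k in PII_FIELDS else v) for k, v in record.items()}

log = []
masked = redact({"id": 7, "email": "a@b.com"}, "analytics", log)
clear = redact({"id": 7, "email": "a@b.com"}, "audit-service", log)
```

Because the gate is code, access tied to roles is enforceable rather than aspirational.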
What changed after that wasn't just technical. What used to be "that Kafka thing" became a registered capability with lineage, guardrails, and a change plan. Security stopped hovering. Audit stopped spelunking. New consumers didn't DM a hero — they subscribed to a product. That shift in how people related to the data was the real signal that something had changed.
Getting There
The first instinct most teams have is to map data products one-to-one to their operational estate — one database table, one CDC stream, one product. That's a reasonable starting point and a trap. A CDC stream from a transactions table is just a feed. Whether it becomes a data product depends on whether anyone has made an intentional decision about what question it exists to answer, for whom, and under what guarantees.
That's the actual work — and it's not a timeline question, it's a thinking question. Event-driven architectures can have dozens of real-time streams in flight simultaneously. None of them are data products by default. They become data products when someone draws a boundary around an outcome: not "here is a stream of payment events" but "here is the current payment standing for a customer, maintained in real time, available to these consumers, under these guarantees." The difference is intent, not technology.
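The shift from "a stream of payment events" to "current payment standing" is a fold over events into an owned state. This is a toy sketch under assumptions: the event names, the timestamp field, and the two-state standing rule are invented for illustration.

```python
def payment_standing(events: list) -> dict:
    # Fold raw payment events into the outcome consumers actually need:
    # current standing per customer, derived from the latest event.
    standing = {}
    for e in sorted(events, key=lambda e: e["ts"]):
        if e["type"] == "PaymentReceived":
            standing[e["customer_id"]] = "good"
        elif e["type"] == "PaymentFailed":
            standing[e["customer_id"]] = "delinquent"
    return standing

events = [
    {"ts": 1, "type": "PaymentFailed", "customer_id": "c1"},
    {"ts": 2, "type": "PaymentReceived", "customer_id": "c1"},
]
```

The stream is the plumbing; the projection, with its guarantees and named consumers, is what earns the word product.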
When teams are ready to make that shift, the sequence that tends to work is: start with the domain experts, not the engineers — understand what decisions the data needs to support before sketching a single schema. Then model the events and invariants before touching code. Then lock the interfaces and contracts. Then build the warranty layer — SLOs, lineage, alerts. Then register it and manage it like the product it is. The order matters more than the clock.
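The warranty-layer step above can be made concrete with a measurable check. This is a toy sketch: the 15-minute freshness SLO and the function name are assumptions, but the principle holds — an SLO is only a warranty if something measures it and can alert on a breach.

```python
from datetime import datetime, timedelta, timezone

def freshness_breached(last_update: datetime, slo: timedelta) -> bool:
    # A guarantee someone can measure: has the data gone stale past the SLO?
    return datetime.now(timezone.utc) - last_update > slo

# Example: data last updated an hour ago against an (assumed) 15-minute SLO.
stale = freshness_breached(
    datetime.now(timezone.utc) - timedelta(hours=1),
    timedelta(minutes=15),
)
```

A check like this, wired to an alert and a dashboard, is the difference between a stated SLO and a kept one.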
This approach draws on DDD & EventStorming for modeling and boundaries, schema evolution and data contracts for safe change, SRE practices for SLOs and error budgets, and catalog/marketplace patterns for discoverability.
The hard part isn't the sequence. It's the organizational will to draw a line and hold it — and the discipline to say "no" to the eleventh "just one more field."
Before You Ship
Two tests worth running before you call something done. The first is what I think of as the completeness check: have you named the noun, shipped a verb, encoded the constraints, modeled the events in and out, and drawn a boundary that's honest about what's in scope — and what isn't? For Customer Consent, that means consent state as the noun, "may we use X for Y under Z?" as the verb, policy CONSENT-42 and auditability as the constraints, ConsentGranted and ConsentRevoked and PolicyUpdated as the events, and a boundary that keeps legal's definitions inside while marketing consumes answers, not raw PII.
The second test is simpler and more honest: if this vanished or missed its guarantees, would a stranger open a ticket? If the answer is no, you've built something interesting but not yet dependable. That's the line between a dataset and a product.
"But We Already Have a Lake…"
Good. Lakes store things. Products ship them. The moment your nouns become capabilities with guarantees, your internal marketplace starts behaving like something people actually want to use — teams can discover, subscribe, and build without archeology. Experiments move faster because the floor is solid and everyone knows what it's made of.
The question "what is a data product?" is worth asking once. After that, it should just be how you work. The day your team stops asking is probably the day it stopped being a definition and became a standard. That's what you're building toward — not a framework, but a habit that holds.
