
Data products are simple: they're nouns with a warranty.
Not grammar-class nouns—budget nouns. Orders, claims, devices, customers. The things that show up on invoices and in audits. But if you stop at nouns, you get a tidy museum: labels, glass cases, and a gift shop full of “really_final.csv” snow globes. Charming. Not useful. The useful part appears when the noun does something on purpose, inside a boundary, with guarantees you can take to the bank.
Here’s a working definition I use in practice:
A data product is a bounded-context capability that owns its state and events, exposes intentional interfaces, and upholds explicit guarantees for quality, timeliness, security, and change.
The noun makes it findable. The capability makes it valuable. The guarantee makes it safe.
Why “find the nouns” is a start, not a finish
You’ve heard the workshop advice: circle the nouns. Good—circle away. Then look around and notice how the same noun means different things to different people. “Order” in Commerce is a promise to ship; “Order” in Finance is a promise to recognize revenue without going to jail. Without boundaries, nouns wander off and start freelancing. They pick up extra fields, lose their meaning, and return home only for quarter close with a suspicious new status code.
Behavior matters, too. A snapshot tells you what is. A product tells you what it does and how often it will do it. And without guarantees—freshness, completeness, accuracy, privacy, compatibility—you’re asking your colleagues to believe in vibes. Vibes are for playlists; enterprises need warranties.
Grammar you can use
Think of a data product as a sentence you can run in production.
The noun is the durable state you steward—Payment Dispute, Customer Consent, Inventory Position. The verb is the capability—reconcile, decide, compute, notify—done intentionally, not accidentally. The adjectives are qualities promised and measured—freshness, accuracy, privacy. The adverbs are cadence and latency—how quickly, how often, with what replay rules. The prepositions are the interfaces—from where and to whom: SQL views, APIs, event streams. And the punctuation is change—versions, deprecations, migration notes—because every clean sentence eventually needs a semicolon.
Tape this next to your coffee: Name the noun. Ship the verb. Publish the guarantees. Respect the boundary.
A day in the life of “Customer Consent”
Marketing wants to personalize an offer. Legal would like you not to personalize your way into a headline. Somewhere between those forces sits a data product called Customer Consent.
It doesn’t dump a raw table into a lake and wish you luck. It answers a specific question: May we use X for Y under Z policy? It owns the state of consent decisions, listens for the events that change that state—ConsentGranted, ConsentRevoked, PolicyUpdated—and exposes two clean handles: a fast read (e.g., GET /consent?customer_id&purpose) and a warehouse view for audits. These interfaces are illustrative—your surface area may be APIs, SQL views, event streams, or all three—but the point stands: intentional touchpoints with a warranty. Its guarantees are explicit: decisions under 100 ms, masks applied by policy CONSENT-42, replayable history within 24 hours. When policies change, it announces a version bump and gives consumers a migration window measured in weeks, not hope.
This is not “some data.” It’s a capability with a warranty.
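To make the shape concrete, here is a minimal Python sketch of the decision core. The event names and policy question come from the example above; the data structures, the default-deny rule, and the field names are illustrative assumptions, not a reference implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentEvent:
    kind: str          # "ConsentGranted" | "ConsentRevoked" | "PolicyUpdated"
    customer_id: str
    purpose: str       # e.g. "personalization"
    at: datetime

@dataclass
class ConsentState:
    # (customer_id, purpose) -> currently granted?
    grants: dict = field(default_factory=dict)

    def apply(self, e: ConsentEvent) -> None:
        key = (e.customer_id, e.purpose)
        if e.kind == "ConsentGranted":
            self.grants[key] = True
        elif e.kind == "ConsentRevoked":
            self.grants[key] = False
        # A real product would re-evaluate affected grants on PolicyUpdated.

    def may_use(self, customer_id: str, purpose: str) -> bool:
        # The fast read behind GET /consent?customer_id&purpose:
        # default-deny unless an explicit grant is on record.
        return self.grants.get((customer_id, purpose), False)

state = ConsentState()
now = datetime.now(timezone.utc)
state.apply(ConsentEvent("ConsentGranted", "c-123", "personalization", now))
state.apply(ConsentEvent("ConsentRevoked", "c-123", "personalization", now))

print(state.may_use("c-123", "personalization"))  # False: revocation wins
print(state.may_use("c-456", "personalization"))  # False: default-deny
```

Notice the boundary doing its job: consumers ask the question, they never read the raw event log.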
Here’s what that looked like for me in the wild.
A Moment from the Trenches: Turning a Stream into an Asset
The requirement sounded simple enough: “stand up a streaming feed we can trust.” Translation: make something moving behave like something owned.
GRC set the ground rules. Anything inventoriable—including data products and event streams—had to be registered and tracked in our centralized CMDB with an owner, classification, lifecycle state, and controls. So we gave the stream a real identity: a CMDB record (not a wiki page), SLOs someone would measure, and a schema fingerprint that changed on purpose, not by accident. We leveraged Confluent Schema Registry for compatibility and evolution, and the platform’s RBAC for authorization—but crucially, we integrated both with our compliance-approved asset-management standard so the registry subjects and topics mapped back to CMDB assets and policy.
Access wasn’t a polite suggestion. We enforced RBAC at the topic boundary and tied rights to roles, not favors. In the middle sat a redaction gate—code, not wishful thinking—that scrubbed PII unless the caller was the audit service following its narrow, logged path. The effect was immediate. What used to be “that Kafka thing” became a registered capability with lineage, guardrails, and a change plan. Security stopped hovering. Audit stopped spelunking. New consumers didn’t DM a hero; they subscribed to a product.
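The redaction gate in that story was a small, boring piece of code, which is exactly why it worked. The sketch below assumes field names, a PII list, and an "audit" role that are mine, not the production system’s; treat it as the shape of the idea.

```python
import copy

# Assumed PII fields; the real list came from classification policy.
PII_FIELDS = {"email", "phone", "ssn"}

def redact(event: dict, caller_role: str) -> dict:
    """Scrub PII unless the caller is the audit service on its narrow, logged path."""
    if caller_role == "audit":
        return event  # the one permitted exception, logged elsewhere
    clean = copy.deepcopy(event)
    for f in PII_FIELDS & clean.keys():
        clean[f] = "***REDACTED***"
    return clean

evt = {"order_id": "o-42", "email": "a@example.com", "amount": 19.99}
print(redact(evt, caller_role="marketing"))  # email masked, rest passes through
print(redact(evt, caller_role="audit"))      # full record on the audited path
```

Code, not wishful thinking: the default path masks, and the exception is explicit and testable.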
Noun, capability, guarantees, boundary—the warranty showed up, and the work got calmer.
From dataset to product in about a month
Week one is modeling. Gather the folks who speak fluent reality, sketch the events, draw the boundary, write the invariants in pen. Week two is interfaces and contracts. Choose the surfaces you’ll support (view, API, stream), lock schemas, and publish copy-paste examples a stranger could run. Week three is warranty work. Define SLOs, wire up tests and lineage, add alerts that page humans before Finance does. Week four is change and adoption. Tag v1, register it in your catalog, announce the deprecation plan, instrument usage, and start the backlog like you mean it.
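What do "lock schemas" and "define SLOs" look like when written down? Here is one minimal way to encode a v1 contract as checkable code; the field names, SLO numbers, and function names are illustrative assumptions, not a standard.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical v1 contract: the promised shape plus one promised guarantee.
CONTRACT_V1 = {
    "schema": {"customer_id": str, "purpose": str, "granted": bool},
    "slo": {"freshness": timedelta(hours=24)},
}

def conforms(record: dict, contract: dict) -> bool:
    """Does the record carry exactly the promised fields with the promised types?"""
    schema = contract["schema"]
    return (record.keys() == schema.keys()
            and all(isinstance(record[k], t) for k, t in schema.items()))

def fresh(last_event_at: datetime, contract: dict, now=None) -> bool:
    """Is the product inside its freshness SLO?"""
    now = now or datetime.now(timezone.utc)
    return now - last_event_at <= contract["slo"]["freshness"]

rec = {"customer_id": "c-123", "purpose": "personalization", "granted": True}
print(conforms(rec, CONTRACT_V1))                  # True
print(conforms({**rec, "extra": 1}, CONTRACT_V1))  # False: uncontracted field
```

The point isn’t this particular encoding; it’s that the contract lives in code the pipeline can enforce, so the eleventh "just one more field" fails a test instead of leaking into production.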
None of this requires mystical ceremony. It does require taste, repetition, and the courage to say “no” to the eleventh “just one more field.” If you’re wondering where this cadence comes from, here’s the quick provenance:
Notes on this 4-week plan
This is a practical synthesis, not a standard. The cadence (≈ one month / two sprints) draws on:
• DDD & EventStorming for modeling, boundaries, and invariants
• Schema evolution & data contracts for safe change (e.g., registry-backed compatibility)
• SRE practices for SLOs, error budgets, and alerts
• Catalog/marketplace patterns for discoverability and adoption
Your mileage will vary by team maturity—2 to 8 weeks is common.
Two quick tests before you ship
First, the five-check sniff test: you’ve named the noun (state), shipped a verb (capability), encoded constraints (invariants/policies), modeled events (in and out), and drawn a boundary (what’s in scope—and what isn’t).
Second, the ticket test—if this vanished or missed its guarantees, would a stranger open a ticket? If the answer is “no,” you’ve built something interesting but not yet dependable.
Five checks — Customer Consent (example)
Noun: consent state
Verb: answer “may we use X for Y under Z?”
Constraints: policy CONSENT-42, auditability, masking
Events: ConsentGranted, ConsentRevoked, PolicyUpdated
Boundary: legal’s definitions inside; marketing consumes answers—not raw PII
“But we already have a lake…”
Wonderful. Lakes store things. Products ship things. The moment your nouns become capabilities with guarantees, your internal marketplace starts behaving like an app store. Teams can discover, subscribe, and compose without archeology. Experiments get faster because the floor is solid.
Closing thought
Nouns make a clean label. Verbs deliver outcomes. Guarantees earn trust.
Design your data products as capabilities with warranties, and your organization stops spelunking for “the data” and starts composing with dependable pieces. That’s when “what is a data product?” becomes less of a definition and more of a standard you can live with—quarter after quarter, version after version.
(And yes, keep the line: “Data products are nouns with a warranty.” It’s a great opener. Just don’t let it be the whole story.)