The Product Data Challenge
On a large scale, product data has all the ingredients for a nightmare recipe. This is particularly true for retailers.
- Start with massive variation
  - Tens of thousands of product types (categories).
  - By category, different sets of permissible:
    - Attributes (e.g. color, size)
    - Attribute Values (e.g. red, 1-1/2”)
    - Range limits for the same attributes across categories.
      - 1-1/2” is OK as a screw length but probably not for an electrical extension cord.
- Stir in errors and no conventions
  - Abbreviations, misspellings, and frequent incorrect values.
  - Unlike personal names (First Name, Middle Initial, Last Name), product names follow no convention for the order of information.
- Add in no uniformity
  - Can’t tell what’s missing.
    - Many attributes don’t apply to all products in a category.
    - So it’s tough to tell, for a given attribute (e.g. product finish or coating), which products:
      - Should have a value
      - Could have a value
      - Should not have a value
  - Different organization.
    - Different people put the same products into different buckets.
    - Where standards exist, they’re often not followed or their use is limited.
- Top off with text
  - Typically contains information that should be pulled out as attributes.
  - Often rife with stock phrases that do very little to help sell.
- Marinate in multiple sources (retailers)
  - Integrating product data from different manufacturers, in different formats and with varying quality and completeness, is difficult.
  - Especially when manufacturers categorize products in different ways and none of these categorization schemes (taxonomies) match yours.
- Add your own icing (retailers)
  - Manufacturer product descriptions tend to be feature rich and benefit poor.
  - And Google is rewarding your own marketing copy (product descriptions).
  - But how do you write well for 100s, 1000s or even 10000s of products?
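
To make the range-limit ingredient concrete, here is a minimal Python sketch of a per-category check on a length value. Everything in it is an assumption made for illustration: the `LENGTH_LIMITS_IN` table, its numbers, and the helper names `parse_inches` and `length_is_plausible` are hypothetical and do not come from any real catalog or tool.

```python
from fractions import Fraction

# Hypothetical per-category limits for a "length" attribute, in inches.
# The numbers are illustrative assumptions, not real catalog rules.
LENGTH_LIMITS_IN = {
    "screw": (Fraction(1, 8), Fraction(12)),
    "extension cord": (Fraction(12), Fraction(1200)),
}


def parse_inches(text):
    """Convert a value such as '1-1/2”', '3/4"' or '6"' to a Fraction of inches."""
    # Drop a trailing straight or curly inch mark and surrounding spaces.
    cleaned = text.strip().rstrip('"”').strip()
    # '1-1/2' splits into whole part '1' and fraction '1/2';
    # '3/4' and '6' have no dash, so the whole part is empty.
    whole, _, frac = cleaned.rpartition("-")
    return (Fraction(whole) if whole else Fraction(0)) + Fraction(frac)


def length_is_plausible(category, text):
    """Check a length value against the limits assumed for its category."""
    low, high = LENGTH_LIMITS_IN[category]
    return low <= parse_inches(text) <= high


print(length_is_plausible("screw", "1-1/2”"))           # plausible for a screw
print(length_is_plausible("extension cord", "1-1/2”"))  # not for a cord
```

The check itself is short. The hard part is that someone has to define and maintain a table like this for every attribute in every one of those tens of thousands of categories.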

Sure, you can handle a few products in Excel. But even if you’re an Excel wizard like us, these ingredients make Excel:
- Unwieldy for more than 100 products.
- Downright awful for over 1000.
Perhaps you’re a whiz with fancier tools like regular expressions or scripts? These definitely help, but they don’t scale well and can’t cope with common problems. For example, how do you:
- Parse (split) product names into their component parts (attribute values)?
- Figure out when BLK means Black, Bulk, Blank, Block, or Blink?
- Automatically fill in missing values?
- Standardize and normalize attribute values?
- Derive new attributes from existing data?
- Resequence product names?
- Write better marketing copy?
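
As a rough illustration of the first two questions, the Python sketch below parses one product-name layout with a regular expression and expands BLK using the product category as context. The pattern, the sample names, and the `ABBREVIATIONS` table are all hypothetical, invented for this example, and it is only one naive way to approach the problem.

```python
import re

# Hypothetical pattern for names shaped like "<brand> <size> <color> <noun>".
# It assumes one fixed word order, which real product names do not follow.
NAME_PATTERN = re.compile(
    r'^(?P<brand>\w+)\s+'
    r'(?P<size>\d+(?:-\d+/\d+)?(?:"|in))\s+'
    r'(?P<color>\w+)\s+'
    r'(?P<noun>.+)$'
)

# Hypothetical expansions: the same abbreviation can mean different
# things depending on the category it appears in.
ABBREVIATIONS = {
    ("paint", "BLK"): "Black",
    ("fasteners", "BLK"): "Bulk",
}


def parse_name(name):
    """Return the named parts of a product name, or None if the layout differs."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None


def expand(category, token):
    """Expand an abbreviation using its category; return it unchanged if unknown."""
    return ABBREVIATIONS.get((category, token.upper()), token)


print(parse_name('Acme 1-1/2" BLK Wood Screw'))
print(parse_name('Wood Screw, Black, 1-1/2 in, Acme'))
print(expand("paint", "BLK"), expand("fasteners", "BLK"), expand("lighting", "BLK"))
```

The first name parses into brand, size, color, and noun. The second, which carries the same information in a different order, returns `None`, and BLK in an unlisted category comes back unexpanded. Each new layout or category needs another hand-written rule, which is why this approach helps but does not scale.
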
The cooking parallel holds. There’s a big difference between cooking for the family, cooking professionally as a chef, and running a foodservice operation.