Product Data Feed
What is a Product Data Feed?
This is how we get all of the information about our customers’ products. Automating the process keeps that information continuously up to date. It lets us display more specific messages (based on product scarcity, for instance) and ensures that product attribute information is always accurate.
Why is a Product Data Feed needed?
- Crobox needs detailed information about a customer’s products in order to:
  - Target messages to certain products
  - Have clean, accurate data with a single (or ranked) source of truth
  - Drill down into the specifics of the data, even at the product level
- All product attribute messages and almost all of our Behavioral messaging rely on a Product Feed to have enough information to display the correct message.
How can a Product Data Feed be set up?
- Crobox supports various existing formats, such as Google Merchant Center feeds, Channable and Productsup. If a customer already has such a feed available, Crobox can use it as is.
- There are a few options for how data can be delivered to Crobox:
  - Periodic fetch* (pull mechanism)
  - SFTP upload (push mechanism)
  - Manual upload
  - Website scraping**
- The following file formats are supported:
  - CSV (or similar)
- Customers can combine one or more of these options and choose which feed is the source of truth and overwrites the others.
* Preferred mechanism; in most cases it requires no development work on the customer side.
** Only recommended for augmenting information like product ratings that originate from the website. Not specified further in this documentation.
These settings apply to all feeds:
- Name (required): a name that describes the feed.
- Active: enables or disables the feed without deleting it.
- Priority: if multiple feeds extract the same product field (e.g., title), the feed with the lowest priority number is used for that field (lowest number = first preference).
- Incremental: when selected, all fields are augmented instead of replaced.
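The Priority rule can be illustrated with a minimal sketch. The `Feed` class and `merge_feeds` function are hypothetical names invented for this example, not part of Crobox; the sketch only models the Active and Priority settings and leaves Incremental out.

```python
from dataclasses import dataclass, field


@dataclass
class Feed:
    """Hypothetical in-memory stand-in for a configured feed."""
    name: str
    priority: int                      # lower number = higher preference
    active: bool = True
    # product ID -> {field name -> value}
    products: dict = field(default_factory=dict)


def merge_feeds(feeds):
    """Pick, per product field, the value from the active feed with the lowest priority number."""
    merged = {}
    # Visit the most preferred feed first so its values are stored before any others.
    for feed in sorted((f for f in feeds if f.active), key=lambda f: f.priority):
        for product_id, fields in feed.products.items():
            target = merged.setdefault(product_id, {})
            for name, value in fields.items():
                # setdefault keeps the value already stored by a more preferred feed.
                target.setdefault(name, value)
    return merged


google = Feed("google-merchant", priority=1,
              products={"sku-1": {"title": "Blue Sneaker"}})
ratings = Feed("ratings", priority=2,
               products={"sku-1": {"title": "sneaker blue", "rating": 4.5}})
disabled = Feed("old-export", priority=0, active=False,
                products={"sku-1": {"title": "Legacy title"}})

result = merge_feeds([ratings, disabled, google])
print(result)
```

Here the title comes from the feed with priority 1, the rating is filled in by the feed with priority 2, and the inactive feed is ignored even though its number is lower.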
Crobox supports fetching URLs on a cron interval. This is a pull mechanism that runs automatically on the configured interval and supports various URL schemes.
Only 2 additional configuration settings are required:
- Cron interval: e.g. every night at 2:00 AM, or every 4 hours during weekdays
- URL: can be either an HTTP(S) or (S)FTP URL; Basic authentication is also supported
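The two example schedules can be written as cron expressions. The exact cron dialect Crobox accepts is not specified here, so the sketch below assumes the standard five-field Unix form (minute, hour, day of month, month, day of week). The matcher is a simplified illustration with hypothetical function names; it combines all five fields with a plain AND, which differs from full cron semantics when both day fields are restricted.

```python
from datetime import datetime


def _field_matches(expr, value):
    """Match one cron field supporting '*', '*/n', 'a-b', comma lists and plain numbers."""
    for part in expr.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif "-" in part:
            low, high = (int(p) for p in part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression, moment):
    """Return True if `moment` falls on a minute selected by a five-field cron expression."""
    minute, hour, day, month, weekday = expression.split()
    cron_weekday = (moment.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    return (_field_matches(minute, moment.minute)
            and _field_matches(hour, moment.hour)
            and _field_matches(day, moment.day)
            and _field_matches(month, moment.month)
            and _field_matches(weekday, cron_weekday))


NIGHTLY_AT_2 = "0 2 * * *"            # every night at 2:00 AM
EVERY_4H_WEEKDAYS = "0 */4 * * 1-5"   # every 4 hours, Monday to Friday

monday_morning = datetime(2024, 1, 1, 8, 0)   # 1 January 2024 was a Monday
print(cron_matches(EVERY_4H_WEEKDAYS, monday_morning))
```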
Crobox exposes an SFTP service that can be used to (automatically) upload feeds. This allows customers to push the data on their own schedule instead of having a fixed interval configured. (Note that automating the upload is preferred, which may require additional development time on the customer side.)
Only 1 additional configuration setting is required:
- File name pattern
  - A unique file name pattern is required that matches the name of the file that is uploaded. If the name of the uploaded file doesn’t match the pattern, the upload is denied. This is how we identify which file belongs to which feed.
  - E.g. if the file name pattern is set to .csv.gz, the upload will match files such as feed.csv.gz and other.csv.gz, but not feed.xml
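The matching behavior in the example can be sketched as follows. The pattern syntax Crobox actually uses is not specified here; this sketch assumes glob-style matching and therefore writes the pattern with a leading wildcard. `accepts_upload` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Assumed glob form of the example pattern; the real pattern syntax may differ.
FILE_NAME_PATTERN = "*.csv.gz"


def accepts_upload(file_name, pattern=FILE_NAME_PATTERN):
    """Return True if the uploaded file name matches the feed's pattern (case-sensitive)."""
    return fnmatchcase(file_name, pattern)


for name in ("feed.csv.gz", "other.csv.gz", "feed.xml"):
    print(name, "accepted" if accepts_upload(name) else "denied")
```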
Manual upload can be used in the Crobox interface to upload a feed that contains data that doesn’t change often. This allows for an easy way of augmenting product data.
No additional configuration settings are required.
Compression is supported for all file formats: gzipped or zipped files are decompressed automatically. The encoding is configurable (currently UTF-8 and ISO-8859-1 are supported, but more can be added on request).
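As a rough illustration of what automatic decompression and configurable encoding involve, the sketch below detects gzip and zip payloads by their leading bytes and then decodes the content. It is not Crobox's implementation; `read_feed_text` is a hypothetical name, and the single-file zip archive is an assumption.

```python
import gzip
import io
import zipfile

SUPPORTED_ENCODINGS = {"utf-8", "iso-8859-1"}


def read_feed_text(raw, encoding="utf-8"):
    """Decompress gzip or zip payloads if needed, then decode with the configured encoding."""
    if encoding.lower() not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    if raw[:2] == b"\x1f\x8b":                      # gzip magic number
        raw = gzip.decompress(raw)
    elif raw[:4] == b"PK\x03\x04":                  # zip local file header
        with zipfile.ZipFile(io.BytesIO(raw)) as archive:
            # Assumption: the archive holds a single feed file; read the first entry.
            raw = archive.read(archive.namelist()[0])
    return raw.decode(encoding)


plain = "id;title\n1;Café chair\n"
gzipped = gzip.compress(plain.encode("iso-8859-1"))
print(read_feed_text(gzipped, encoding="iso-8859-1"))
```

Uncompressed input passes through unchanged, so the same function can handle all three cases.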
The CSV format should contain data in columnar format. Each row must contain a uniquely identifiable product / variant.
The following configuration settings are available:
- Header as first row
  - Instead of using index-based columns, the first row of the file contains the names of the columns
- Delimiter
  - Configures which delimiter is used for separating the values. Available are:
    - Comma (,)
    - Tab (\t)
    - Semicolon (;)
    - Space ( )
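The effect of these two settings can be shown with Python's standard `csv` module. This is a sketch only; `parse_csv_feed` and the column names in the sample data are made up for illustration.

```python
import csv
import io


def parse_csv_feed(text, delimiter=",", header_as_first_row=True):
    """Return one dict per row, keyed by column name or by zero-based column index."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    if header_as_first_row:
        header, data = rows[0], rows[1:]
        return [dict(zip(header, row)) for row in data]
    # Without a header row, columns are addressed by their index.
    return [dict(enumerate(row)) for row in rows]


feed = "id;title;stock\nsku-1;Blue Sneaker;12\nsku-2;Red Boot;0\n"
products = parse_csv_feed(feed, delimiter=";")
print(products[0]["title"])  # prints "Blue Sneaker"
```

With the header setting disabled, the same row would be addressed as `row[0]`, `row[1]` and so on.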
The XML format should contain data with different levels of nesting. Crobox supports an XPath-like selector mechanism to extract different field values.
The following configuration settings are available:
- Path
  - The path should contain the element names that identify a single product in the XML feed. RSS 2.0 is a commonly used format for XML feeds; for the following example, the value of path should be rss channel item to uniquely identify a single product:
<?xml version="1.0" encoding="utf-8" ?>
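To make the path `rss channel item` concrete, the sketch below applies it to a small, made-up RSS 2.0 document using Python's standard `xml.etree.ElementTree`. The sample feed, the element names inside `<item>` and the `select_products` helper are assumptions for illustration; Crobox's own selector mechanism may behave differently.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed for illustration; element names inside <item> are assumptions.
SAMPLE_FEED = """<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
  <channel>
    <title>Example shop</title>
    <item><id>sku-1</id><title>Blue Sneaker</title></item>
    <item><id>sku-2</id><title>Red Boot</title></item>
  </channel>
</rss>"""


def select_products(xml_bytes, path):
    """Return the elements selected by a space-separated path such as 'rss channel item'."""
    root_name, *rest = path.split()
    root = ET.fromstring(xml_bytes)
    if root.tag != root_name:
        return []
    if not rest:
        return [root]
    # ElementTree paths are relative to the root element, so drop the root name.
    return root.findall("/".join(rest))


items = select_products(SAMPLE_FEED.encode("utf-8"), "rss channel item")
for item in items:
    print(item.findtext("id"), item.findtext("title"))
```

Each selected `<item>` element corresponds to one product, and field values are then read relative to that element.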
The Crobox product data model is flexible and extensible so it can be easily connected with various feeds.
Only the Product entity is required as a bare minimum, with the product ID as its only required field. Ideally, the following fields are included:
- Product Name
- Product ID
- Product Image URL
- Product Description
- Product PDP URL
- Category information (product type, gender, etc.) - This should ideally include all of the categorization that will be used for the Product Finder questions/answers
- Stock information
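The list above can be pictured as a record in which only the product ID is mandatory. The `Product` class and its attribute names below are hypothetical and do not reflect Crobox's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    """Hypothetical shape of a product record; only product_id is required."""
    product_id: str
    name: Optional[str] = None
    image_url: Optional[str] = None
    description: Optional[str] = None
    pdp_url: Optional[str] = None
    categories: Optional[dict] = None   # e.g. {"product_type": "shoes", "gender": "women"}
    stock: Optional[int] = None


def missing_recommended_fields(product):
    """List the recommended fields that a product record leaves empty."""
    return [name for name, value in vars(product).items() if value is None]


minimal = Product(product_id="sku-1")
print(missing_recommended_fields(minimal))
```

A check like this can help spot which recommended fields a feed does not yet provide.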
Crobox Data model is extensible with custom properties that a Product or Variant can define. Those fields can be of type
Field Value Conversions on Feed imports
Crobox supports various ways to ‘clean’ the data that is provided in the feeds. This allows a flexible connection between the source and target data model.
- Boolean (including trueish value configuration)
- Number (including division and multiplication)
- Date (including format mechanism)
- String (including splitting and joining)
- Regex (including support for capture groups)
- Lookup Table (key-value mappings with default value for no match)
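To give a feel for what each conversion type does, here is a minimal sketch with one small function per type. All function names, parameters and sample values are hypothetical; they illustrate the idea and are not Crobox's configuration options.

```python
import re
from datetime import datetime


def to_boolean(value, trueish=("true", "yes", "1", "in stock")):
    """Boolean: treat any configured 'trueish' value as True."""
    return value.strip().lower() in trueish


def to_number(value, divide_by=1.0, multiply_by=1.0):
    """Number: parse and optionally scale, e.g. cents to whole currency units."""
    return float(value) / divide_by * multiply_by


def to_date(value, fmt="%Y-%m-%d"):
    """Date: parse using a configurable format string."""
    return datetime.strptime(value, fmt).date()


def split_string(value, separator=">"):
    """String: split into parts and trim surrounding whitespace."""
    return [part.strip() for part in value.split(separator)]


def regex_capture(value, pattern, group=1):
    """Regex: return one capture group, or None when the pattern does not match."""
    match = re.search(pattern, value)
    return match.group(group) if match else None


def lookup(value, table, default=None):
    """Lookup table: map a key to a value, falling back to a default."""
    return table.get(value, default)


print(to_boolean("Yes"))
print(to_number("1999", divide_by=100))
print(to_date("31-12-2024", fmt="%d-%m-%Y"))
print(split_string("Shoes > Sneakers > Running"))
print(regex_capture("SKU-12345-BLUE", r"SKU-(\d+)"))
print(lookup("X", {"M": "men", "F": "women"}, default="unisex"))
```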