Product data in the business-to-business marketspace is very different from that in the retail marketspace. One manufacturer may produce (say) some cutting insert. It's sold by multiple distributors, using different product-ids, with prices and other data varying by distributor, buyer, region, etc. Efficiently capturing these business relationships is very involved.
Getting all this right the first time is well-nigh impossible; I started with a simple set of data structures, and incrementally improved them as my understanding of the business issues deepened with hands-on experience with clients and buyers. The final result encompassed over ten different tables in Oracle, tied together with a maze of integrity constraints. And it's still not finished!
The next step was loading product data supplied by clients. I developed a set of procedures and systems for verifying and loading data. This was used to load some 50,000 different products into the tables I designed.
Searching these multitude of products is the next issue. I designed the necessary indexing schemes and search strategies. For example, a full-table scan of text fields might be more efficient than an indexed scan because of disk access patterns. The resultant SQL statements, permitting search by manufacturer, distributor, keyword and product-id were very complex:
SELECT client_id, alias, text FROM pages_aliases tpa, pages, text_long WHERE EXISTS (SELECT * FROM mgh.dists_mfrs_some, mgh.clients_prods tcp, mgh.products tp, mgh.prods_txt tt, mgh.prods_parchild tpc WHERE dist_id = '77586' AND mfr_id = client_id AND prod_id = tp.id AND attrib = 'descr' AND obj_id = tt.id AND LOWER(descr) LIKE '%clamp%' AND LOWER(descr) LIKE '%welding%' AND (EXISTS (SELECT * FROM mgh.dists_mfrs_all WHERE dist_id = '77586' AND client_id = mfr_id) OR EXISTS (SELECT * from mgh.dists_series, mgh.prods_parchild WHERE dist_id = '77586' AND cid = tp.id AND p_alias = page_alias) OR EXISTS (SELECT * from mgh.dists_prods WHERE dist_id = '77586' AND mgh.dists_prods.product_id = tp.id)) AND tp.id = cid AND p_alias = tpa.alias) AND tpa.page_id = pages.id AND attrib = 'tit1' AND obj_id = text_long.id ORDER BY client_id
However, the whole system was designed for efficiency, and could easily handle such complex queries. For example, the above query takes only about 2 seconds to search all the product data.
Back
Rujith de Silva 1997-05-13