This is like a koan: the performance benefits from field specialization are simultaneously independent of the workloads used to generate that specialization and intimately aware of the details of the workload being sped up.
Recall that field specialization generates, at compile time or at run time, code that is tailored to invariants and thus runs faster than the original. The DBMS code region where the invariants are present is called a specialization opportunity, and the code added to the DBMS, a spiff.
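To make the idea concrete, here is a minimal Python sketch of exploiting an invariant; the names and the toy comparator are illustrative only, not taken from any actual DBMS. A generic routine re-checks a per-query invariant (here, a column's type) on every tuple, while the spiff has that invariant, and even the query constant, baked in at generation time.

```python
def compare_generic(value, constant, col_type):
    # The column type is invariant for the duration of a query,
    # yet this branch is re-evaluated for every single tuple.
    if col_type == "int":
        return value < constant
    elif col_type == "text":
        return str(value) < str(constant)
    raise TypeError(f"unsupported column type: {col_type}")

def make_spiff(constant, col_type):
    # Field specialization: generate a routine with the invariants
    # (col_type and the query constant) folded in, so the per-tuple
    # code contains no dispatch at all.
    if col_type == "int":
        return lambda value: value < constant
    return lambda value: str(value) < str(constant)

# Once the query is planned, the invariants are known, so the spiff
# is created once...
less_than_10 = make_spiff(10, "int")
# ...and applied per tuple with no type test.
matches = [v for v in (3, 12, 7, 25) if less_than_10(v)]  # → [3, 7]
```

In a real DBMS the spiff would be emitted as native code rather than a closure, but the structure of the transformation is the same: hoist the invariant out of the per-tuple path.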
Steps in the field specialization process identify the specialization opportunity, the invariant(s) to be exploited, and the location in the DBMS source code where the invariant is realized and the spiff is placed. In a DBMS with millions of lines of code, the tools use dynamic profiles from representative workloads to home in on particular routines that account for a considerable percentage of execution time. Disparate workloads, different queries within those workloads, and even different scale factors can surface different routines to specialize. For example, we've used both standard workloads (TPC-H, TPC-DS, TPC-C) and proprietary workloads in these steps. Thus representative workloads are crucial in identifying specialization opportunities.
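The profiling step can be sketched with Python's standard `cProfile` and `pstats` modules; the workload and routine here are stand-ins, not the actual tools, but the idea is the same: run a representative workload under a profiler and rank routines by how much time they consume.

```python
import cProfile
import pstats

def hot_routine(n):
    # stands in for a DBMS routine that dominates execution time
    return sum(i * i for i in range(n))

def workload():
    # stands in for running a representative query workload
    for _ in range(50):
        hot_routine(10_000)

prof = cProfile.Profile()
prof.enable()
workload()
prof.disable()

# Sorting by cumulative time surfaces the costliest routines first,
# analogous to how the tools pick candidate routines to specialize.
stats = pstats.Stats(prof).sort_stats("cumulative")
```

Routines that rank highly across several representative workloads are the ones worth examining for invariants.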
Each specialization opportunity has a set of potential specializations, each of which depends on characteristics such as the number of columns in a table or the distribution of values being joined. (This is the "field" part of field specialization: we rely on invariants being ascertained in the field, when a query is run at the end-user site.) Thus the spiff is intimately aware of, and exploits, details of the actual application workload being executed.
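A sketch of one such characteristic-dependent specialization, again in illustrative Python rather than actual DBMS code: extracting column values from a raw tuple. Once the table's schema is known in the field, the column count and offsets are invariant, so the generic loop can be replaced by a spiff with the accesses unrolled.

```python
def deform_generic(raw, num_cols, offsets):
    # Generic loop over however many columns the tuple happens to have.
    return [raw[offsets[i]] for i in range(num_cols)]

def make_deform_spiff(offsets):
    # The column count and offsets are fixed once the schema is known,
    # so for a common case (here, three columns) the loop is unrolled
    # into a fixed sequence of accesses; other cases fall back to a
    # loop over the now-frozen offsets.
    o = tuple(offsets)
    if len(o) == 3:
        return lambda raw: [raw[o[0]], raw[o[1]], raw[o[2]]]
    return lambda raw: [raw[i] for i in o]

deform = make_deform_spiff([2, 0, 1])
```

A different table, or even a different distribution of values, would yield a different spiff from the same specialization opportunity, which is exactly why the spiff is intimate with the workload it speeds up.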
Each specialization is a particular rewritten version of a DBMS routine, specialized as just discussed. The spiff couldn't care less about which workloads were used to identify that routine. Thus the specialization that yields the performance gain is independent of the workloads used to find the opportunity.