We propose a physical model of speech to explain its precision and robustness. We begin by reducing the dynamics to its bare minimum: polygonal billiards. The symbolic stability of billiard trajectories against variations in the action and in the geometry of the oral cavity forms the basis for precision and robustness in articulation. This stability survives forcing and dissipation, and so underpins the reliable encoding of the trajectories into acoustic emissions. The kinematics of oral billiards and the cyclical nature of the forcing mechanism engender a grammar of the syllable that is independent of any language. The concomitant acoustic emissions render the symbolic dynamics of oral billiards nearly maximally observable. Speech recognition is then the set of computations on these sub-maximally informative acoustic observables by which the symbolic dynamics of oral billiards may be inferred.
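As an illustration only, and not the authors' model, the notion of symbolic stability can be sketched for the simplest polygonal billiard, a rectangle. A trajectory is coded by the sequence of walls it strikes, and small changes to the launch angle (standing in for the action) or to a wall position (standing in for the cavity geometry) leave that sequence unchanged for a finite number of bounces. The function name `symbolic_itinerary`, the rectangular table, and the `L`/`R`/`B`/`T` alphabet are hypothetical choices made for this sketch.

```python
import math


def symbolic_itinerary(x, y, theta, n_bounces, width=1.0, height=1.0):
    """Wall-hit sequence of a point particle in [0, width] x [0, height].

    Hypothetical symbolic coding used for illustration:
    'L' (x = 0), 'R' (x = width), 'B' (y = 0), 'T' (y = height).
    The particle starts at interior point (x, y) with heading theta
    (radians) and reflects specularly off each wall.
    """
    dx, dy = math.cos(theta), math.sin(theta)
    symbols = []
    for _ in range(n_bounces):
        # Flight time to the next vertical wall along the current heading.
        if dx > 0:
            tx = (width - x) / dx
        elif dx < 0:
            tx = -x / dx
        else:
            tx = math.inf
        # Flight time to the next horizontal wall.
        if dy > 0:
            ty = (height - y) / dy
        elif dy < 0:
            ty = -y / dy
        else:
            ty = math.inf
        if tx < ty:
            # Vertical wall first: advance, snap onto the wall, reverse dx.
            y += tx * dy
            symbols.append('R' if dx > 0 else 'L')
            x = width if dx > 0 else 0.0
            dx = -dx
        else:
            # Horizontal wall first. An exact corner hit (tx == ty) is a
            # singular trajectory; it is treated here, somewhat arbitrarily,
            # as a horizontal hit.
            x += ty * dx
            symbols.append('T' if dy > 0 else 'B')
            y = height if dy > 0 else 0.0
            dy = -dy
    return ''.join(symbols)


base = symbolic_itinerary(0.3, 0.2, 0.7, 8)
nudged_angle = symbolic_itinerary(0.3, 0.2, 0.7 + 1e-6, 8)
nudged_wall = symbolic_itinerary(0.3, 0.2, 0.7, 8, width=1.0 + 1e-6)
print(base, base == nudged_angle == nudged_wall)  # prints: RTLBRTLB True
```

In a rectangle the symbol sequence can change only when a perturbation moves the trajectory across a corner, so perturbations that are small compared with the distance to the nearest corner preserve the coding. This is the sense in which the symbolic description, rather than the exact trajectory, is the robust object.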