I vote no.
The word "annotation" is too vague; are you proposing text that:
1. always faces the viewer
2. is always the same size on any "screen"
3. is not occluded by other objects in the world
All of these bring up sticky issues:
1. Why is "always faces the viewer" a property of text? Why not just add a
FaceTheViewer transformation that ensures the coordinate system after the
FaceTheViewer is aligned with the camera's coordinate system? And what if
there are multiple cameras, either because a stereo projection is being
performed or because the virtual world is being projected onto multiple
physical screens (à la the CAVE or big-screen vis-sim setups)?
2. Ditto for "always the same size": why not a kind of scale node that says
"I want distance 1.0 in object space to map to a distance of XXX
millimeters on the screen"?
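3. What should browsers that support stereo views do with objects that are
never occluded by other objects? If such an object is drawn on top of
everything while its stereo disparity places it behind other geometry, the
depth cues conflict, which seems like a sure way to get a massive headache...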
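For illustration, here is roughly what the scale node in point 2 would have to compute each frame. This is a minimal sketch, not a proposed spec: it assumes a symmetric perspective camera with vertical field of view `fov_y` (radians), a known physical screen height in millimeters, an object at eye-space depth `depth` (distance along the view axis, not straight-line distance), and no other scaling above the node. The function names are hypothetical.

```python
import math

def screen_mm_per_world_unit(depth, fov_y, screen_height_mm):
    """Millimeters on the physical screen covered by one world unit at eye-space depth."""
    if depth <= 0:
        raise ValueError("object must be in front of the camera")
    # World-space height visible at this depth for a symmetric perspective frustum.
    visible_height = 2.0 * depth * math.tan(fov_y / 2.0)
    return screen_height_mm / visible_height

def constant_size_scale(depth, fov_y, screen_height_mm, target_mm):
    """Hypothetical scale factor so that 1.0 in object space spans target_mm on screen."""
    return target_mm / screen_mm_per_world_unit(depth, fov_y, screen_height_mm)

# 90-degree vertical FOV, 200 mm tall screen, object 10 units away:
# one world unit covers about 10 mm, so a 5 mm target needs a scale of about 0.5.
print(constant_size_scale(10.0, math.pi / 2, 200.0, 5.0))
```

Because the result depends on the camera's depth, field of view and physical screen size, a browser rendering two eyes or several CAVE walls would get a different scale for each view, which is the same multiple-camera problem raised in point 1.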