Is vision or audition fast enough for proprioceptive augmentation? A quick review

Awareness of body position is afforded by proprioceptors, which encode muscle stretch, tendon load, and joint angle. Signals from proprioceptors can take up to 300ms to reach the brain and be registered, which is often too slow for real-time motor control. The brain (so the evidence suggests) compensates for this delay by actively trying to predict the incoming feedback (so it “knows” it ahead of time), which results in the suppression of the predicted proprioceptive signal (Miall and Wolpert 1996; Azim and Alstermark 2015; Tuthill and Azim 2018). The upshot is that proprioception faces serious limitations as an input for the real-time control of complex, subsecond movements. At these short time scales it doesn’t afford real-time awareness of body position.

Temporal delays and suppressed signals aside, proprioceptive signals are impoverished in other ways. They are noisy, imprecise, and inaccurate, and their usefulness depends on the accuracy and precision of long-term body models encoding the layout and size of the body. It’s well known that the brain uses information from other senses, especially vision, to work around these limitations. Visual information is used to update and refine models of the body’s fixed structure and real-time position. Although less prominent, there’s good evidence that auditory feedback plays a role in bodily awareness and movement control as well. For example, playing foot-fall sounds for a person out of sync with their actual running disrupts their kinematics (Kennel et al. 2015). So, bodily awareness and movement control are deeply multimodal (i.e., involving multiple sensory modalities).

A natural question to ask is whether other modalities can be used to work around the temporal limitations of proprioception. Can real-time information about body position be gathered by another sensory modality like vision? Can proprioceptive information lost to suppression from predictive models be gained indirectly through another sense? You might, for example, see your own body as you move in a mirror, or hear the sounds of your moving body (e.g., the sound of foot falls as you run or walk). Since body-movement sounds are subtle, a natural suggestion is to enhance them through movement sonification, i.e. the conversion of body movements into a covarying tone or other artificial sound.
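To make the idea of movement sonification concrete, here is a minimal sketch of one way a joint angle could be turned into a covarying tone. Everything specific in it is an assumption for illustration, not something taken from the literature reviewed here: the function names, the 0-150 degree angle range, the 220-880Hz pitch range, and the choice of an exponential (equal musical interval) mapping are all hypothetical.

```python
import math


def angle_to_frequency(angle_deg, angle_min=0.0, angle_max=150.0,
                       f_low=220.0, f_high=880.0):
    """Map a joint angle to a tone frequency in Hz (hypothetical ranges).

    The mapping is exponential, so equal changes in angle give equal
    musical intervals rather than equal steps in Hz.
    """
    t = (angle_deg - angle_min) / (angle_max - angle_min)
    t = min(1.0, max(0.0, t))  # clamp readings outside the assumed range
    return f_low * (f_high / f_low) ** t


def sonify(angles_deg, sample_rate=8000, samples_per_reading=80):
    """Turn a sequence of angle readings into audio samples in [-1, 1].

    Phase is carried over between readings so the tone glides from one
    pitch to the next without clicks.
    """
    phase = 0.0
    samples = []
    for angle in angles_deg:
        step = 2.0 * math.pi * angle_to_frequency(angle) / sample_rate
        for _ in range(samples_per_reading):
            samples.append(math.sin(phase))
            phase += step
    return samples


# Three readings from an imagined elbow flexion: extended, halfway, flexed.
tone = sonify([0.0, 75.0, 150.0])
```

With these assumed settings each angle reading lasts 10ms of audio (80 samples at 8000 samples per second), so the tone would track the movement at a finer time scale than the latencies discussed below.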

In order to start thinking about whether this sort of sensory augmentation or sensory substitution might help get around the temporal limitations of proprioception, the temporal characteristics of vision and audition need to be explored. Vision or audition could help only if they themselves are fast enough. There are two issues: (1) What is the latency of neural responses? That is, how long does each take to register a stimulus? (2) What is the temporal resolution? That is, how long must a stimulus last for a change to be detected?

As I’ll set out in this review, audition involves both a shorter latency and better temporal resolution than vision. It’s unclear if either has the needed speed (latency), although generous estimates might lead us to expect that processing is fast enough to allow for at least one real-time correction during a subsecond movement. Audition’s higher temporal resolution and fast, more efficient temporal processing may still hold interesting advantages, especially for motor learning.

Response latency

Reaction time to auditory stimuli tends to be about 30-50ms faster than reaction time to visual stimuli (Freides 1974, p. 290). Part of the issue here may simply be that the physical distance between photoreceptors and primary visual cortex is longer than the physical distance between phonoreceptors and primary auditory cortex (Stauffer et al. 2012, p. 27).

A second relevant fact (perhaps partly explanatory of faster auditory reaction times) is that auditory stimuli elicit meaningful neural responses (specifically, the N1 component of event-related potentials in EEG) in only about 75ms, compared to around 100-150ms for visual stimuli (Stauffer et al. 2012, p. 27). For example, a typical result is that a neural response distinguishing visual presentations of one kind of thing (e.g., animals in photographs) from others (e.g., photographs of landscapes or buildings) emerges about 150ms after stimulus onset (e.g., see Thorpe et al. 1996; see also VanRullen and Thorpe 2001). In contrast, even cortex responsible for fairly late stages of auditory processing (the posterior lateral superior temporal area, downstream from the primary auditory cortex) responds to auditory stimuli within about 30-70ms (Howard et al. 2000, p. 83).

While there is some evidence that visual stimuli can be registered faster (30-60ms after onset), this faster response is limited to specific types of stimuli (e.g., faces) and specific experimental tasks (Braeutigam et al. 2001; see also Kirchner and Thorpe 2006); still, it is a safe assumption that, in the general case, vision takes 100-150ms to extract complex information from retinal activation (and processing seems to continue for several hundred milliseconds). Interestingly, neural responses to auditory stimuli seem to be significantly faster (around 10-30%) for high-frequency tones (4000Hz) vs lower frequency tones (250Hz) (Woods et al. 1993).

A relevant point is that these comparisons may not be apples-to-apples. Measurements showing that visual responses have a latency of about 150ms are measurements of responses encoding complex visual information, such as object category. Fast auditory responses, even in fairly high-level auditory cortex several steps removed from phonoreceptors, may simply be registering comparatively simple or “low-level” auditory information, like pitch. Vision might be able to register comparably simple visual features (e.g., color information) in similarly fast times. A potential worry about ultra fast visual responses is that these responses may not be constitutive of conscious experience, and you may think conscious experience is necessary for real-time motor control (but, maybe not in the subsecond range).

A final point to make on this issue is that proprioceptive augmentation can presumably make do with equally “low-level” features in either vision or audition. While the natural visual information used by proprioception is complex visual form information (e.g., the sight of the body) that takes longer to process, useful information about body position could be artificially provided (for example) via colors that covary with body position.

Is a latency of (say, in the best case) 30-60ms fast enough to afford real-time control of fast, ballistic subsecond movements? Imagine that within 50ms of the movement, your trajectory is off. That information takes, say, another 50ms to be registered in sensory processing. How long will it take the motor cortex to prepare a response? That depends, presumably, on training and anticipation. Training may help speed up the process. A complete guess is that a response will take, say, 100-150ms. That response must then be sent to muscles. Here the latency is probably in the range of 20-40ms (Robinson et al. 1988). Adding these up (50 + 50 + 100-150 + 20-40) puts us at roughly 220-290ms, so call it 250-300ms, plus the time needed for an actual muscle contraction. (I’m also assuming that the time of the augmentation system itself producing the auditory or visual signal is negligible.) So, it might be reasonable to expect (in the best case) one real-time correction in a subsecond movement, or perhaps two or three corrections if the movement extends over a few seconds.
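The back-of-the-envelope budget above can be written out as a short calculation. This is only a sketch of the arithmetic: the stage figures are the guesses from the preceding paragraph, while the stage labels, the function names, and the 500ms example movement are my own assumptions. It also ignores muscle contraction time and assumes corrections can be chained back to back, so the counts it gives are optimistic upper bounds.

```python
# Guessed (best case, worst case) latency of each stage, in milliseconds.
STAGES_MS = {
    "error emerges in the movement": (50, 50),
    "sensory registration": (50, 50),
    "motor cortex prepares a response": (100, 150),
    "conduction to the muscles": (20, 40),
}


def loop_latency_ms(stages=STAGES_MS):
    """Sum the stages into a (best case, worst case) total latency."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


def max_corrections(movement_ms, loop_ms):
    """Upper bound on the number of full correction loops in a movement."""
    return int(movement_ms // loop_ms)


best, worst = loop_latency_ms()
print(best, worst)                  # 220 290
print(max_corrections(500, worst))  # 1
print(max_corrections(500, best))   # 2
```

On these numbers a 500ms movement leaves room for one correction in the slow case and two in the fast case, before muscle contraction time is counted, which fits the expectation of about one real-time correction within a subsecond movement.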

Temporal Resolution

A recurring theme in research on temporal perception (the perception of time) is that audition is better than, and dominates, vision.

There is substantial evidence that audition has higher “temporal resolution” than vision, especially in the subsecond range (Stauffer et al. 2012). You are able to hear timing and duration at a finer grain than you can see timing and duration. For example, it is easier to hear which of two very brief stimuli of similar duration (e.g., 0.5 vs 0.6 seconds) is longer than it is to see which is longer (Freides 1974, p. 289). Two stimuli separated by a brief interval will be perceived as a single stimulus if the time t between them is too short, but audition is able to distinguish two separate stimuli for smaller (i.e., shorter) t than vision (Stauffer et al. 2012, p. 20). Interestingly, an oft-noted fact is that the same duration is perceived as longer in an auditory stimulus than in a visual stimulus; perhaps this is due to the higher temporal resolution of audition, which may require a more rapidly ticking internal clock (e.g., see Stauffer et al. 2012). Perception of temporal patterns (beat, or rhythm) is significantly better in audition compared to vision. Evidence for this goes back a long way (e.g., see Gault and Goodfellow 1938; Freides 1974, p. 295) with recent confirmations (e.g., Grahn 2012).

The dominance of audition over vision in temporal perception can be seen in a few examples. When auditory information about the duration of a stimulus is incongruent with visual information about that duration, the auditory information tends to skew perception of the duration to what was heard (e.g., Lukas et al. 2014). The perceived rate of visual flashes, and their perceived onset time, can be modulated by simultaneously playing a repeating tone of a different rate or onset, but a converse effect of vision on audition is much smaller (Aschersleben and Bertelson 2003). More generally, a preceding or subsequent audio tone makes a seen flash appear earlier or later (respectively) than it actually occurred, although in this case a visual distractor can have a similar effect on the perceived timing of a lone auditory cue (ibid., p. 158). Interestingly, training to improve temporal discrimination within audition transfers to temporal discrimination in vision, but not vice versa (Bratzke et al. 2012). In general, there is lots of evidence that temporal information captured by audition affects the visual experience of time (e.g., Keetels and Vroomen 2010). The natural explanation for this dominance is audition’s higher temporal resolution: Given that we’re better at hearing time than seeing it, the brain relies more heavily on hearing.

Another interesting result is that temporal perception seems to be automatic in audition, but seems to require attention in vision. That is, perceiving temporal durations in vision requires some amount of active effort or cognitive resources. Mioni et al. (2016) showed that even a simple timing-dependent motor task (e.g., repeated finger tapping) disrupts visual perception of duration, but that the same task does not disrupt auditory perception of duration. (Complex sport movements are, of course, timing-dependent motor tasks; so, audition’s ability to process temporal information, e.g. information about body movement timing, independent of motor execution is valuable.)

To summarize, the basic idea is that spatial perception is more accurate in vision, while temporal perception is more accurate in audition. This difference shows up not only as a difference in accuracy in unimodal tasks, but also in the way the senses interact (Freides 1974). For example, the duration of a purely auditory stimulus is more accurately perceived than the duration of a purely visual stimulus, but if the duration of a stimulus is both seen and heard, sensory processing will rely more on audition.

Whether or not proprioceptive augmentation (via sonification or visualization) could enable real-time motor correction during subsecond movements, the finer-grain (and more automatic) temporal processing of audition suggests that it is better for motor learning. Even if you can’t use a signal of body position in real-time, that signal may still provide information needed to improve performance on the next trial. After all, this is what happens in normal proprioception: The proprioceptive signal is too slow to enable real-time control of subsecond movements, but movement errors registered from this (delayed) signal still enable the refinement of motor models and commands that lead to improved performance next time. Similarly, even if an audio or visual signal of body position (afforded naturally or via artificial augmentation) doesn’t enable real-time control, it might still provide useful information for improving on the next trial. The motor cortex can still learn from this signal. Given that auditory temporal information (e.g., about the time course of a movement) is processed at a higher grain of resolution and with less cognitive effort, this suggests that audition provides a better signal for learning than vision.


  1. Azim, E. and Alstermark, B. (2015) “Skilled forelimb movements and internal copy motor circuits”, Current Opinion in Neurobiology, 33: 16-24. 10.1016/j.conb.2014.12.009
  2. Aschersleben, G. and Bertelson, P. (2003) “Temporal ventriloquism: Crossmodal interaction on the time dimension 2. Evidence from sensorimotor synchronization”, International Journal of Psychophysiology, 50: 157-163. 10.1016/S0167-8760(03)00131-4
  3. Braeutigam, S. et al. (2001) “Task-dependent early latency (30-60ms) visual processing of human faces and other objects”, Cognitive Neuroscience and Neuropsychology, 12(7): 1531-1536. 10.1097/00001756-200105250-00046
  4. Bratzke, D. et al. (2012) “Perceptual learning in temporal discrimination: Asymmetric cross-modal transfer from audition to vision”, Experimental Brain Research, 221: 205-210. 10.1007/s00221-012-3162-0
  5. Freides, D. (1974) “Human information processing and sensory modality: Cross-modal functions, information complexity, memory, and deficit”, Psychological Bulletin, 81(5): 284-310. 10.1037/h0036331
  6. Gault, R.H. and Goodfellow, L.D. (1938) “An empirical comparison of audition, vision, and touch in the discrimination of temporal patterns and ability to reproduce them”, The Journal of General Psychology, 18(1): 41-47. 10.1080/00221309.1938.9709888
  7. Grahn, J.A. (2012) “See what I hear? Beat perception in auditory and visual rhythms”, Experimental Brain Research, 220: 51-61. 10.1007/s00221-012-3114-8
  8. Howard, M.A. et al. (2000) “Auditory cortex on the human posterior superior temporal gyrus”, The Journal of Comparative Neurology, 416: 79-92. 10.1002/(SICI)1096-9861(20000103)416:1<79::AID-CNE6>3.0.CO;2-2
  9. Keetels, M. and Vroomen, J. (2010) “Sound affects the speed of visual processing”, Journal of Experimental Psychology: Human Perception and Performance, 37(3): 699-708. 10.1037/a0020564
  10. Kennel, C. et al. (2015) “Auditory reafferences: The influence of real-time feedback on movement control”, Frontiers in Psychology, 6: 1-6. 10.3389/fpsyg.2015.00069
  11. Kirchner, H. and Thorpe, S.J. (2006) “Ultra-rapid object detection with saccadic eye movements: Visual processing speed revisited”, Vision Research, 46: 1762-1776. 10.1016/j.visres.2005.10.002
  12. Lukas, S. et al. (2014) “Crossmodal attention switching: Auditory dominance in temporal discrimination tasks”, Acta Psychologica, 153: 139-146. 10.1016/j.actpsy.2014.10.003
  13. Mioni, G. et al. (2016) “The impact of a concurrent motor task on auditory and visual temporal discrimination tasks”, Attention, Perception, & Psychophysics, 78: 742-748. 10.3758/s13414-016-1082-y
  14. Miall, R.C. and Wolpert, D.M. (1996) “Forward models for physiological motor control”, Neural Networks, 9(8): 1265-1279. 10.1016/s0893-6080(96)00035-4
  15. Robinson, L.R. et al. (1988) “Central motor conduction times using transcranial stimulation and F wave latencies”, Muscle & Nerve, 11: 174-180. 10.1002/mus.880110214
  16. Stauffer, C.C. et al. (2012) “Auditory and visual temporal sensitivity: Evidence for a hierarchical structure of modality-specific and modality-independent levels of temporal information processing”, Psychological Research, 76: 20-31. 10.1007/s00426-011-0333-8
  17. Thorpe, S. et al. (1996) “Speed of processing in the human visual system”, Nature, 381: 520-522. 10.1038/381520a0
  18. Tuthill, J.C. and Azim, E. (2018) “Proprioception”, Current Biology, 28: R187-R207. 10.1016/j.cub.2018.01.064
  19. VanRullen, R. and Thorpe, S.J. (2001) “The time course of visual processing: From perception to decision-making”, Journal of Cognitive Neuroscience, 13(4): 454-461. 10.1162/08989290152001880
  20. Woods, D.L. et al. (1993) “Frequency-related differences in the speed of human auditory processing”, Hearing Research, 66: 46-52. 10.1016/0378-5955(93)90258-3