Perceptual Content, Not Physiological Signals, Determines Perceived Duration When Viewing Dynamic, Natural Scenes

The neural basis of time perception remains unknown. A prominent account is the pacemaker-accumulator model, wherein regular ticks of some physiological or neural pacemaker are read out as time. Putative candidates for the pacemaker include physiological processes (such as the heartbeat) and dopaminergic mid-brain neurons, whose activity has been associated with spontaneous blinking. However, such proposals have difficulty accounting for observations that time perception varies systematically with perceptual content. We examined physiological influences on human duration estimates for naturalistic videos lasting 1–64 seconds using cardiac and eye recordings. Duration estimates were biased by the amount of change in scene content. Contrary to previous claims, heart rate and blinking were not related to duration estimates. Our results support a recent proposal that tracking change in perceptual classification networks provides a basis for human time perception, and suggest that previous assertions of the importance of physiological factors should be tempered.


Introduction
Duration perception in the range of seconds to minutes is an essential feature of cognition and behaviour; however, its precise underlying neural mechanisms are still unclear. Most accounts rely on the assumption that there is a mechanism directly mapping physical time into perceived time, i.e. a neural 'clock' or 'pacemaker' (Block & Zakay, 1996; Matell & Meck, 2004; Treisman, 1963; Treisman, Faulkner, Naish, & Brogan, 1990). Among these, the prominent pacemaker-accumulator model proposes that duration perception arises from a process wherein a pacemaker generates sequential neural pulses that are stored in an accumulator: according to this suggestion, the number of pulses accumulated over a certain interval constitutes the brain's estimation of the duration of that interval (Church, 1984; Treisman et al., 1990). A variation of this model proposed that perceived duration depends on the functioning of multiple neural oscillators, each with phasic activity operating on different timescales (Matell & Meck, 2004; Mauk & Buonomano, 2004; Treisman et al., 1990; van Rijn, Gu, & Meck, 2014).
Several studies have now linked levels of striatal dopamine to duration perception (Allman & Meck, 2012a; Coull, Cheng, & Meck, 2011; Coull, Hwang, Leyton, & Dagher, 2012; Matell & Meck, 2004; Meck, 2006; Soares, Atallah, & Paton, 2016). Specifically, increased dopaminergic activity has been related to overestimation of duration, and vice versa, for intervals in the region of one second (Coull et al., 2011; Terhune, Sullivan, & Simola, 2016). This has led several researchers to propose a fundamental role in time perception for neural oscillators that form a part of the ascending nigrostriatal dopamine pathway of the dorsal striatum (Coull et al., 2011; Matell & Meck, 2004; Meck, 2006; van Rijn et al., 2014). Taking advantage of the link between increased striatal dopamine and spontaneous blinking (Groman et al., 2014; Karson, 1988), Terhune and colleagues reported an apparent behavioural correlate of the influence of dopaminergic activity in human reports of duration. They presented evidence for transient variations in duration estimation in the sub- and supra-second range, with participants systematically biased towards reporting durations as longer immediately after spontaneous blinking, as compared to when no prior blink was present (Terhune et al., 2016). This finding was interpreted as a demonstration of dopaminergic influence on neural 'clock speed' or temporal attention, implying that the results provided evidence in favour of the existence of such a clock underlying duration perception (Terhune et al., 2016).
Alternative accounts for duration perception suggest that, rather than being (primarily) internally-driven by neural or physiological clocks or other rhythmic processes, the basis for duration perception lies in changes in perceptual content (Herbst, Javadi, Meer, & Busch, 2013; Kanai, Paffen, Hogendoorn, & Verstraten, 2011; Linares & Gorea, 2015; Ornstein, 1969). Under these accounts, measurable exteroceptive stimulus attributes, rather than bodily signals, should best correlate with duration estimates. This simple approach has often been dismissed because changes in internal states, such as arousal or attention, can lead to the exact same perceptual content (stimulus) being reported as different in duration (Block & Zakay, 1996; Brown, 1985; Polti, Martin, & van Wassenhove, 2018; van de Ven, van Rijswijk, & Roy, 2011; Zakay & Block, 2004). However, a recent development (Roseboom et al., 2019) of this simple idea suggests that it is not how much perceptual stimulation itself changes that is key, but rather how much change in neural activity occurs in perceptual classification networks, in response to perceptual stimulation, that provides a basis for duration perception. This distinction allows changes in stimulation to drive changes in neural activity (based on changes in perceptual content), while this neural activity is further constrained by the state of the perceptual system during estimation, allowing for fluctuations in, for example, attention to time (or other task-dependent features).
A computational model based on this proposal successfully predicts human duration estimations for a wide range of durations and types of perceptual content (Roseboom et al., 2019). Using an artificial perceptual classification network, the model produced duration estimates from input videos of natural scenes (1-64 seconds duration) based on frame-by-frame changes in network activation patterns in response to video content. Model-produced estimates were well-matched to human estimates made regarding the exact same scenes. Moreover, model estimates replicated qualitative biases in human reports by scene, with busy scenes (e.g., walking around a city) judged as longer in duration than less busy scenes (e.g., walking in the countryside or sitting in an office). Model performance was further improved when based on content from the area in the scene where human participants were looking (based on human gaze data), as opposed to the entire screen. That model estimates produced these qualitative biases provided strong evidence for a basis of human duration estimation in the dynamics of perceptual classification. Crucially, in achieving this outcome, the model made no use of autonomic or oscillatory neural processes.
Accompanying the behavioural reports of apparent duration by human participants, Roseboom et al. (2019) recorded where in the scene participants were looking (using eye-tracking) and monitored their heartbeat (using blood-volume pulse measurements) throughout the experiment. In Roseboom et al. (2019), only the behavioural reports and computer modelling results were presented; results in the present study come from analyses conducted on the combination of behavioural reports, eye-movements (saccades, blinks, and pupil size), and heart rate data.

General Aim and Rationale of Analyses
Unlike most studies on duration perception, which typically use simple stimuli (circles on a screen (Terhune et al., 2016), auditory tones (Meissner & Wittmann, 2011)), or in some cases specifically interoceptive stimuli (Lernia et al., 2018), the present study used complex, naturalistic visual scenes, closer to everyday phenomenological experience, in which complexly changing visual information is prominent. In combination with the eye-tracking and heartbeat data, this dataset allows investigation of the different proposed contributors to human duration estimation; in particular a contrast between external (stimulus change) and internal (potentially striatal dopaminergic activity and fluctuations in heart rate) components.
We hypothesized that time perception would arise from changes in perceptual content and therefore, under exposure to naturalistic visual stimulation, the influence of internal processes would be negligible. To investigate this proposal, we analysed the participants' data, partially presented in (Roseboom et al., 2019), to look for associations between duration estimates and external and internal events, specifically: i) visual content and saccades, ii) autonomic fluctuations and iii) dopaminergic phasic activity. The previously published analyses in Roseboom et al. (2019) presented only participants' duration estimates and no other physiological or eye behaviour data.

Perceptual Change, Eye Movements and Duration Estimation
First, we looked at the relationship between visual content and duration estimation. An association between visual content (scene type: city, campus and outside, and office or café) and duration estimation has previously been reported for this data, with scenes with greater perceptual change driving longer estimates (Roseboom et al., 2019). This result is in agreement with the proposal that time perception is related to changes in perceptual content.
However, this association could instead arise from a potential relationship between visual content and eye movements, specifically saccade density, defined as the number of saccades per second. Previous evidence indicates that stimulus-driven factors (Itti, 2005; Mital, Smith, Hill, & Henderson, 2011) influence eye movements, with results often consistent with minimising sensory prediction error (Friston, Adams, Perrinet, & Breakspear, 2012; Gottlieb, Oudeyer, Lopes, & Baranes, 2013; Itti & Baldi, 2009; Tatler, Hayhoe, Land, & Ballard, 2011). In the context of a dynamic video, sensory prediction error might be, naïvely, expected to correlate with perceptual change between one time point and the next. A simple hypothesis might then be that saccade density would be greater for videos with greater perceptual change, showing a similar dependency of subjective duration as shown for visual content. This pattern of results would imply that an account of subjective time perception based on changes in perceptual content could instead be based on something far more trivial: tracking stimulus-driven eye movements. Aiming to rule out such trivial alternative interpretations, we analysed the threefold relationship between visual content, saccade density and duration estimates.

Cardiac Activity and Duration Estimation
As mentioned, it has long been suggested that repetitive physiological signals may form the basis for human time perception. Craig (Craig, 2009a) postulated a key role for the accumulation of interoceptive visceral information in the anterior insula, which has been related to integration of autonomic signals and interoceptive awareness (Craig, 2009b; Nguyen, Breakspear, Hu, & Guo, 2016). An fMRI study by Wittmann and colleagues (Wittmann et al., 2010) identified a pattern of accumulating neural activity in the bilateral posterior insula during the encoding of time intervals in the range of seconds. The authors suggested that this accumulation corresponded to a clock-type (pacemaker) pattern, recording sequential physiological states during the attended interval (Wittmann et al., 2010). Another study (Meissner & Wittmann, 2011) examined the evolution of cardiac activity throughout the encoding of time intervals and found a progressive increase in cardiac periods (slowing down of heart rate). Furthermore, in this study, individuals' duration reproduction accuracy correlated positively both with the slope of cardiac slowing during encoding and with interoceptive accuracy, measured in a heartbeat counting task. The authors reasoned that such heart rate slowing could correspond to an accumulation of parasympathetic activity, consistent with the proposed insular pacemaker.
In light of these claims, we looked for similar associations between cardiac activity and duration estimation or accuracy in our own data by assessing both average heart rate and progression of heart rate during video presentation. As an additional marker of autonomic activity, we also examined pupil size (Bradley, Miccoli, Escrig, & Lang, 2008; Laeng, Sirois, & Gredeback, 2012; McDougal & Gamlin, 2015); these results are presented in the Supplementary Materials. Information on interoceptive accuracy or awareness was not collected in our experiment.

Dopaminergic Activity Indexed by Spontaneous Blinking and Duration Estimation
Finally, we examined evidence for the proposed association between striatal dopaminergic activity and duration estimation (Coull et al., 2011; Matell & Meck, 2004; Mauk & Buonomano, 2004), following Terhune and colleagues' approach (Terhune et al., 2016) of employing spontaneous blinking as a proxy for transient fluctuations in striatal dopamine (Groman et al., 2014; Karson, 1988). Terhune and colleagues claimed that subjective duration reports in both the sub- and supra-second range (300-2600 milliseconds) were longer when the participant had blinked immediately before the trial (Terhune et al., 2016). In their experiment they employed simple auditory (white noise bursts) and visual stimuli (circles on a screen). We therefore decided to investigate whether the same association (particularly for the shorter videos, 1-3 seconds) was present while viewing dynamic, naturalistic visual stimuli.

Methods
The methods concerning data collection are as reported in (Roseboom et al., 2019); data analysis techniques are specific to the present study. Human participants watched a series of silent videos containing natural scenes with different amounts of 'liveliness', which translated to different amounts of perceptual change. Their task was to report the estimated duration of each presented video. Eye-tracking recorded participants' pupil size, gaze fixation, saccades and blinks. Blood-volume pulse recordings allowed us to measure cardiac activity.

Participants
Fifty-five human participants (40 female, average age 21.4) took part in the experiment. All were recruited from the University of Sussex, were over 18 and reported normal or corrected-to-normal vision. They provided informed consent and were rewarded with course credits or, alternatively, £5 per hour for their participation. The study was granted ethical approval by the Research Ethics Committee of the University of Sussex.
Three (female) participants were excluded from the analysis because their eye-tracking data were not successfully recorded or were substantially missing. After exclusion of those subjects, the remaining dataset totalled 4060 trials.

Stimuli
Stimuli were based on videos obtained in the city of Brighton (United Kingdom), the University of Sussex campus, and the local countryside. These videos were recorded with a GoPro Hero 4 camera at 60 Hz and 1920 × 1080 pixels, and further processed at 30 Hz and 1280 × 720 pixels. The brightness of the videos was not controlled. The stimulus employed in each experimental trial was extracted from a pseudo-random list of 4290 video fragments, comprising 330 repetitions of each of 13 durations ranging from 1 to 64 seconds (1, 1.5, 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64 s).
The videos could be classified into three video types in terms of content: videos recorded while walking around the city, scenes recorded while walking around the campus and outside in the campus green zones, and quiet scenes in an office or café. Each video type has a greater amount of perceptual change than the next. Perceptual change can broadly be defined as the amount of change between consecutive video frames: more complex and dynamic videos will have greater perceptual change than static scenes involving inanimate objects. Although used only qualitatively in the present paper, the construct validity of 'perceptual change' for these stimuli was previously confirmed (Roseboom et al., 2019) by feeding the same videos into an image classification algorithm and quantifying the frame-by-frame rate of change at different hierarchical levels of visual information: from pixel-wise to object-level changes. Within each layer of the image classification network, frame-by-frame change was calculated as the Euclidean distance between the neural activation pattern for one video frame and the next; whenever it exceeded a dynamic 'saliency' threshold, perceptual change was deemed to have occurred and one 'change unit' was accumulated over time. For all video durations the network determined that perceptual change, thus computed, was greater for city videos, intermediate for campus and outside scenes, and lower for office and café (Roseboom et al., 2019). Specifically, duration estimates produced by the network (which were a transformation of the accumulated perceptual change, converted from arbitrary 'change units' into seconds by support vector regression) deviated from the mean by +24%, -4% and -7% for city, campus/outside and office/café videos, respectively.
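The accumulation of 'change units' described above can be illustrated with a minimal sketch. It is not the model of Roseboom et al. (2019): the function names are hypothetical, the activation vectors are toy values, and a fixed threshold stands in for the dynamic 'saliency' threshold, whose update rule is not given here.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length activation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def accumulate_change(activations, threshold):
    """Count 'change units' for one network layer.

    One unit is added whenever the distance between consecutive frames
    exceeds the threshold. Assumption: the threshold is fixed here, whereas
    the original model uses a dynamic threshold.
    """
    units = 0
    for prev, curr in zip(activations, activations[1:]):
        if euclidean(prev, curr) > threshold:
            units += 1
    return units

# Toy activation patterns for four consecutive frames (illustrative only).
frames = [[0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [2.0, 3.0]]
print(accumulate_change(frames, threshold=1.0))
```

In this toy sequence the frame-to-frame distances are 0.1, 1.9 and 3.0, so two of the three transitions count as salient change at a threshold of 1.0.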

Apparatus
Experiments were programmed in MATLAB 2012b (MathWorks Inc., Natick, US-MA), employing Psychtoolbox 3 and the Eyelink Toolbox, and presented on a LaCie Electron 22 BLUE II 22" monitor with a screen resolution of 1280 × 1024 pixels and a refresh rate of 60 Hz. Eye tracking was performed with Eyelink 1000 Plus (SR Research, Mississauga, Ontario, Canada) at 1000 Hz sampling rate, using a desktop camera mount and a chin and forehead rest to stabilize head position at 57 cm from the screen. Calibration of the eye-tracking system was performed at the beginning of each 20-trial block, with a standard 5-point grid and a maximal average error of 0.5 degrees of visual angle (dva). The thresholds for saccade detection were: 0.15 dva motion, 22 dva/s speed, 4000 dva/s² acceleration. Blood-volume pulse (BVP) measurements were obtained using a BVP-Flex/Pro (9308M) sensor and FlexComp System (T7550M) from Thought Technology (Montreal, Quebec, Canada). Heartbeats were detected by applying a peak detection algorithm on the BVP data.

Procedure
The experimental session lasted for one hour and typically comprised 80 trials completed in four 20-trial blocks. As detailed in (Roseboom et al., 2019), for logistical reasons some participants did not complete all 80 trials. The specific trials assigned to each participant were randomized, and neither their content nor duration was balanced or constant across participants. However, all subjects watched at least one video of each of the 13 video durations, except one subject who lacked trials with three durations. The raw trial-by-trial data for each participant is available in the Supplementary Material. The task required a report of the estimated duration of each video in seconds, made on a visual analogue scale immediately after the video ended.
In our analyses we separately employed two dependent variables related to duration estimation: responses and error sizes. 'Response' was the estimate of duration (in seconds) provided by the participant in a given trial and was analysed for potential under- or overestimation of time intervals in different conditions. 'Error size' represented the amount of deviation of responses from veridical magnitude, regardless of direction; it was calculated as the absolute value of the relative error, i.e. error size = |response - duration| / duration. Both measures, but critically error size (inaccuracy), depend not only on time perception but also more broadly on general processes involved in cognitive tasks (Livesey, Wall, & Smith, 2007). In our analyses, we systematically tested the potential association between different external and internal factors (scene type, saccades, pupil size, blinking, cardiac activity) and these two dependent variables, as discussed in the Introduction (see 'General Aim and Rationale of Analyses').
All tested factors were measured trial-wise. Saccade density and average heart rate were computed over video presentation. Heart rate progression through video presentation was calculated as described by Meissner and Wittmann in (Meissner & Wittmann, 2011): cardiac (inter-peak) periods were resampled at 5 Hz using cubic interpolation, averaged on a second-by-second basis and normalized per participant. The slope (linear regression coefficient) of the time series of cardiac period progression indicated the progression of heart rate throughout each trial's video presentation: a positive slope implies that heart rate progressively slowed down (inter-peak periods increasing with seconds since video onset) and vice versa.
Pre-trial blinking was assessed for the 2000 ms immediately preceding trial onset. In separate analyses, only the period between 2000 ms and 1000 ms prior to trial onset was considered. The purpose of the second analysis, following Terhune and colleagues (see 'Supplemental Experimental Procedures' in (Terhune et al., 2016)), was to prevent confounding effects by saccades elicited by blinks around stimulus onset, as saccades are known to affect time perception (Grossman, Gueta, Pesin, Malach, & Landau, 2019; Yarrow, Haggard, Heal, Brown, & Rothwell, 2001). We considered that a blink was present when the 'onset' of a blink was identified in the eye-tracking recording within the period of interest. Thus, we defined two binary variables based on the occurrence (or lack) of a blink within those two time ranges: 2000 ms to trial onset and 2000-1000 ms before trial onset. For brevity we will refer to these binary variables as B2000 and B1000, respectively.
Both dependent and quantitative independent variables were normalized within participant and video duration to make them independent of both these factors. Thus, normalized responses represent trial-by-trial variability of the estimates provided by the same participant for that specific duration. The other normalized variables are interpreted in an analogous manner.
Associations between (normalized) internal and external factors and duration estimation and accuracy were explored both for all video durations pooled and for each duration separately, considering that duration perception might depend on different mechanisms for shorter (~1 s) and longer (~1 min) intervals (Wiener, Turkeltaub, & Coslett, 2010). Results for individual video durations are presented in section S3 of the Supplementary Materials.
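As a worked illustration of the error-size definition above, the following sketch computes the absolute relative error for a single trial. The function name and the example values are illustrative and are not taken from the dataset.

```python
def error_size(response, duration):
    """Absolute relative error: |response - duration| / duration."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return abs(response - duration) / duration

# An 8 s video reported as 6 s and as 10 s gives the same error size,
# because the measure ignores the direction of the deviation.
print(error_size(6.0, 8.0), error_size(10.0, 8.0))
```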
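The last two steps of this procedure, second-by-second averaging and slope estimation, can be sketched as follows. The sketch assumes that the cardiac periods have already been resampled at 5 Hz (the cubic interpolation step is not reproduced) and it omits the per-participant normalization. Function names and sample values are hypothetical.

```python
def second_by_second_means(periods, rate_hz=5):
    """Average resampled cardiac periods within each full second."""
    n_seconds = len(periods) // rate_hz
    return [
        sum(periods[i * rate_hz:(i + 1) * rate_hz]) / rate_hz
        for i in range(n_seconds)
    ]

def linear_slope(values):
    """Least-squares slope of values against their index (seconds since onset)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two time points are needed")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx

# Toy 3 s trial: inter-peak periods (in seconds) lengthen by 0.05 s each second.
samples = [0.80] * 5 + [0.85] * 5 + [0.90] * 5
slope = linear_slope(second_by_second_means(samples))
print(slope)  # positive slope: heart rate slowing down
```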
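A minimal sketch of how the two binary variables could be derived from blink onset times is given below. The function name is hypothetical, and the treatment of the window boundaries (half-open intervals) is an assumption, since the text does not specify it.

```python
def pre_trial_blink_flags(blink_onsets_ms, trial_onset_ms):
    """Return (B2000, B1000) for one trial.

    B2000: a blink onset within the 2000 ms preceding trial onset.
    B1000: a blink onset between 2000 ms and 1000 ms before trial onset.
    Assumption: windows are half-open, [start, end).
    """
    start = trial_onset_ms - 2000
    b2000 = any(start <= t < trial_onset_ms for t in blink_onsets_ms)
    b1000 = any(start <= t < trial_onset_ms - 1000 for t in blink_onsets_ms)
    return b2000, b1000

# A blink 500 ms before onset counts for B2000 only;
# a blink 1500 ms before onset counts for both.
print(pre_trial_blink_flags([9500], 10000))
print(pre_trial_blink_flags([8500], 10000))
```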
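The normalization can be illustrated as a z-score computed within each participant-by-duration cell. This is a sketch under assumptions: the field names are hypothetical, the population standard deviation is used (the text does not state which estimator was applied), and cells with no variability are mapped to zero.

```python
from collections import defaultdict
from statistics import mean, pstdev

def normalize_within(trials, key="response"):
    """Z-score `key` within each (participant, duration) cell."""
    cells = defaultdict(list)
    for t in trials:
        cells[(t["participant"], t["duration"])].append(t[key])
    z_scores = []
    for t in trials:
        values = cells[(t["participant"], t["duration"])]
        sd = pstdev(values)  # assumption: population SD
        z_scores.append((t[key] - mean(values)) / sd if sd > 0 else 0.0)
    return z_scores

# Toy data: two 8 s trials from one participant, one from another.
trials = [
    {"participant": "p1", "duration": 8, "response": 6.0},
    {"participant": "p1", "duration": 8, "response": 10.0},
    {"participant": "p2", "duration": 8, "response": 7.0},
]
print(normalize_within(trials))
```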
For Bayesian statistical analyses we employed the default JASP priors: for t-tests, a prior distribution Cauchy(0, √2/2); for Pearson correlations, a uniform distribution U(-1, 1); for ANOVAs and repeated-measures ANOVAs, r scale prior width of 0.5 for fixed effects and 1 for random effects. The wording employed for describing the amount of evidence indicated by the Bayes factor corresponds to that suggested by Lee and Wagenmakers (Lee & Wagenmakers, 2013). We consider that evidence in favour of the alternative hypothesis is more than anecdotal when BF10 > 3; conversely, there is more than anecdotal evidence in favour of the null when BF10 < 1/3 (equivalently BF01 > 3).
The main analyses were concerned with independent hypothesis testing for the potential effect of each considered factor on duration perception. However, in the Supplementary Materials we report the effect size of each factor by means of Bayesian multilevel regression. See section S1 of the Supplementary Materials.
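For readers unfamiliar with this wording, the sketch below maps a Bayes factor to the verbal categories of the Lee and Wagenmakers scheme (anecdotal up to 3, moderate up to 10, strong up to 30, very strong up to 100, extreme beyond that). The function name is hypothetical, and assigning boundary values to the lower category is an assumption.

```python
def describe_bf10(bf10):
    """Return (strength label, favoured hypothesis) for a Bayes factor BF10."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 == 1:
        return ("none", "neither")
    favoured = "alternative" if bf10 > 1 else "null"
    strength = bf10 if bf10 > 1 else 1 / bf10  # use BF01 when the null is favoured
    # Assumption: boundary values fall into the lower category.
    for upper, label in [(3, "anecdotal"), (10, "moderate"),
                         (30, "strong"), (100, "very strong")]:
        if strength <= upper:
            return (label, favoured)
    return ("extreme", favoured)

print(describe_bf10(29.07))  # evidence for a correlation
print(describe_bf10(0.024))  # evidence for the null
```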

Perceptual Change, Eye Movements, and Duration Estimation
Perceptual Change and duration estimation
Figure 1A depicts participants' responses per video duration. As expected, veridical duration is strongly associated with subjective duration estimates, albeit showing an apparent effect of regression to the mean in responses (often referred to in the temporal domain as Vierordt's law) as suggested by a slope of less than one between veridical and estimated durations.
As reported in (Roseboom et al., 2019), videos with greater perceptual change were estimated as longer in duration, such that, in terms of scene type, duration reports followed the order: city > campus/outside > office/café. Specifically, in terms of z-scores (normalized reports within participant and video duration), average responses per video type were: city z = 0.14, campus/outside z = 0.036, office/café z = -0.091. We ran a Bayesian ANOVA on the effect of scene type on duration estimates and found extreme evidence in favour of such an effect, with a Bayes factor BF10 = 8.561 × 10^6. Figure 1B depicts the average and 95% credible intervals of the normalized responses per scene type. Examining all pairwise comparisons between scene types, we found only anecdotal evidence for a difference in responses between city scenes and campus/outside (BF10 = 1.378), but at least very strong evidence for a difference between the other pairs: BF10 = 3.621 × 10^7 for city versus office/café, BF10 = 34.789 for campus/outside versus office/café.

Saccade density and perceptual change
We ran a Bayesian ANOVA to test for differences in (normalized) average saccade density by video type (city, campus/outside, office/café), obtaining extreme evidence for a difference between all three types (BF10 > 100). However, the association between saccade density and perceptual change was not straightforward, as average saccade density was highest for city scenes (z = 0.218), smallest in campus/outside scenes (z = -0.216), and intermediate for office/café scenes (z = 0.012) (Figure 2B).
These results rule out trivial interpretations of the interaction of perceptual change and duration estimation in terms of differences in saccade density. Specifically, the relationship between perceptual change and duration estimation cannot be explained simply by tracking stimulus-driven eye movements, because saccade density is non-monotonically related to duration estimation (compare Figures 1B and 2B). This interpretation is in line with the modelling results reported in Roseboom and colleagues' study (Roseboom et al., 2019).

Saccade density and duration estimates
We performed a Bayesian bivariate correlation between trial-by-trial normalized saccade density and response (see scatter plot in Figure 2D). Pearson's coefficient and 95% credible intervals were r = 0.042 (0.011, 0.073), suggesting a weak positive correlation; the Bayes factor was insensitive, anecdotally in the direction of the null hypothesis: BF10 = 0.621. Thus, there was no support for either the existence or absence of a correlation between saccade density and duration estimates when considering the entire sample, a result consistent with the non-monotonic relationship between saccade density and video type reported above.
We further assessed whether an association between saccade density and responses could be present only for certain video durations and split the dataset according to the latter variable. We ran a Bayesian correlation between normalized saccade density and response for each of the 13 resulting datasets. Only for the longest duration (64 seconds) did we find evidence for an effect of saccade density on response (r = 0.190, BF10 = 16.879), a positive correlation, though the meaning of this isolated finding is unclear. Scatterplots for all separate durations are presented in the Supplementary Materials, section S3.
Since the scatterplots for the performed correlations are difficult to interpret visually, we also provide a plot (Figure 2C) showing the pattern of descriptive results for the relationship between trial-by-trial saccade density and duration estimates. The error bars represent normalized responses per video duration, split into trials with below and above-average saccade density, compared with other trials from the same participant and with the same duration. No overall pattern is observed that in any way suggests a different distribution of reports for trials with different saccade density. Note that the dichotomization of saccade density has been adopted for graphical purposes only and all statistical analyses employ saccade density in its continuous, non-dichotomized form.
In summary, our data suggest a relationship between perceptual change and duration estimates, as well as (less straightforwardly) between perceptual change and saccade density. However, there is no clear global effect of saccade density on duration estimates.

Saccade density and accuracy
In a Bayesian bivariate correlation between normalized saccade density and error size (Figure 2F), Pearson's correlation coefficient and 95% credible intervals were r = -0.061 (-0.093, -0.030), with the Bayes factor indicating strong evidence in favour of the existence of a correlation: BF10 = 29.07. The negative sign of the correlation indicates that participants were more accurate (made smaller errors) in trials where their saccade density was greater. Considering that saccade density was not directly associated with duration estimates, it seems likely that its correlation with accuracy does not necessarily imply a link between saccades and the neural basis of time perception. Instead, it could indicate an unspecified effect on task performance, perhaps as an index of the level of attention or engagement with a visual task.

Cardiac Activity and Duration Estimation
Heart Rate and Duration Estimates
The trial-wise relationships between both mean heart rate and response, and mean heart rate and error size were assessed by two Bayesian bivariate correlations. In both cases we found very strong evidence against the existence of a correlation. Concerning response (Figure 3B), the Pearson's correlation coefficient and 95% credible intervals were r = 0.008 (-0.025, 0.042), with a Bayes factor BF10 = 0.024. Results for accuracy (Figure 3D) were r = -0.013 (-0.047, 0.020), BF10 = 0.029. This absence of any clear overall relationship between mean heart rate and either response or error size is evident in Figures 3A

Heart Rate Progression Throughout Video Presentation
After obtaining normalized second-by-second cardiac periods for each trial, as described in the Methods section, we split the dataset into 11 video durations, considering only durations of at least 2 seconds in order to allow second-by-second cardiac progression to be calculated. We tested whether there was any effect of time since video onset on cardiac periods by Bayesian repeated-measures ANOVA, run separately for each video duration. The dependent variable was the average normalized cardiac period measured at each second since video onset. The only within-participant factor was time (in seconds) since video onset. Thus, the ANOVA had as many levels as seconds of video duration. Evidence for an effect of time from onset on second-by-second cardiac periods was at least moderate (BF10 > 3) for videos of 4s, 8s, 12s and 48s, with BF10 = 23, 95589, 230 and 11, respectively. Conversely, there was also at least moderate evidence for the null (no different cardiac periods at different time points, with BF10 < 1/3) for 2s, 3s, 6s, 16s, 32s and 64s. Figure 4 presents the average cardiac periods and 95% credible intervals at each time point (seconds since onset) for six different video durations. The ascending slope indicates that the effect of time (when it exists) involves a slowing-down of heart rate throughout the presentation of the video (i.e. increasing cardiac periods), consistent with the results reported by Meissner and Wittmann during the 'encoding phase' of time intervals of 8, 14 and 20 seconds (Meissner & Wittmann, 2011). As shown in lower panels of Figure 4, the lengthening of cardiac periods reaches a plateau after the first 5-10 seconds; this stabilization might be the reason why weak or no evidence in favour of the alternative hypothesis is found for the longest video durations.

Heart rate progression and duration estimates
Our finding of a reduction in heart rate throughout video presentation is in agreement with the results reported by Meissner and Wittmann (Meissner & Wittmann, 2011), who additionally found a positive correlation between heart rate reduction and accuracy in reproduction of the presented interval. This reduction was explained in terms of a hypothetical pacemaker-type accumulating mechanism related to an increase in parasympathetic activity that would track duration perception. We sought to test this hypothesis in our own data by analysing the relationship between cardiac period slope (linear slope for the second-by-second progression in cardiac periods throughout the trial) and duration estimation (response) and accuracy (error size). In a Bayesian bivariate correlation between cardiac period slope and response we obtained a Pearson's r = -0.030 with 95% credible intervals (-0.066, 0.005), and a Bayes factor BF10 = 0.094, indicating strong evidence against the existence of a correlation (Figures 5A-5B). We approached the relationship between heart rate progression and accuracy in two ways: first, by analysing trial-wise associations within each participant, and second, following Meissner and Wittmann (Meissner & Wittmann, 2011), by looking for differences in participants' average cardiac period slope between good and poor duration estimators.

Trial-by-trial association between cardiac period slope and accuracy
Bayesian correlation between trial-wise cardiac period slope and error size yielded Pearson's r = 0.005 (-0.030, 0.040), with Bayes factor (BF10 = 0.023) indicating very strong evidence against any effect of cardiac period slope on accuracy on the overall dataset (Figures 5C-5D). Similar results were obtained when assessing trials of each video duration separately: the Bayes factor indicated at least moderate evidence against the existence of a correlation (BF10 < 1/3) for all video durations, with the exception of 2-second videos, where evidence for the null was anecdotal (BF10 = 0.693, with a Pearson's coefficient of -0.277 (-0.548, 0.073)). See Supplementary Materials, section S3, for further detail.

Cardiac period slopes in good versus poor performers
Although we failed to find an association between heart rate progression and accuracy in duration estimation on a trial-by-trial basis, we enquired whether this association could be present at the participant level rather than the trial level. We classified participants into good or poor performers, depending on whether their average error size was below or above the sample median. We analysed the effect of time since video onset on cardiac period progression in good and poor performers, considering each participant's time-series of average second-by-second cardiac periods for all 11 video durations over 2 seconds, taken separately. We performed a Bayesian repeated-measures ANOVA for each duration; Figures 5E-5H show four of them (8, 12, 16 and 24s, the durations most similar to those used in Meissner & Wittmann, 2011). The dependent variable was the average cardiac period at each time point (seconds since video onset). We employed time since onset as the only within-participant factor (with as many levels as seconds of video duration) and the binary classification of performance (good or poor) as the between-participant factor. If there was any difference in cardiac period progression between good and poor performers, we should find evidence for the inclusion of the interaction term (time since onset * performance) in the model. In all analyses, the best model was either the one containing only time from onset, or the null model. This indicated that in some cases (specifically for 4, 8, 12 and 48 seconds) there was an effect of time since video onset on second-by-second cardiac periods, while in the remaining cases there was no evidence for a time-related difference. In any case, neither the cardiac periods at each time point (main effect) nor their progression throughout the trial (interaction with time) differed between good and poor performers. The Bayes factor for inclusion of the interaction term (the one relevant to our question) always indicated at least moderate evidence against its inclusion (BF inclusion < 1/3), except for the duration of 3 seconds (BF inclusion = 0.367). Furthermore, it indicated extreme evidence against inclusion (BF inclusion < 1/100) in all cases except for durations of 2-4 and 8 seconds. Thus, the evidence consistently opposed any supposed difference in cardiac period progression during video presentation between good and poor performers in a duration estimation task.
In Supplementary Materials (S2) we explore the association between another autonomic marker, pupil size, and duration estimation. As with cardiac activity, we find no association with either the magnitude or the accuracy of duration estimates.
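The median split used above to define good and poor performers can be sketched as follows. This is a hedged illustration: the function name, the input layout (a mapping from participant to signed trial errors) and the handling of participants falling exactly on the median are assumptions of this sketch, since the text only specifies that average error size, regardless of direction, is compared against the sample median.

```python
from statistics import mean, median


def classify_performers(errors_by_participant):
    """Label each participant 'good' or 'poor' by a median split.

    `errors_by_participant` maps a participant id to that participant's
    signed estimation errors. Error size ignores direction, so absolute
    values are averaged. Participants exactly at the median are labelled
    'poor' here; that tie rule is an assumption, not taken from the paper.
    """
    avg_error_size = {
        pid: mean(abs(e) for e in errors)
        for pid, errors in errors_by_participant.items()
    }
    cutoff = median(avg_error_size.values())
    return {
        pid: "good" if size < cutoff else "poor"
        for pid, size in avg_error_size.items()
    }


# Hypothetical data: four participants with two trials each.
groups = classify_performers({
    "p1": [1.0, -1.0],
    "p2": [3.0, -3.0],
    "p3": [0.5, 0.5],
    "p4": [5.0, -5.0],
})
# groups == {"p1": "good", "p2": "poor", "p3": "good", "p4": "poor"}
```

The resulting binary label would then serve as the between-participant factor in the repeated-measures ANOVA described above.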

Spontaneous Blinking and Duration Estimation
Trials were dichotomized according to the presence or absence of blinking in the 2000 ms leading to video onset (B2000), or in the pre-trial period 2000-1000 ms before onset (B1000). Considering that for long video durations pre-trial blinking may not be an accurate index of dopaminergic activity throughout the entire interval, we restricted the analyses to durations up to 3 seconds, comprising 1248 trials in total. Responses for those durations, split according to presence or absence of a pre-trial blink, are presented in Figure 6. We ran a Bayesian one-way ANOVA on short-video trials (1-3 s duration), with normalized response as the dependent variable and B2000 as the fixed factor. We repeated the same analysis using B1000 as the fixed factor instead. The Bayes factor indicated strong evidence against an effect of B2000 on duration estimates (BF10 = 0.068), and moderate evidence against an effect of B1000 (BF10 = 0.150).
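The dichotomization of trials by pre-trial blinking can be illustrated with a short sketch. It assumes blink and onset times are expressed in milliseconds on a common clock, and that window boundaries are half-open; both are assumptions of this example, as are the function and key names.

```python
def pretrial_blink_flags(blink_times_ms, onset_ms):
    """Flag whether any blink falls in each pre-trial window.

    B2000: a blink in the 2000 ms leading up to video onset.
    B1000: a blink between 2000 and 1000 ms before video onset.
    Windows are treated as half-open intervals [start, end); this
    boundary convention is an assumption of the sketch.
    """
    b2000 = any(onset_ms - 2000 <= t < onset_ms for t in blink_times_ms)
    b1000 = any(onset_ms - 2000 <= t < onset_ms - 1000 for t in blink_times_ms)
    return {"B2000": b2000, "B1000": b1000}


# Hypothetical trial: video onset at 10000 ms, one blink 800 ms before it.
flags = pretrial_blink_flags([500, 9200], onset_ms=10000)
# flags == {"B2000": True, "B1000": False}
```

Each flag would then be used as the fixed factor in a separate one-way ANOVA on normalized responses, as described above.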

Discussion
This study aimed to shed light on the mechanisms of human duration perception by assessing the contribution of internal physiological and neural signals to the biases in human duration estimates for naturalistic videos. As previously reported (Roseboom et al., 2019), human participants' estimates were longer for more dynamic videos, such as city scenes, than for less dynamic videos of a quiet office. That study presented a model of human duration perception that could reproduce these human biases by computing perceived video duration on the basis of frame-by-frame perceptual change, without appealing to previously hypothesised pacemaker processes. In the present analyses, we could not find any evidence to support the potential influence of hypothesised pacemakers on our participants' duration reports, whether driven by autonomic processes (revealed by cardiac activity and pupil size) or by dopaminergic phasic activity (indexed by spontaneous blinking). First, our analyses ruled out trivial interpretations for the previously reported association between perceptual change and duration estimation, namely the possibility that time perception was produced by tracking saccadic movements. We found evidence for some relationship between saccade density and stimulus change, in that longer video durations were associated with lower average density, likely reflecting a decrease in saccade frequency as a video progresses, possibly due to waning novelty. However, average saccade density was greater for office/café scenes (with the lowest amount of perceptual change, as computed in Roseboom et al., 2019) than for campus/outside videos, making any naïve novelty-based interpretation less than straightforward. Whatever the reason for the obtained pattern of results, the relationship between perceptual change and saccade density does not follow the same pattern as that between perceptual change and duration estimates. In light of this, the observed association between saccade density and greater accuracy in duration estimation (but not estimated duration magnitude, according to the Bayes factor for the correlation) likely reflects greater general engagement, highlighting the caveat that associations between physiological signals and accuracy may indicate a non-specific attentional effect on cognitive performance, without any specific relation to the mechanism(s) of duration estimation.
Consistent with previous findings (Meissner & Wittmann, 2011) we found evidence for an initial change in cardiac rate at the beginning of a new video presentation, consisting of a progressive cardiac deceleration during the first 5-10 seconds of video presentation, compatible with parasympathetic activation (Bradley et al., 2008; Laeng et al., 2012). Cardiac deceleration (reportedly lasting up to the end of the presented intervals, ranging from 8 to 20 seconds) has been described as an indication of an accumulating pattern of autonomic activity working as a clock-type system, consistent with an interoception-based pacemaker-accumulator model (Meissner & Wittmann, 2011); in that study, cardiac deceleration was more pronounced in good performers (i.e., participants showing greater accuracy in duration reproduction). In our case, heart rate deceleration happened only at the beginning of video presentation and stabilized thereafter, although for short videos this initial phase could comprise most or all of their duration. We therefore suppose that it reflects an initial arousal response in relation to a new video/task (Bradley, 2009), rather than a process specifically related to time perception. All of our analyses failed to reveal any statistical association between cardiac activity (heart rate or heart rate progression) and duration estimation or accuracy, with the vast majority indicating evidence for the null hypothesis. Interestingly, when examining putative group differences in cardiac period progression in good versus poor performers, we found no difference in overall progression between the two groups (contrary to the findings of Meissner & Wittmann, 2011), but, on visual inspection, a more pronounced ascending slope was observed in good performers, only for the first 3-5 seconds of the video, in several video durations: see Figures 5E, 5G and 5H. This might indicate a more pronounced arousal response in good performers, related to greater engagement at the beginning of the video presentation, but the fact that the putative difference only affects the very early seconds of the presentation, and does not extend up to the end of the video, does not fit with the proposal of an accumulating pacemaker-type role involving cardiac period.
Regarding the purported association between striatal dopamine and duration perception, our results based on pre-trial blinking, following Terhune and colleagues' analyses (Terhune et al., 2016), were not consistent with the conclusions reported in their study. Nevertheless, several differences between the experiments must be considered. First, regarding the stimuli, our experiment used videos of natural scenes over a broad range of durations, while Terhune and colleagues (Terhune et al., 2016) used basic auditory noise bursts or visual flashes. Another difference was the task. Our experiment required participants to directly estimate the observed duration in seconds. Terhune and colleagues (Terhune et al., 2016) used a temporal bisection task wherein participants reported whether the observed duration was closer to long or short anchor durations on which they had been trained. While a large literature has linked striatal dopamine to duration estimation and time perception generally (Allman & Meck, 2012b; Coull et al., 2011; Gu, Jurkowski, Lake, Malapani, & Meck, 2015; Jones, Malone, Dirnberger, Edwards, & Jahanshahi, 2008; Matell & Meck, 2004; Mauk & Buonomano, 2004; Rammsayer, 1999; Wiener, Lee, Lohoff, & Coslett, 2014), it is also well known that striatal dopamine is critically involved in the biasing of decisions in many other dimensions, especially for goal-directed or reward-driven decisions (Friston et al., 2014; Howard, Li, Geddes, & Jin, 2017; Lepora & Gurney, 2012; Lo & Wang, 2006; Nagano-Saito et al., 2012; Sarno, de Lafuente, Romo, & Parga, 2017); see Balci (2014) for an overview of some of these issues as they relate to the time domain. Considering this, the absence of evidence for an effect of pre-trial blinks in our data may not reflect a failure to replicate the previously reported influence of spontaneous blinks on time perception. Instead, it may indicate that our direct duration estimation task is not prone to the kind of report bias that is susceptible to dopaminergic influence, i.e. a shift in decision criterion in a bisection task (Soares et al., 2016; Terhune et al., 2016) or peak-interval procedure. The latter has been commonly used in the rodent literature (Balci, 2014), and presented as evidence of the influence of changes in striatal dopamine on time perception specifically, rather than reflecting uncertainty (Lak, Nomoto, Keramati, Sakagami, & Kepecs, 2017) and biases in decision making (Wang, Rangarajan, Gerfen, & Krauzlis, 2018) more generally. Our results suggest that further direct replication of the study by Terhune and colleagues (Terhune et al., 2016), with extension to other tasks, is required.
Our study, like most studies referenced in this paper (e.g. Meissner & Wittmann, 2011; Terhune et al., 2016), involved prospective time judgments, wherein participants know in advance of the interval presentation that they must estimate its duration, and therefore direct their attention to ongoing time itself (Block & Gruber, 2014). By contrast, retrospective timing tasks require participants to be naïve to the fact that they will be asked about time until after the interval presentation (Block, Hancock, & Zakay, 2010). Differences in duration estimation under these different paradigms have led to proposals that estimation in different tasks must be, at least partially, underpinned by different processes (Block, 1982; Block et al., 2010; Block & Reed, 1978; Zakay & Block, 1995). As we only tested participants' duration judgements under a prospective paradigm, for now we can only draw conclusions regarding the putative processes related to tasks in that context. Further investigation is required to broadly understand and reconcile the common and different components of prospective and retrospective timing processes.
Several caveats should be considered regarding our study. As mentioned, our experiment differs in several respects from previous studies claiming a role for autonomic or neural processes, including the stimuli, which were naturalistic scenes instead of simple auditory or visual (Meissner & Wittmann, 2011; Terhune et al., 2016) or interoceptive stimuli (Lernia et al., 2018), and the task, which used prospective magnitude estimation instead of reproduction, bisection, etc. Participants were asked to watch and 'engage' with the video content, knowing that they would be asked to report its duration at the end: thus, the specific deployment of 'attention to time' would have played some role, relative to some other paradigms. Additionally, the use of pre-trial blinking as a marker of striatal dopamine may be problematic as it is possible that the recorded blinks were not all truly spontaneous; our experiment was not explicitly designed to capture this. Blinks in our task may sometimes have been caused by eye fatigue resulting from the long durations and dynamic visual nature of the trials, or by any number of other factors that were not controlled for (though a similar criticism largely applies to the original study, as the precise definition of 'spontaneous' is not clear). However, the complete lack of any statistical association between pre-trial blinks and duration estimation suggests that any such modulation had very little effect in the naturalistic conditions of our experiment.
In summary, our results (together with the results reported in Roseboom et al., 2019, for the basic behavioural dataset) strongly support the hypothesis that human subjective time perception (on the scale of seconds) can be characterised as driven primarily by perceptual change. This is not to say that humans never make use of interoceptive information when judging duration, nor that humans rely entirely on vision for temporal perception. Such positions would be unsustainable. Rather, our data show that when provided with dynamic, naturalistic visual input, in the absence of other meaningfully changing input (the videos were silent and participants were seated and stationary), changes in visual content are related to human time estimation while changes in the other measured (cardiac cycle) and implied (striatal dopamine in relation to eyeblinks) factors were not. It is possible that in conditions other than naturalistic wakeful experience (such as simplified laboratory conditions, resting state with closed eyes, or exteroceptive sensory deprivation) interoceptive information plays a larger part in multi-modal perceptual content. Nevertheless, under circumstances where humans observe naturalistic video stimuli, there is no evidence for a meaningful contribution of proposed autonomic or neural pacemaker processes to duration estimation.

Figure 1: Duration estimation by video duration and type. 1A. Average response across participants (in seconds) per video duration. The error bars represent between-participant standard error. Both axes are represented in logarithmic scale for clearer depiction of short durations. Comparing with the reference line of veridical responding, the relationship between stimulus and response has a slope of less than 1, which suggests an effect of regression to the mean. Response dispersion is larger for longer durations, roughly following a power law; in logarithmic scale, the size of the error bars remains approximately constant. 1B. Normalized response per video type: city, campus/outside, office/café. Error bars represent 95% credible intervals for a Bayesian ANOVA on the effect of video type on responses. The graph shows that subjective duration is positively associated with perceptual change. 1C. Average response across participants (in seconds) per video duration, divided by video type. The error bars represent between-participant standard error. Note that the horizontal axis is not to scale.

Figure 2: Relationship between video duration, perceptual change, saccades and duration estimates. 2A. Average saccade density across participants, by video duration, split by scene type. Error bars represent the between-participant standard error. Saccade density appears consistently higher for city scenes (for durations over 2 seconds) and decreases with video duration for all scene types. 2B. Normalized saccade density by scene type: city, campus/outside, office/café. Error bars represent 95% credible intervals for a Bayesian ANOVA on the effect of scene type on saccade density. As expected, saccade density is highest in city scenes, but its association with perceptual change is not straightforward for the other two scene types. 2C. Aggregated responses by video duration, split by saccade density. Error bars represent the average normalized response and between-participant standard error, according to video duration (horizontal axis). Each participant's trials corresponding to each video duration are split into two categories, according to whether saccade density was below or above its average for that participant and video duration. Thus, the plot is split according to within-participant and duration statistics. No overall dependency between saccade density and response can be seen. 2D. Scatterplot for trial-by-trial normalized saccade density and response. According to Bayesian Pearson's correlation, there is a weak positive association, although the Bayes factor does not provide support for either the alternative or the null hypothesis. 2E. Aggregated error size by video duration, split by dichotomized saccade density. Although not consistent, response error appears smaller in trials with more saccades (particularly clear for 12 and 16-second videos). This possible association was confirmed by a Bayesian correlation showing a negative trend between normalized saccade density and error size (2F).

Figure 3: Relationship between mean heart rate (computed trial-wise) and behavioural results. 3A-3B. Heart rate and response. 3A presents the pattern of responses by video duration, split into trials with below or above-average heart rate compared to other trials of the same participant and duration. 3B presents the bivariate correlation between (normalized) mean heart rate and response. 3C-3D show analogous results concerning error size: descriptive pattern with dichotomized heart rate (3C) and scatter plot with Pearson's correlation (3D).

Figure 4: Average second-by-second cardiac periods for six video durations: 4, 8, 12, 24, 48 and 64 seconds; each video duration is presented in a separate plot. Within a plot, each data point represents the average cardiac period at a specific time point (seconds since video onset). The error bars represent the 95% credible intervals according to a Bayesian RM ANOVA on the effect of time since onset on cardiac periods. An ascending slope can be observed during the first 5-10 seconds in all cases, indicative of a slowing down of heart rate at the beginning of video presentation, although for long durations a plateau is reached later in the trial.

Figure 5: Relationship between cardiac period progression and behavioural results. Figures 5A-5D assess the relationship on a trial-by-trial basis, whereas 5E-5H compare between participants. 5A-5B: Cardiac period slope and response. 5A. Normalized responses by video duration, split into trials with below or above-average cardiac period slope compared to other trials for the same participant and duration. An above-average slope indicates a more pronounced slowing-down of heart rate throughout video presentation, and vice versa. No clear overall pattern is observable. 5B. Scatter plot showing the bivariate correlation between trial-wise cardiac period slope and response. 5C: Error sizes by video duration, split by dichotomized cardiac period slope. 5D: Scatter plot showing the correlation between trial-by-trial cardiac period slope and error size. 5E-5H: Second-by-second cardiac periods and 95% credible intervals as a function of time since video onset, for durations of 8, 12, 16 and 24 seconds, split by good and poor performers. Performance classification was made by calculating the average error size (regardless of direction) of each participant and dividing participants according to whether their average error size was above or below the sample median.

Figure 6: Average response by video duration according to presence or absence of a pre-trial blink, in the 2000 ms (6A) or 2000-1000 ms (6B) preceding the video onset. Responses (duration estimates) are expressed in seconds. The error bars represent between-participant standard error. The large error bar observed for 2-second videos is due to a single outlier: specifically, a participant who reported '68.2 seconds' as estimated duration of a 2-second video; the participant blinked within the last 1000 ms before the onset of that trial. In the Supplementary Materials (section S4) we present the same plot after exclusion of the outlier.