Intra And Inter Analysis Essay

Citation: Schless S-H, Desloovere K, Aertbeliën E, Molenaers G, Huenaerts C, Bar-On L (2015) The Intra- and Inter-Rater Reliability of an Instrumented Spasticity Assessment in Children with Cerebral Palsy. PLoS ONE 10(7): e0131011.

Editor: Mikhail A. Lebedev, Duke University, UNITED STATES

Received: February 12, 2015; Accepted: May 26, 2015; Published: July 2, 2015

Copyright: © 2015 Schless et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All the raw data that was used in this study are available from Figshare (

Funding: This work was made possible by a grant from the Doctoral Scholarships Committee for International Collaboration with non EER-countries (DBOF) of the KU Leuven, Belgium, awarded to Prof. Kaat Desloovere, grant number DBOF/12/058. This work was also supported by a grant from Applied Biomedical Research from the Flemish Agency for Innovation by Science and Technology, grant number 060799, and funding from the Flemish Research Foundation, FWO: grant 12R4215N. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.


Cerebral Palsy (CP) is the most common neurological disorder in children. It is the result of an upper motor neuron (UMN) lesion in the immature brain. Spasticity is identified in 80–90% of children with CP [1]. Excessive and/or unmanaged spasticity causes pain, limits functional ability, and contributes to secondary complications such as muscle contracture and bone deformity [2]. Despite the detriment of spasticity, there exist only a handful of clinically feasible assessments. Ambiguity over a precise definition of spasticity [3] may be central to this shortcoming.

Perhaps the most commonly cited definition refers to ‘a velocity-dependent increase in tonic stretch reflex with exaggerated tendon jerks, resulting from hyper-excitability’ [4]. Another common citation also incorporates the resistance felt due to an externally imposed movement, increasing with speed of stretch, or above a threshold speed or joint angle [5]. Non-neural related muscle and tendon stiffness also contribute to this resistance, especially in persons with an UMN syndrome [6]. Distinguishing the resistance due to a hyperactive stretch reflex from an increased passive stiffness is clinically very challenging.

In clinical environments, spasticity is routinely measured by means of subjective, easy to use, time-efficient manual clinical scores, grading the level of resistance felt by the assessor during a passive muscle stretch. The Modified Ashworth Scale (MAS) [7] and the Modified Tardieu Scale (MTS) [8] are the most common examples. Despite their frequency of use, both have been criticized for their oversimplification of spasticity evaluation [9]. Several studies have shown that MAS and MTS are incapable of differentiating between neural and non-neural contributions to increased resistance [10]. Furthermore, various studies have highlighted the subjective nature of these assessments, which leads to poor intra- and inter-rater reliability, especially when assessing the muscles of the lower limb, as opposed to the muscles of the upper limb [6,11,12].

This calls for an objective, quantitative, robust measurement tool that is feasible in the clinical environment. Such a tool is arguably indispensable for the accurate evaluation of spasticity, and for selecting the appropriate course of treatment [10,11].

An instrumented biomechanical approach provides a more quantitative evaluation of resistance when compared to manual clinical scores. For example, motor-driven isokinetic devices displace a limb at a controlled velocity, measuring limb resistance to passive movement [13,14]. Surface electromyography (sEMG) records a muscle’s electrical activity in response to passive or active movements [15,16]. Fewer studies have simultaneously interpreted muscle activity with resistance and velocity measurements. Such an integrated approach is ideal as it considers both the neurophysiological and biomechanical methods [10,11], and assists in differentiating the components of increased resistance. This may help identify why some children respond more positively to spasticity treatment, and may help ensure that a child with CP receives therapy tailored to the mechanisms contributing to his or her specific symptoms.

However, combining these recommendations requires some compromise. A new method should be more valid and reliable than the current clinical scores, but remain clinically feasible in different patient pathologies and age groups. For example, motor-driven isokinetic devices measure limb resistance to passive movement with high reliability [13,14,17,18], but are often bulky and difficult to apply to children for high-velocity stretches [11]. Furthermore, a stretch reflex may be easier to excite by a transient acceleration, which is robotically more difficult to apply [19]. Therefore, a manually controlled instrumented displacement method offers a more attractive and clinically relevant alternative [20–22]. However, since spasticity is considered to be force- and velocity-dependent, the interaction between patient and examiner may affect the measurement, so a manually controlled displacement method must follow a standardized protocol, and its psychometric properties should be well defined before it is used in clinical practice [11].

Reliability is considered as the basic psychometric criterion for assessment tools. Without it, the consistency of a measurement cannot be evaluated [23], and consequently, the effect of intervention cannot be determined. Some variations arise from methodological errors, and can be considered as indications for improving the quality of the measurement (extrinsic errors), whilst other errors occur naturally, and can only be measured and accounted for (intrinsic errors) [24]. In a spasticity assessment, the variability of sequential stretch repetitions is a measure of the inherent intrinsic error. Preparation of the skin for sEMG placement, participant and limb positioning, time of day and activity prior to measurement are examples of extrinsic errors.

A manually controlled Instrumented Spasticity Assessment (ISA) was recently developed and validated to identify the severity of spasticity in the muscles of children with CP, and distinguish them from the muscle behaviour in typically developing (TD) children [25]. ISA has also been used to evaluate intervention responsiveness to botulinum toxin type-A (BTX) injections in the medial hamstrings [26]. However, until now, a comprehensive reliability study of both the intra- and inter-rater assessments, with an exploration of the influence of various sources of intrinsic and extrinsic error, has yet to be established. The current study aims to evaluate the intra-rater within session, the inter-rater within session, and the intra-rater between session reliability of various performance- and spasticity-related parameters collected with ISA in children with CP. It was hypothesised that a) the parameters assessed with ISA are overall reliable, and b) the data selection procedure does not contribute significantly as a source of extrinsic error.



Twelve participants were recruited from the Clinical Motion Analysis Laboratory, University Hospital of Pellenberg. The inclusion criteria were: (1) diagnosis of spastic CP; (2) 5–18 years of age; and (3) the ability to understand and perform the test procedure. Children were excluded if they had received BTX injections six months prior to the assessment; an intrathecal baclofen pump; selective dorsal rhizotomy; or lower limb orthopaedic surgery. The Ethical Committee of the University Hospitals of Leuven approved the experimental protocol (s50808) and written informed consent for participation was acquired from all parents.

Data acquisition

ISA has previously been reported and described [25]. The device has three components (Fig 1): (1) joint angle characteristics are measured using three inertial measurement units (IMUs: Analog Devices, ADIS16354) at a sample rate of 200 Hz; (2) reactive resistance is measured using a six degrees of freedom force/torque load-cell (ATI mini45: Industrial Automation) at a sample rate of 200 Hz; (3) sEMG activity of agonist and corresponding antagonist muscle is evaluated with a telemetric Zerowire system (Cometa, Milan, IT) at a sample rate of 2000 Hz. Labview (version 8.5, National Instruments) was used for data acquisition.

Fig 1.

A. Measurement instrumentation. (1) three inertial measurement units (joint angle measurement); (2) a six degrees of freedom force/torque-sensor (torque measurement); (3) surface electromyography (muscle activation measurement); B. Measurement set-up for assessing the lateral gastrocnemius. (4) custom ankle orthosis; and (5) support frame. [25].


The four muscles evaluated with ISA were: the lateral belly of the gastrocnemius (LatGas), medial hamstrings (MedHam), rectus femoris (RecFem) and the hip adductors (Adds). These muscles were selected as they are frequently treated for spasticity [8], and are also superficial, which is necessary for acquisition with sEMG. Prior to ISA, all participants underwent a lower limb clinical assessment, including evaluation of passive range of motion (ROM), muscle strength, and muscle selectivity [25]. The MAS and MTS were performed to provide a notion of spasticity. The MAS was performed for all four muscle groups, and in addition, the MTS was performed for the gastrocnemius and hamstrings in cases where a MAS ≥1+ was given. In children with unilateral involvement, the affected side was tested. In children with bilateral involvement, the most affected side (highest average MAS-score, or, in case of symmetrical MAS-scores, the most severe MTS score) was tested. Body-weight, height and length of lower limb segments (full leg, from superior iliac spine to medial malleolus; lower-leg, from the tibia-femoral joint space to the medial malleolus; foot, from lateral malleolus to the head of metatarsal two) were recorded.


Preparation prior to data collection consisted of shaving and cleansing the skin, and application of the sEMG electrodes [25]. One IMU was placed on each segment (thigh, shank, and foot) in positions not interfering with the placement of the sEMG electrodes. IMU placement was arbitrary as calibration trials were carried out during the measurement (S1 Fig [25]). The force/torque loadcell was calibrated and attached to the appropriate limb segment with an orthosis. Measurements of LatGas, MedHam, and RecFem were carried out with the participant in supine lying. Measurement of the Adds was carried out in side lying. For the latter measurement, the force/torque sensor was omitted, as the leg was deemed too heavy to balance on the sensor.


Data collection began with three repetitions of a maximum voluntary isometric contraction (MVIC) for each muscle. IMU calibrations for the ankle, knee and hip were performed, and moment arms were measured with a tape measure. Four repetitions of a manually applied passive muscle stretch at three incremental velocities were performed for each muscle. Low velocity (LV) corresponded to moving the hip, knee or ankle over the available ROM during five seconds; the medium velocity was an intermediate stretch of approximately one second (not included in the current data analysis); and the high velocity (HV) stretch was performed as fast as possible. The interval between stretch repetitions was seven seconds, to avoid the effects of decreased post activation depression in spastic muscles [27]. This interval lies between the five seconds [28] and the 10–15 seconds [29] proposed by other groups in the literature. An overview of the measurement protocol per muscle can be found in Fig 2.

Fig 2. Measurement procedure for the four lower limb muscles.

LatGas, lateral gastrocnemius; MedHam, medial hamstrings; RecFem, rectus femoris; Adds, hip adductors. The red arrow indicates the direction of joint movement during stretch. Instrumentation: (1) three inertial measurement units (joint angle measurement); (2) surface electromyography (muscle activation measurement); and (3) a six degrees of freedom force/torque sensor attached to a shank or foot orthosis (torque measurement); (4) support frame. Modified from [30] with permission (S2 Fig).

Research design

Three aspects of reliability were assessed in this study (Fig 3). Sets of stretch repetitions were performed consecutively by two trained raters in a randomised order (coin flipping), which allowed for evaluation of the inter-rater within session (inter-raterWS) reliability. During this analysis, the participant stayed in the evaluation room, and the sensors were not removed. Comparison between the first three good quality stretch repetitions carried out during this session by the first rater provided the data for the evaluation of the intra-rater within session (intra-raterWS) reliability. Upon completion, all sensors were removed and the participant was given a two-hour resting period to allow for washout, during which the participant was in the hospital cafeteria. Following the break, the first rater reapplied all the sensors, and measured the participant for a second time for the evaluation of the intra-rater between session (intra-raterBS) reliability. The consistency of data selection was also evaluated (see data selection section).

Fig 3. Schematic illustrating the three aspects of reliability evaluated within this study.

Inter-raterWS, inter-rater within sessions; Intra-raterWS, intra-rater within sessions; Intra-raterBS, intra-rater between session. The dotted lines indicate the involvement of each rater in their respective analysis.

Data analysis

The data from the acquired LV and HV stretches were processed in MATLAB (version R2013a: MathWorks). The raw sEMG signal was filtered with a 6th order zero-phase Butterworth bandpass filter from 20 to 500 Hz. The root mean square (rms) envelope of the sEMG signal (rms-EMG) was extracted by applying a low-pass 30 Hz 6th order zero-phase Butterworth filter on the squared signal. EMG onset was defined on the rms-EMG signal as the time of the first muscle activity according to the method of Staude and Wolf [31]. In cases where this method failed (i.e. no onset or constant activation), a threshold method was applied (onset = rms-EMG activity more than 2 SD above baseline during a 0.05 s interval). To estimate joint angles, a Kalman smoother [32] was applied to the data from the IMUs. Muscle lengths were estimated based on the joint angles and anthropometric data using OpenSim software [33]. The torque signals were processed with a low-pass filter with a cut-off frequency of 40 Hz [21]. The net internal joint torque was calculated from the segment lengths, moment arms, exerted forces and moments, and the external forces caused by gravity and inertia [34] (see S1 Fig for a detailed overview of the different torque components).
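The fallback threshold rule for EMG onset can be illustrated with a minimal sketch. It is not the study's MATLAB implementation: the function name, the choice of baseline window, and the reading of the rule as "the envelope stays above baseline mean + 2 SD for a continuous 0.05 s" are assumptions made for illustration.

```python
from statistics import mean, stdev


def threshold_onset(rms_emg, fs_hz, baseline_samples, n_sd=2.0, hold_s=0.05):
    """Return the EMG onset time in seconds, or None if no onset is found.

    Assumption: onset is the first sample after the baseline window from which
    the rms-EMG envelope stays above (baseline mean + n_sd * baseline SD)
    for hold_s seconds without interruption.
    """
    baseline = rms_emg[:baseline_samples]
    threshold = mean(baseline) + n_sd * stdev(baseline)
    hold = max(1, round(hold_s * fs_hz))  # e.g. 0.05 s at 2000 Hz is 100 samples
    run = 0  # length of the current run of supra-threshold samples
    for i in range(baseline_samples, len(rms_emg)):
        run = run + 1 if rms_emg[i] > threshold else 0
        if run == hold:
            return (i - hold + 1) / fs_hz  # time of the first sample of the run
    return None
```

Requiring a sustained run rather than a single supra-threshold sample is one way to keep brief artefacts from being mistaken for an onset; the hypothetical `baseline_samples` argument would need to cover a quiet period before the stretch starts.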

Data selection

For the data acquired from the three analyses, a blinded, independent third rater performed the data selection. In addition, to assess the reliability of the selection procedure, the first rater also selected the data from the inter-raterWS analysis (Fig 3). Data selection was performed by visualising the raw- and processed data signals in MATLAB. Any questionable performance of a stretch repetition annotated during the acquisition was taken into account during data selection.

Stretch repetitions were excluded because of poor performance or poor quality data. Performance-related reasons for data exclusion included poor handling of the force/torque sensor (mentioned during the acquisition), inconsistent maximum stretch velocities within one trial (for LV, stretch repetitions that were >7°/s from the average of all the repetitions; for HV, stretch repetitions that were >40°/s from the average of all the repetitions, derived from previously collected data [26]), or stretches that were performed outside the desired plane of motion (forces and torques registered in directions other than the sagittal plane). Poor quality EMG included clear artefacts in the EMG signal, loss of the EMG signal, a highly inconsistent EMG pattern in comparison with the other stretch repetitions, low signal-to-noise ratio or active assistance of the participant during the passive stretches (activation of agonist and/or antagonist prior to stretch onset or at inconsistent moments during stretch). The automatic definition of EMG onset was visually inspected. In those cases when neither automatic EMG onset detection method was successful, the third rater manually determined the EMG onset based on visual inspection.
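The velocity-consistency criterion lends itself to a short sketch. The tolerances (7°/s for LV, 40°/s for HV) come from the text; the function name and the choice to compute the average over all repetitions of the trial, including the candidate itself, in a single pass are assumptions.

```python
from statistics import mean

# Tolerances stated in the text: 7 deg/s for LV and 40 deg/s for HV stretches.
VMAX_TOLERANCE_DEG_S = {"LV": 7.0, "HV": 40.0}


def flag_inconsistent_velocity(vmax_deg_s, velocity):
    """Return indices of repetitions whose maximum velocity deviates from the
    trial average by more than the tolerance for that velocity condition."""
    tolerance = VMAX_TOLERANCE_DEG_S[velocity]
    average = mean(vmax_deg_s)
    return [i for i, v in enumerate(vmax_deg_s) if abs(v - average) > tolerance]
```

A call such as `flag_inconsistent_velocity([300, 310, 295, 400], "HV")` would flag only the last repetition. In the study this check was one of several criteria applied during visual data selection, so a flag here would be a prompt for review rather than an automatic exclusion.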

Outcome parameters

Twelve parameters based on previous ISA literature [24,34,35] were selected and categorised as either performance-related (five parameters) or spasticity-related (seven parameters).


Performance-related parameters were used to evaluate the quality of the performance of the stretch repetitions. They included the ROM covered during LV and HV stretches (ROMLV and ROMHV, respectively), the maximum velocity reached during LV and HV stretches (VMAXLV and VMAXHV, respectively), and the single largest value of the rms-EMG amplitude acquired from the three MVIC repetitions (peak MVIC).


Spasticity-related parameters were extracted from rms-EMG and from the computed net internal joint torque. A ‘zone of maximum velocity’ (Vmaxzone) was demarcated in order to emphasise the velocity-dependent character of spasticity. The Vmaxzone was defined as starting 200 ms prior to VMAX and ending at 90% of the full ROM of the stretch. Average rms-EMG was calculated by dividing the area under the rms-EMG time curve by the duration of the Vmaxzone (rms-EMG, expressed in mV). This parameter was also expressed as a percentage of the peak MVIC (rms-EMG, expressed as %). Torque (expressed in Nm) was analysed at 70° knee flexion for the MedHam and RecFem, and at 10° plantar flexion for the LatGas. These angles corresponded to a common mid-ROM angle amongst all participants. Work (expressed in J) was defined as the integral of torque with respect to the position between VMAX and 90% of the ROM. The muscle-lengthening threshold was defined as the muscle length at the time of EMG onset during a LV stretch. EMG onset during LV stretches was not often present in the LatGas and RecFem [25]. Therefore, this parameter was only calculated for the MedHam and Adds. In all four muscles, the muscle-lengthening velocity threshold was defined as the muscle-lengthening velocity at the time of EMG onset during a HV stretch. All muscle lengths and muscle lengthening velocity thresholds were expressed as a percentage of the muscle length in the anatomical zero position (ML and MLV, expressed as % and %/s, respectively). The angle of catch (AOC) was defined as the angle that corresponded to the time of the first local minimum power after the time that maximum power was reached [36], and was expressed as a percentage of the ROM. To provide a measure of the severity of spasticity, the absolute change between the averages of 3–4 HV and 3–4 LV stretch repetitions (HV-LV) was calculated for rms-EMG, Torque and Work.
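The Vmaxzone, average rms-EMG and Work definitions can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes the joint angle increases monotonically during the stretch, that the ROM runs from the first to the last sample of the repetition, that trapezoidal integration is acceptable, and that angles are converted to radians so that torque in Nm integrates to J. All function names are hypothetical.

```python
import math


def trapezoid(y, x):
    """Trapezoidal integral of y with respect to x (sequences of equal length)."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))


def vmax_zone(time_s, angle_deg, velocity_deg_s, lead_s=0.2, rom_fraction=0.9):
    """Return (start, vmax, end) sample indices of the zone of maximum velocity."""
    i_vmax = max(range(len(velocity_deg_s)), key=lambda i: velocity_deg_s[i])
    # Zone start: first sample at or after 200 ms before VMAX.
    t_start = time_s[i_vmax] - lead_s
    i_start = next(i for i, t in enumerate(time_s) if t >= t_start)
    # Zone end: first sample at or after VMAX that reaches 90% of the ROM.
    target = angle_deg[0] + rom_fraction * (angle_deg[-1] - angle_deg[0])
    i_end = next(i for i in range(i_vmax, len(angle_deg)) if angle_deg[i] >= target)
    return i_start, i_vmax, i_end


def mean_rms_emg(time_s, rms_emg, i_start, i_end):
    """Area under the rms-EMG time curve divided by the zone duration."""
    t = time_s[i_start:i_end + 1]
    return trapezoid(rms_emg[i_start:i_end + 1], t) / (t[-1] - t[0])


def work_joules(angle_deg, torque_nm, i_vmax, i_end):
    """Integral of torque over joint position between VMAX and 90% of the ROM."""
    angle_rad = [math.radians(a) for a in angle_deg[i_vmax:i_end + 1]]
    return trapezoid(torque_nm[i_vmax:i_end + 1], angle_rad)
```

Under these assumptions, the HV-LV parameters would follow by averaging each quantity over the selected HV repetitions and the selected LV repetitions and taking the difference of the two averages.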

For the intra-raterWS analysis, only ROM, VMAX, ML and MLV were calculated. For the inter-raterWS and intra-raterBS analyses, ROM, VMAX, rms-EMGHV-LV, TorqueHV-LV, WorkHV-LV, ML and MLV were calculated by taking the average of 3–4 good stretch repetitions per velocity. AOC was calculated from the first well performed HV stretch, and its reliability was only evaluated for the inter-raterWS and intra-raterBS analyses. The reliability of MVIC was only evaluated for the intra-raterBS analysis.

Statistical analysis

Group descriptive statistics of all parameters were calculated per muscle and measurement session. Bland-Altman plots portraying limits of agreement were created and independently reviewed by two raters to determine any systematic bias. Relative and absolute reliability were evaluated using the intra-class correlation coefficients (ICC 2,1 for intra-raterWS and ICC 2,k for inter-raterWS and intra-raterBS) with 95% confidence intervals [37] and the standard error of measurement (SEM), respectively. The reliability of the data selection procedure was determined by calculating the ICC (ICC 2,k) and SEM on the data curated by raters one and three. The ICC was investigated for absolute agreement to detect any relevant systematic error between raters. The SEM was calculated from the square root of the mean square error from one-way ANOVA, and expressed as a percentage of the mean of the test and re-test values [23]. SEM% values <20% were considered acceptable based upon the average change in previously reported ISA parameters following treatment with BTX in the MedHam [25,26]. ICCs >0.80 indicated high relative reliability, 0.60–0.79 indicated moderately-high relative reliability, 0.40–0.59 indicated moderate relative reliability and <0.40 indicated low relative reliability [38]. To identify the most responsive spasticity-related parameters, the minimal detectable change (MDC) was calculated (MDC = SEM × 1.645 × √2) [39], and expressed as a percentage of the mean of the test and re-test values. Statistical analysis was performed using MATLAB 7.6.0 R2013a (MathWorks), SPSS Statistics (version 22, IBM), and MedCalc (version 12.7).
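The reliability statistics named above can be sketched in a few lines. The study used SPSS, MedCalc and MATLAB, so this is only an illustration. It uses the standard two-way random-effects, absolute-agreement formulas for ICC(2,1) and ICC(2,k), and it assumes that the "mean square error from one-way ANOVA" refers to the within-subject mean square. Function names are hypothetical, and confidence intervals are omitted.

```python
import math


def anova_mean_squares(scores):
    """scores: list of n subjects, each a list of k measurements.
    Returns (MSR, MSC, MSE, MSW): between-subject, between-measurement,
    two-way residual, and one-way within-subject mean squares."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    msw = (ss_total - ss_rows) / (n * (k - 1))
    return msr, msc, mse, msw


def icc_2_1(scores):
    """Single-measure ICC, two-way random effects, absolute agreement."""
    n, k = len(scores), len(scores[0])
    msr, msc, mse, _ = anova_mean_squares(scores)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def icc_2_k(scores):
    """Average-measure ICC, two-way random effects, absolute agreement."""
    n = len(scores)
    msr, msc, mse, _ = anova_mean_squares(scores)
    return (msr - mse) / (msr + (msc - mse) / n)


def sem_and_mdc(scores):
    """Return (SEM, SEM%, MDC, MDC%), with percentages relative to the grand mean."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    sem = math.sqrt(anova_mean_squares(scores)[3])  # assumed: sqrt of within-subject MS
    mdc = sem * 1.645 * math.sqrt(2)                # MDC = SEM x 1.645 x sqrt(2)
    return sem, 100 * sem / grand, mdc, 100 * mdc / grand
```

For a test-retest design each inner list would hold the two session values of one participant for one parameter and muscle. The factor 1.645 corresponds to a 90% confidence level, and √2 accounts for measurement error being present in both the test and the re-test value.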


Twelve children participated in the study (Table 1). One child participated only in the inter-raterWS analysis, and two children participated only in the intra-raterWS&BS analysis. This yielded a total of 11 children for the intra-raterWS&BS analyses, and 10 children for the inter-raterWS analysis. Data of two RecFem and one Adds were excluded due to time restrictions at the time of data collection, or due to poor quality EMG. The ML parameter was not calculated for two MedHam and five Adds in the intra-raterWS&BS analyses, and for one MedHam and four Adds in the inter-raterWS analysis, due to a lack of EMG onset at LV. Similarly, due to a lack of EMG onset at HV, the MLV parameter was not calculated for two MedHam and two Adds in the intra-raterWS&BS analyses, and for one MedHam and one Adds in the inter-raterWS analysis.

Data selection

Following the selection of the 1249 stretch repetitions from the inter-raterWS and intra-raterBS analyses, 139 (11%) were excluded. From the session curated by raters one and three (total 570 stretch repetitions), rater one excluded 131 stretch repetitions (23%) and rater three excluded 76 stretch repetitions (13%). Table 2 reports the subsequent ICC and SEM% values of the data curated by the two raters. Of all the 39 ICC values, two (MLV in the LatGas and AOC in the RecFem) were <0.6. The ICC of the ML for the Adds was not computable. This happens when the between-subject variation is relatively small compared to the within-subject variation.
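The exclusion percentages quoted above follow directly from the reported counts, as this small check shows. The helper name is hypothetical; the counts are those given in the text.

```python
def exclusion_percent(excluded, total):
    """Share of stretch repetitions excluded, as a percentage of the total."""
    return 100.0 * excluded / total


overall = exclusion_percent(139, 1249)    # inter-raterWS and intra-raterBS analyses
rater_one = exclusion_percent(131, 570)   # session curated by rater one
rater_three = exclusion_percent(76, 570)  # same session curated by rater three

print(round(overall), round(rater_one), round(rater_three))  # prints: 11 23 13
```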

SEM% values <20% were found in all but one of the 16 performance-related parameters, the exception being VmaxLV for the Adds. For the spasticity-related parameters, SEM% values <20% were found in all but five of the 23 parameters (MLV in the LatGas and Adds, Torque of MedHam, and rms-EMG and rms-EMG % of the Adds).

The intra-raterWS, inter-raterWS, and intra-raterBS analyses

Results from the reliability analyses for the LatGas and MedHam can be found in Table 3, and those for the RecFem and Adds in Table 4. Parameters computed using HV-LV tended to have higher SD values. This was especially the case for the rms-EMGHV-LV parameters. There was no evidence of systematic bias or heteroscedasticity.

Of all the ICC values, 76% were >0.8 and 14% were between 0.6 and 0.8 (Table 5). Of the 11 ICC values <0.6, four were in the intra-raterBS analysis, and seven in the inter-raterWS analysis. These comprised three VmaxLV; two VmaxHV; two rms-EMGHV-LV (%); one ROMLV; one TorqueHV-LV; one AOC and one MLV. Four were found in the LatGas, three in the MedHam, and two each in the RecFem and Adds.

Table 5. The number of parameters in all three analyses categorised according to their intra-class correlation coefficient (ICC) and standard error of measurement (SEM) and expressed as a percentage of the mean test and re-test values for all four muscles.
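The categorisation by ICC and SEM% can be sketched as a simple banding step, using the ICC cut-offs and the 20% SEM criterion given in the statistical analysis. The function names are hypothetical, and the treatment of boundary values (for example an ICC of exactly 0.80, which the stated ranges leave open) is an assumption.

```python
from collections import Counter


def icc_band(icc):
    """Map an ICC value to the relative-reliability label used in the text.
    Assumption: boundary values fall into the higher band."""
    if icc >= 0.80:
        return "high"
    if icc >= 0.60:
        return "moderately high"
    if icc >= 0.40:
        return "moderate"
    return "low"


def sem_acceptable(sem_percent):
    """SEM% values below 20% were considered acceptable."""
    return sem_percent < 20.0


def tally_icc_bands(icc_values):
    """Count how many ICC values fall into each band."""
    return Counter(icc_band(v) for v in icc_values)
```

Applying `tally_icc_bands` to the ICC values of one analysis would give the per-band counts from which percentages such as those in the paragraph above can be derived.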

ICC values with their corresponding confidence intervals for inter-raterWS and intra-raterBS are displayed in Fig 4. In the LatGas and MedHam, overall wider CIs of the ICC values were seen for the inter-raterWS than for the intra-raterBS, except for the rms-EMGHV-LV (%), which was wide in both analyses. With the exception of VmaxLV and AOC, the opposite trend was seen for the RecFem. CIs of both Adds analyses were similar, but generally wider than those in the other muscles.

Fig 4. The intra-class correlation coefficients (ICC) and confidence intervals (CI) for intra-raterBS and inter-raterWS analyses.

LatGas, lateral gastrocnemius; MedHam, medial hamstrings; RecFem, rectus femoris; Adds, hip adductors; LV, Low Velocity; HV, High Velocity; HV-LV, Difference between HV and LV; VMAX, Maximum angular velocity; ROM, Range of Motion; MVIC, Maximum Voluntary Isometric Contraction; rms-EMG, root mean squared electromyography; AOC, Angle of Catch; ML, Muscle Length; MLV, Muscle Lengthening Velocity. The red vertical line indicates an ICC of 0.6, above which relative reliability is considered to be at least moderately high. A = an ICC that could not be calculated.

Standard error of measurement (SEM)

For the SEM values of all four muscles, expressed as a percentage of the average of the mean of the test and re-test values, 37% were below 10% error, 33% were between 11–20% error, 17% were between 21–30% error and 13% were ≥30% error (Table 5). Of those 32 SEM values >20%, 17 were found in the intra-raterBS analysis, 14 were found in the inter-raterWS analysis and one in the intra-raterWS analysis. The higher SEM values were seven rms-EMGHV-LV (%); five rms-EMGHV-LV (mV); four VmaxLV; four WorkHV-LV; four MVIC; four MLV; three TorqueHV-LV; and one ROMLV, and were more often found in the RecFem and Adds than in the LatGas and MedHam.


This study evaluated the reliability of an instrumented assessment tool integrating multidimensional signals in order to quantify spasticity in children with spastic CP. The different sources of intrinsic and extrinsic errors associated with ISA were comprehensively analysed in this study. ISA was found to be reliable in all of the three reliability analyses, with 90% of the parameters showing ICC values >0.6, and 70% of the SEM% values <20%. In most cases, ICC values >0.6 were accompanied by SEM% values <20%. This confirmed our first hypothesis that parameters investigated with ISA are overall reliable.


Intra-raterWS analysis.

The intra-raterWS analysis compared the first three good quality stretch repetitions in the same measurement session. This assessed any error inherent to the investigated parameters. Such error may be caused by intrinsic factors such as spasticity, post activation depression, or thixotropy, or by an extrinsic factor such as the waiting time between stretch repetitions. In this analysis, most parameters showed an ICC >0.8 and SEM% values <20%. SEM% values were comparable to, if not smaller than, the values from the two other reliability analyses. This finding confirms a limited contribution of error due to three repeated stretch repetitions, and implies that a seven second waiting period is satisfactory, allowing for the influence of any hyper-excitability or post activation depression of a muscle stretch to subside [25].

Intra-raterBS analysis.

After the intra-raterWS analysis, the second most reliable analysis was the intra-raterBS, where extrinsic errors introduced between sessions were analysed. Re-application of the IMU sensors in different sessions requires a new calibration procedure, possibly influencing the joint motion parameters. A similar justification can also be made for the re-application of the sEMG electrodes and orthoses, which may influence the spasticity-related parameters and the handling of a stretch. Additionally, the participant and the limb on the support frame need to be repositioned. Nonetheless, the intra-raterBS analysis still demonstrated a satisfactory level of reliability. In order to further improve a between session analysis, the sources of extrinsic error should be accounted for and reduced. Bar-On et al. have previously evaluated the reliability for the intra-raterWS&BS analyses for several parameters of the LatGas and MedHam [25]. In comparison with the current study, they showed lower ICC and generally higher SEM values for all performance- and some spasticity-related parameters. This finding was expected as their study included only six participants, which may not have been a representative sample. Furthermore, in contrast to the two-hour interval between measurement sessions of the current study, Bar-On et al. reported an average interval of 13 days [25]. Too short an interval may interfere with the participants’ concentration, whilst too long an interval makes it challenging to control what happens during the interim period. The appropriate time interval for a between session reliability analysis should be further investigated.

Inter-raterWS analysis.

The reliability of ISA was generally higher when comparing within and between sessions performed by the same rater, than between two different raters. Inter-rater reliability is important if ISA is to be used in clinical practice, as the same rater is not always available to perform a follow up assessment. Furthermore, considering that the current inter-rater analysis was performed within the same session, additional extrinsic errors are also anticipated between sessions. Standardisation and training should be further improved to increase the reliability when different raters perform the measurement. This could be achieved by having different raters practise together when learning how to grasp the loadcell and where to stand when performing each measurement, by adding a metronome beep to suggest and support specific stretch velocities, and by using training videos.

Investigated muscles.

When comparing the four muscles, the performance-related parameters had a tendency to be most reliable in the MedHam, followed by LatGas and RecFem, and then Adds. For the spasticity-related parameters, the RecFem had the highest reliability, followed by MedHam and LatGas, and then Adds. It is not so surprising that the Adds were the least reliable of the investigated muscles, as they are also the most difficult stretch to perform. It requires movement of the entire limb, as opposed to just a single segment, which may allow a larger introduction of errors. Furthermore, identifying only one of the adductor muscles is challenging in children with CP, and crosstalk between muscles may have occurred. Additionally, the nature of spasticity in the Adds may have a higher intrinsic error than the other three muscles. This could not be confirmed by the current study, as indications of spasticity severity (HV-LV) were not computable in the intra-raterWS analysis, and comparisons between different muscles with spasticity have not been reported in literature.

The implications of data selection

Since ISA is a manually performed test, the selection procedure is essential in ensuring that only well performed stretch repetitions are included for analysis. However, as the selection procedure was not automated, it has to be considered as a possible source of extrinsic error. Two raters independently curated the same set of data, following the same rules of data exclusion. The final number of included stretch repetitions varied between the two raters (excluded: rater one = 23%; rater three = 13%). Despite these differences, SEM% values <20% were found in all but five of the 23 spasticity-related parameters. One exception was the MLV parameter in the LatGas and Adds. This parameter was calculated by defining the timing of EMG onset. In those cases when neither automatic EMG onset detection method was successful, the EMG onset was manually determined, which may explain some of the discrepancy between raters. Another exception was the Torque parameter of MedHam. Stretch repetitions were seldom excluded due to artefacts in the torque signal. Therefore, exclusion of stretch repetitions based on other criteria was the likely cause of a high SEM% for the torque parameter. Lastly, low selection agreement between raters also influenced the two rms-EMG parameters of the Adds. This may have been caused by the high EMG baseline often seen in the Adds. Overall though, the investigation of the data selection procedure confirmed the hypothesis that little extrinsic error is introduced, as long as three well-performed stretch repetitions are available and both raters adhere to the well-defined selection criteria. In the future, the addition of a live feedback system informing the clinician in real time about each stretch repetition could avoid the need to capture excess data in order to obtain at least three well-performed stretch repetitions.

ISA compared to other literature

To the best of the authors' knowledge, only six other groups have evaluated the reliability of a manually controlled device that combines multidimensional signals for the assessment of spasticity (Table 6).

Overall, the parameters that could be compared to previous studies were shown to be of either similar or higher reliability in ISA. Although all the studies in Table 6 assessed spasticity with multidimensional signals, only two studies investigated the reliability of both the biomechanical and electrophysiological parameters, and both were in patients with stroke [41,42]. Furthermore, no study assessed the reliability of a manually controlled device in CP. For the studies that included an intra-raterWS analysis, waiting time between stretch repetitions varied from one second to 15 seconds, suggesting that the seven-second interval selected for ISA is a fair compromise. For between-session analyses, intervals ranged from 10 minutes to one day, illustrating the lack of consensus on what interval is sufficient. Finally, the extent of statistical analyses for assessing reliability varied between studies, and it can be viewed as a limitation that only one study investigated a measure of absolute reliability.

Implications of findings

Reliability is considered the basic psychometric criterion for assessment tools; without it, validity and responsiveness cannot be determined. The smaller the SEM, the smaller the error (random and systematic), and in turn the greater the reliability [43]. An SEM% value may also be interpreted in terms of responsiveness to treatment. If an SEM value yields an MDC value small enough to detect change post treatment, the parameter can be statistically interpreted as reliable. Based on the results of the current study, we can attempt to assess the clinical feasibility of ISA in its current state. As previously identified, all four investigated muscles had EMG onsets at high velocity, suggesting some component of velocity-dependent spasticity. In addition, the MedHam and Adds also had an EMG onset at low velocity, suggesting a component of position-dependent spasticity. This already suggests a possible distinction for evaluating various types of spastic behaviour. Certain ISA parameters have been deemed sensitive enough to differentiate between pre- and post-treatment with BTX in the MedHam [26]. In order to validate this finding, the corresponding MDC values of the same spasticity-related parameters from the current study can be compared to the average treatment-induced change values reported in the literature (Table 7).

Table 7. MDC for the spasticity-related parameters of the medial hamstrings (MedHam), and the average difference of those parameters between pre and post treatment with Botulinum Toxin-A (BTX) as previously reported [26].

The MDC value of the rms-EMGHV-LV (mV) parameter was small enough to detect a response in the MedHam to treatment with BTX. This is expected because the rms-EMG parameter most closely reflects the definition of spasticity [4]. However, the effect of BTX treatment on the MedHam did not exceed the reported MDC values for the torque and work parameters. These parameters reflect not only spasticity, but also non-neural tissue changes such as increased passive muscle stiffness and viscosity. These non-neural components could account for the parameters' limited ability to detect a change post BTX [44]. Another consideration is that these parameters are highly dependent on the way the stretch is performed (grasp of the force/torque load-cell). Further research is required to study the effect of tone-reduction treatment for all lower limb muscles, using the MDC values of the spasticity-related parameters reported by the current study. Progress is also required to decompose the biomechanical parameters into their neural and non-neural components.
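The comparison between an MDC and a treatment-induced change can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the common conventions SEM% = 100 × SEM / mean and MDC95 = 1.96 × √2 × SEM, which may differ from the exact formulas applied in this study, and all numeric values are hypothetical rather than taken from Table 7.

```python
import math


def sem_percent(sem, mean):
    """Express the SEM as a percentage of the mean score (assumed convention)."""
    return 100.0 * sem / mean


def mdc95(sem):
    """Minimal detectable change at 95% confidence (assumed convention):
    1.96 for the confidence level, sqrt(2) because two measurements
    (pre and post) each contribute measurement error."""
    return 1.96 * math.sqrt(2) * sem


def detects_change(sem, treatment_change):
    """True if the observed change exceeds the MDC, i.e. it is larger
    than what measurement error alone would plausibly produce."""
    return abs(treatment_change) > mdc95(sem)


# Hypothetical values for one parameter, not data from the study.
example_sem = 2.0
example_mean = 20.0
large_change = -8.0   # hypothetical average pre/post difference
small_change = -4.0   # hypothetical smaller difference

print(f"SEM% = {sem_percent(example_sem, example_mean):.1f}")
print(f"MDC95 = {mdc95(example_sem):.2f}")
print("large change detectable:", detects_change(example_sem, large_change))
print("small change detectable:", detects_change(example_sem, small_change))
```

Under these assumptions, a parameter behaves like the rms-EMG case above when the treatment effect exceeds the MDC, and like the torque and work cases when it does not.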

For a device like ISA, the MDC alone is not enough, and it is also important to acknowledge the minimally important change (MIC). The MIC can be established by evaluating the effect of decreasing spasticity on the development of secondary muscle deformities. In the future, changes in function measured by 3D motion analysis, as well as patient/clinician feedback, could also be used.

Study limitations

Several study limitations need to be acknowledged. The number of participants was small, especially for a reliability study applying parametric statistics. A sample of twelve participants is comparable to the sizes recruited in other studies [21,28,29,40–42], but is still limited taking into account the power analysis estimated by Walter et al. [45]. The medium velocity stretch repetitions were excluded from this investigation, as manually acquiring them with ISA is more challenging and time consuming than with a motorized system. In those cases where a low ICC value was combined with a relatively low SEM% value, it can be argued that the ICC may not have been a suitable statistic. The ICC is indicative of relative reliability, so if the sample group is homogeneous, ICC values will be small even if the test-retest variability is small, and vice versa [23]. This limitation necessitated the inclusion of a measure of absolute reliability. If an SEM is high, consideration of the various sources of error can help to determine if it can be reduced [24]. A high ICC value combined with a high SEM may indicate systematic error. One way to estimate the presence of systematic error over random error is to compare various ICC calculation models [23].
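The dependence of the ICC on sample homogeneity can be illustrated with a small sketch. It assumes a one-way random-effects ICC(1,1) and takes the SEM as the square root of the within-subject mean square; the ICC model used in this study may differ, and the scores below are invented purely for illustration.

```python
import math


def icc_and_sem(scores):
    """One-way random-effects ICC(1,1) and SEM from a subjects x trials table.
    SEM is taken as sqrt(within-subject mean square) (assumed convention)."""
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of repeated measurements per subject
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    return icc, math.sqrt(ms_within)


# Identical test-retest differences (+1 or -1) in both hypothetical samples.
retest_offsets = [1, -1, 1, -1, 1, -1]

# Subjects that differ widely from one another.
heterogeneous = [[b, b + d] for b, d in zip([10, 20, 30, 40, 50, 60], retest_offsets)]
# Subjects that all score about the same.
homogeneous = [[b, b + d] for b, d in zip([30, 30, 30, 30, 30, 30], retest_offsets)]

icc_het, sem_het = icc_and_sem(heterogeneous)
icc_hom, sem_hom = icc_and_sem(homogeneous)

print(f"heterogeneous sample: ICC = {icc_het:.3f}, SEM = {sem_het:.3f}")
print(f"homogeneous sample:   ICC = {icc_hom:.3f}, SEM = {sem_hom:.3f}")
```

Both samples have the same test-retest error and therefore the same SEM, yet the homogeneous sample yields a much lower ICC, which is why a measure of absolute reliability is reported alongside the ICC.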

Parameters involving HV-LV calculations often showed poorer reliability. As these parameters were not assessed in the intra-raterWS analysis, further investigation is required to determine where the error is coming from, and if it can be reduced. The MVIC may be difficult to collect in children with CP [46]; therefore, it was decided that both normalised and non-normalised rms-EMG parameters would be investigated. Overall, the non-normalised rms-EMG parameter appeared to be more reliable, indicating that the MVIC introduced error. This should be considered in future studies when attempting to detect severity of spasticity or responsiveness to an intervention.

For reasons of feasibility, this study was unable to evaluate the reliability of an inter-raterBS analysis. Based on the findings of the intra-raterBS and inter-raterWS analyses, it is assumed that there will be some degree of error within the parameters of an inter-raterBS analysis. Consequently, without this analysis, if two different raters perform the pre- and post-intervention measurements, it is unknown whether the investigated parameters will be sensitive enough to detect a change. This gap remains a limitation in ascertaining the true reliability of ISA in the clinical setting.

As angles were only calculated in the sagittal plane, it was assumed that calibration and stretch trials were only performed within this plane and that only one joint was moved during stretch. A previous study reported limited measurement error when small out-of-plane movements, or movement of the proximal joint, occur [25]. Nevertheless, in the current study, participants lacking neutral joint-alignment were excluded, and out-of-plane movements were minimized by means of standardised reporting on the performance of each stretch.

Lastly, inertial influences on torque were estimated with anthropometric approximations, whereby the foot and lower leg were considered as one segment (see appendix 1) [34]. Fortunately, a previous study has shown that the error introduced by assuming the ankle as fixed during knee movements only has a limited effect on the resulting knee-joint torque [25].


Based on the outcomes of this reliability study, together with the previously published literature, ISA has been demonstrated to possess a wide range of applications in both the research and clinical environment. The sources of error identified within this study seem to be small and do not appear to have a large impact on the parameters. The intra-raterWS was the most reliable of the three analyses, followed by the intra-raterBS, and then the inter-raterWS. The time interval between sessions, re-application of sensors and repositioning of the participant are likely sources of error. When two different raters perform the measurement, standardisation and training should be improved to minimise the extrinsic error as much as possible. Errors were also muscle specific, or related to the measurement set-up. This variation needs to be accounted for, especially when assessing pre-post interventions or longitudinal follow-up.
