
Reliability

Clear and unambiguous definitions are an important prerequisite for reliability. The two criteria, clarity of definitions and reliability, are therefore interrelated.

Definition
The measurement is reproducible with a defined measuring method (data collection and analysis).

Two procedures are usually distinguished:

  • Test-retest procedure: the measuring procedure is performed twice on the same objects. The agreement between the two result series quantifies the test-retest reliability.
  • Inter-rater procedure: the measuring procedure is performed independently by different raters on the same objects. The agreement between their result series quantifies the inter-rater reliability.
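The text does not specify which agreement statistic is used. As an illustration only, the following Python sketch assumes two result series of categorical values (for example, whether an indicator event was documented, coded 1 or 0) and computes simple percent agreement and Cohen's kappa, one common chance-corrected measure. The function names and the data are hypothetical.

```python
from collections import Counter


def percent_agreement(first, second):
    """Share of measured objects for which both result series agree."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty result series of equal length")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def cohens_kappa(first, second):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(first)
    observed = percent_agreement(first, second)
    counts_first, counts_second = Counter(first), Counter(second)
    # Categories missing from one series have count 0 in its Counter.
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / (n * n)
    if expected == 1:
        # Both series use one identical constant value; kappa is undefined,
        # so return 1.0 to signal perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical data: indicator event documented (1) or not (0) for ten patients,
# measured twice (test-retest) or by two raters (inter-rater).
first = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
second = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]
print(percent_agreement(first, second))  # 0.8
print(round(cohens_kappa(first, second), 3))  # 0.583
```

Percent agreement is easy to read but can look high simply because one category dominates; kappa discounts the agreement expected by chance.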

Core Statement
The following statement is assessed: “The measurement is reliable”.

Information Base for the Assessment
The reliability of a quality indicator can be determined directly using either the test-retest or the inter-rater procedure. In both cases, at least two measurements per measured object must be available. Data obtained from clinical care serve as the information base for this criterion; generally, patient records are used. If available, results from source data validation, obtained by a repeated secondary data entry, may be used. Source data validation examines the degree to which the original patient record matches the submitted data record.

For the methodological assessment of quality indicators, two measurements per measured object are not always available, so the reliability of the quality indicators cannot always be determined with the established methods. For this reason, an additional procedure was developed for QUALIFY that allows reliability to be estimated from available data without repeated measurements. It determines the reliability of a quality indicator by analysing the variability of individual hospitals' results between consecutive time intervals. The indicator is rated as “reliably measurable” when its values in consecutive quarters show no statistically significant differences (note: the converse does not hold, because a change in the value can also be due to a genuine change in the quality characteristic being measured).

Specifically, the quality indicator rates are calculated per quarter separately for each hospital, over two years where possible. The differences between the rates in consecutive quarters are then tested for significance for each hospital: 75% confidence intervals are calculated for the quarterly rates and checked for overlap. Each comparison thus corresponds to an alpha error level of 25%. This comparatively high level was chosen because the aim here is to confirm the null hypothesis (here: there are no significant differences between the samples being compared). If, for example, data from 1,000 hospitals over eight quarters are available for a quality indicator, each hospital yields seven comparisons of consecutive quarters, so this method produces a total of 7,000 significance test results. The basic idea of the approach is that the fewer quarter comparisons turn out to be significant, the stronger the indication that the quality indicator under consideration is reliable.
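The quarter comparison can be sketched in Python as follows. This is a minimal sketch under assumptions not stated in the text: each quarter is given as a hypothetical (events, cases) pair, the 75% confidence interval for a rate is a Wilson score interval (the text does not name the interval method), and quarters without cases are assumed to have been excluded beforehand. A comparison counts as significant when the two intervals do not overlap. All function names are hypothetical.

```python
from statistics import NormalDist

# 75% two-sided confidence intervals, i.e. alpha = 25% per interval (from the text).
ALPHA = 0.25
Z = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.15


def wilson_interval(events, cases, z=Z):
    """Wilson score interval for a quarterly rate (assumed CI method)."""
    if cases <= 0:
        raise ValueError("quarter without cases; exclude it beforehand")
    p = events / cases
    denom = 1 + z * z / cases
    centre = (p + z * z / (2 * cases)) / denom
    half = z * (p * (1 - p) / cases + z * z / (4 * cases * cases)) ** 0.5 / denom
    return centre - half, centre + half


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def significant_quarter_changes(quarters):
    """quarters: time-ordered list of (events, cases) for one hospital.
    Returns one flag per pair of consecutive quarters: True if the
    75% intervals do not overlap, i.e. the change counts as significant."""
    cis = [wilson_interval(events, cases) for events, cases in quarters]
    return [not intervals_overlap(a, b) for a, b in zip(cis, cis[1:])]


def share_significant(hospitals):
    """hospitals: dict mapping a hospital id to its list of quarters.
    Returns (share of significant comparisons, number of comparisons)."""
    flags = [flag
             for quarters in hospitals.values()
             for flag in significant_quarter_changes(quarters)]
    if not flags:
        raise ValueError("no quarter comparisons available")
    return sum(flags) / len(flags), len(flags)
```

With eight quarters per hospital, each hospital contributes seven comparisons, so 1,000 hospitals yield 7,000 comparisons, matching the count in the example in the text.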

Assessment Process
If results from a source data validation are available, the assessment process takes place as follows:

If comparing the quality indicator results from the BQS documentation with those from a second data collection (in source data validation) reveals no or only few differences, the quality indicators are presumed to be captured reliably. The higher the proportion of differences, the lower the reliability must be rated. BQS suggests the following gradation: a quality indicator measures reliably if the detectable differences amount to no more than 5% of comparisons. If the proportion is above 5% and up to 10%, the quality indicator measures “rather reliably”. If the proportion is above 10% and up to 20%, it is considered “rather not reliable”. If the proportion of differences exceeds 20%, the quality indicator is “not reliable”.
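The BQS gradation for source data validation can be expressed as a small lookup. The function name and its inputs (the number of differing comparisons and the total number of comparisons) are assumptions for illustration; the thresholds are the ones stated in the text. Integer arithmetic avoids floating-point surprises exactly at the 5%, 10% and 20% boundaries.

```python
def rate_source_data_validation(differences, comparisons):
    """Rating for source data validation (thresholds as suggested by BQS).
    Compares 100 * differences with threshold * comparisons in integers,
    so shares of exactly 5%, 10% or 20% get the more favourable rating,
    as in the text ("not more than", "up to")."""
    if comparisons <= 0:
        raise ValueError("no comparisons available")
    if 100 * differences <= 5 * comparisons:
        return "reliable"
    if 100 * differences <= 10 * comparisons:
        return "rather reliable"
    if 100 * differences <= 20 * comparisons:
        return "rather not reliable"
    return "not reliable"


# Hypothetical example: 12 differences in 200 comparisons (6%).
print(rate_source_data_validation(12, 200))  # rather reliable
```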

For the substitute procedure, the assessment process is as follows:

Based on the results of the quarter comparisons, the following assessment stages apply: if more than 10% of the quarter comparisons are significant, the quality indicator is “not reliable”. If the proportion is above 5% and up to 10%, the quality indicator can be considered “rather reliable”. A quality indicator is considered “reliable” if 5% or less of all quarter comparisons are significant. For each quality indicator, this reliability rating was provided to the evaluators as a suggestion. In addition, the data fields required to measure the quality indicators were listed together with their definitions. It is explicitly emphasized that, with this procedure, the evaluators' expertise should be given particular weight.
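The rating for the substitute procedure can be sketched the same way. The function name and inputs (the number of significant quarter comparisons and the total number of comparisons, for instance as counted by a routine like the quarter-comparison step described in the text) are hypothetical. As described, this procedure has only three stages; there is no “rather not reliable” band.

```python
def rate_quarter_comparisons(significant, comparisons):
    """Rating for the substitute procedure, from the number of significant
    quarter comparisons out of all comparisons (integer arithmetic, so a
    share of exactly 5% or 10% gets the more favourable rating)."""
    if comparisons <= 0:
        raise ValueError("no quarter comparisons available")
    if 100 * significant <= 5 * comparisons:
        return "reliable"
    if 100 * significant <= 10 * comparisons:
        return "rather reliable"
    return "not reliable"


# Hypothetical example: 420 of 7,000 comparisons significant (6%).
print(rate_quarter_comparisons(420, 7000))  # rather reliable
```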


After all evaluators have acknowledged and understood the information base, they assess the core statement. The process is described in detail in Appendix 1.


Assessment Stages
1 = does not apply
2 = rather does not apply
3 = rather applies
4 = applies
Abstention

Comments
The substitute procedure developed by BQS for estimating reliability requires an indicator-specific interpretation. Changes in indicator values during the observation period can also be explained by genuine improvements or deteriorations in aspects of quality. Because the analysis is carried out for individual institutions, it can be presumed that such an effect appears only in individual institutions and is then offset by the majority of the other institutions. No final statement about the value of this method can be made at present, because it would first need to be compared with the gold standard of direct testing.