# Cox (Proportional Hazards) Regression

This function fits Cox's proportional hazards model for survival-time (time-to-event) outcomes on one or more predictors.

Cox regression (or proportional hazards regression) is a method for investigating the effect of several variables upon the time a specified event takes to happen. In the context of an outcome such as death this is known as Cox regression for survival analysis. The method does not assume any particular "survival model", but it is not truly nonparametric because it does assume that the effects of the predictor variables upon survival are constant over time and are additive in one scale. You should not use Cox regression without the guidance of a Statistician.
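The two assumptions named above can be written out in the usual form of the Cox model. This is a standard textbook statement rather than anything specific to StatsDirect; the symbols h0(t) for the baseline hazard, x1 to xk for the predictors and b1 to bk for their coefficients are notation chosen here to match the "b1" used in the example output.

```latex
\[
h(t \mid x_1, \dots, x_k) = h_0(t)\,\exp(b_1 x_1 + b_2 x_2 + \dots + b_k x_k)
\]
\[
\ln \frac{h(t \mid x_1, \dots, x_k)}{h_0(t)} = b_1 x_1 + b_2 x_2 + \dots + b_k x_k
\]
```

The baseline hazard h0(t) is left unspecified, which is why no particular survival model is assumed. The ratio of hazards for any two subjects does not depend on t (effects constant over time), and the predictors act additively on the log hazard scale.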

Provided that the assumptions of Cox regression are met, this function will provide better estimates of survival probabilities and cumulative hazard than those provided by the Kaplan-Meier function.

Hazard and hazard-ratios

Cumulative hazard at a time t is the risk of dying between time 0 and time t, and the survivor function at time t is the probability of surviving to time t (see also Kaplan-Meier estimates).

The coefficients in a Cox regression relate to hazard; a positive coefficient indicates a worse prognosis and a negative coefficient indicates a protective effect of the variable with which it is associated.

The hazard ratio associated with a predictor variable is given by the exponential of its coefficient; this is given with a confidence interval under the "coefficient details" option in StatsDirect. The hazard ratio may also be thought of as the relative death rate, see Armitage and Berry (1994). The interpretation of the hazard ratio depends upon the measurement scale of the predictor variable in question, see Sahai and Khurshid (1996) for further information on relative risk of hazards.
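As a rough illustration, the conversion from a coefficient to a hazard ratio can be sketched in a few lines of Python. The function name `hazard_ratio` is hypothetical, and the confidence interval is assumed to be a Wald-type interval (coefficient plus or minus 1.96 standard errors, then exponentiated). The inputs are the coefficient and standard error reported in the example at the end of this page.

```python
import math

def hazard_ratio(coef, se, z=1.959964):
    """Return (hazard ratio, lower limit, upper limit).

    Assumes a Wald-type interval: exponentiate coef +/- z * se,
    where z is the standard normal quantile for a 95% interval.
    """
    return (
        math.exp(coef),
        math.exp(coef - z * se),
        math.exp(coef + z * se),
    )

# Coefficient and standard error for "Stage group" from the worked example.
hr, lower, upper = hazard_ratio(0.96102, 0.385636)
print(f"Hazard ratio = {hr:.6f}, 95% CI = {lower:.6f} to {upper:.6f}")
```

The printed values should agree, to within rounding, with the hazard ratio and confidence interval shown in the example output.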

Time-dependent and fixed covariates

In prospective studies, when individuals are followed over time, the values of covariates may change with time. Covariates can thus be divided into fixed and time-dependent. A covariate is time-dependent if the difference between its values for two different subjects changes with time, e.g. serum cholesterol. A covariate is fixed if its values cannot change with time, e.g. sex or race. Lifestyle factors and physiological measurements such as blood pressure are usually time-dependent. Cumulative exposures such as smoking are also time-dependent but are often forced into an imprecise dichotomy, i.e. "exposed" vs. "not-exposed" instead of the more meaningful "time of exposure". There are no hard and fast rules about the handling of time-dependent covariates. If you are considering using Cox regression you should seek the help of a Statistician, preferably at the design stage of the investigation.

Model analysis and deviance

A test of the overall statistical significance of the model is given under the "model analysis" option. Here the likelihood chi-square statistic is calculated by comparing the deviance (-2 * log likelihood) of your model, with all of the covariates you have specified, against the model with all covariates dropped. The individual contribution of covariates to the model can be assessed from the significance test given with each coefficient in the main output; this assumes a reasonably large sample size.

Deviance is minus twice the log of the likelihood ratio for models fitted by maximum likelihood (Hosmer and Lemeshow, 1989 and 1999; Cox and Snell, 1989; Pregibon, 1981). The value of adding a parameter to a Cox model is tested by subtracting the deviance of the model with the new parameter from the deviance of the model without the new parameter; the difference is then tested against a chi-square distribution with degrees of freedom equal to the difference between the degrees of freedom of the old and new models. The model analysis option tests the model you specify against the null model with no covariates (a Cox model has no separate intercept term, because it is absorbed into the baseline hazard); this tests the combined value of the specified predictors/covariates in the model.
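The deviance comparison can be checked by hand. The sketch below is illustrative only: the function name is hypothetical, and it handles just the one-degree-of-freedom case, where the upper tail of the chi-square distribution can be written with the complementary error function from the standard library. The two log likelihoods are those reported in the example at the end of this page.

```python
import math

def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio (deviance difference) test for one added parameter.

    The deviance of each model is -2 * log likelihood, so the difference
    in deviance is 2 * (loglik_full - loglik_reduced).
    """
    chi_square = 2.0 * (loglik_full - loglik_reduced)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(chi_square, 0.0) / 2.0))
    return chi_square, p_value

# Log likelihoods with no covariates and with all model covariates (worked example).
chi_square, p_value = likelihood_ratio_test_1df(-207.554801, -203.737609)
print(f"Deviance chi-square = {chi_square:.4f}, df = 1, P = {p_value:.4f}")
```

For models that differ by more than one parameter, a general chi-square tail function would be needed in place of the `erfc` shortcut.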

Some statistical packages offer stepwise Cox regression that performs systematic tests for different combinations of predictors/covariates. Automatic model building procedures such as these can be misleading as they do not consider the real-world importance of each predictor; for this reason StatsDirect does not include stepwise selection.

Survival and cumulative hazard rates

The survival/survivorship function and the cumulative hazard function (as discussed under Kaplan-Meier) are calculated relative to the baseline (lowest value of covariates) at each time point. Cox regression provides a better estimate of these functions than the Kaplan-Meier method when the assumptions of the Cox model are met and the fit of the model is strong.

You are given the option to 'centre continuous covariates'; this makes survival and hazard functions relative to the mean of continuous variables rather than relative to the minimum. The comparison with the mean is usually the more meaningful of the two.
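The effect of the reference point can be sketched with the usual Cox relationship between baseline and covariate-specific survival, S(t | x) = S0(t) raised to the power exp(b * (x - reference)). This is a general property of the model and not a description of StatsDirect's internal code. The function name and the baseline survival of 0.80 are hypothetical; the coefficient is taken from the example at the end of this page.

```python
import math

def survival_for_covariates(baseline_survival, coefs, x, reference):
    """Survival probability for covariate values x, given the survival
    probability at the reference covariate values (minimum or mean)."""
    linear_predictor = sum(
        b * (xi - ri) for b, xi, ri in zip(coefs, x, reference)
    )
    return baseline_survival ** math.exp(linear_predictor)

baseline = 0.80      # hypothetical baseline survival at some time t
coefs = [0.96102]    # "Stage group" coefficient from the worked example

# Reference = lowest covariate value (stage group 1); subject in stage group 2.
s_stage4 = survival_for_covariates(baseline, coefs, x=[2], reference=[1])
# A subject at the reference values simply gets the baseline survival back.
s_stage3 = survival_for_covariates(baseline, coefs, x=[1], reference=[1])
print(round(s_stage4, 3), round(s_stage3, 3))
```

Changing `reference` from the minimum to the mean of a covariate changes which subject the baseline curve describes, but not the hazard ratios.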

If you have binary/dichotomous predictors in your model you are given the option to calculate survival and cumulative hazards for each variable separately.

Data preparation

• Time-to-event, e.g. time a subject in a trial survived.
• Event / censor code - this must be ≥1 (event(s) happened) or 0 (no event at the end of the study, i.e. "right censored").
• Strata - e.g. centre code for a multi-centre trial. Be careful with your choice of strata; seek the advice of a Statistician.
• Predictors - these are also referred to as covariates, which can be a number of variables that are thought to be related to the event under study. If a predictor is a classifier variable with more than two classes (i.e. ordinal or nominal) then you must first use the dummy variable function to convert it to a series of binary classes.
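The conversion of a multi-class classifier into binary columns, described in the last item above, can be sketched as follows. This is a minimal illustration that assumes reference-level coding (one class is omitted and acts as the comparison group). The function `dummy_code`, the `is_` column names and the example data are all hypothetical, and the StatsDirect dummy variable function may label or arrange its output differently.

```python
def dummy_code(values, reference=None):
    """Convert a list of class labels into binary (0/1) columns.

    One column is produced for every class except the reference class,
    so a classifier with k classes yields k - 1 binary predictors.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    columns = {}
    for level in levels:
        if level == reference:
            continue  # the reference class is represented by all zeros
        columns[f"is_{level}"] = [1 if v == level else 0 for v in values]
    return columns

# Hypothetical three-class predictor for five subjects.
grade = ["low", "medium", "high", "medium", "low"]
dummies = dummy_code(grade, reference="low")
for name, column in dummies.items():
    print(name, column)
```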

Technical validation

StatsDirect optimises the log likelihood associated with a Cox regression model until the change in log likelihood with iterations is less than the accuracy that you specify in the dialog box that is displayed just before the calculation takes place (Lawless, 1982; Kalbfleisch and Prentice, 1980; Harris, 1991; Cox and Oakes, 1984; Le, 1997; Hosmer and Lemeshow, 1999).

The calculation options dialog box sets a value (default is 10000) for "SPLITTING RATIO"; this is the ratio in proportionality constant at a time t above which StatsDirect will split your data into more strata and calculate an extended likelihood solution, see Bryson and Johnson (1981).

Ties are handled by Breslow's approximation (Breslow, 1974).
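To make the tie handling concrete, the sketch below evaluates the partial log likelihood under Breslow's approximation for the lymphoma data used in the example on this page. It is an independent illustration, not StatsDirect's code. The function names are hypothetical, and storing censored times as negative numbers is just a compact encoding chosen for this sketch. With Breslow's approximation, every death at a tied time uses the same full risk set in the denominator.

```python
import math

# Survival times in days from the worked example.
# Encoding used only in this sketch: a negative value marks a censored time.
STAGE3 = [6, 19, 32, 42, 42, -43, 94, -126, -169, 207, -211, -227, 253,
          -255, -270, -310, -316, -335, -346]
STAGE4 = [4, 6, 10, 11, 11, 11, 13, 17, 20, 20, 21, 22, 24, 24, 29, 30, 30,
          31, 33, 34, 35, 39, 40, -41, -43, 45, 46, 50, 56, -61, -61, 63, 68,
          82, 85, 88, 89, 90, 93, 104, 110, 134, 137, -160, 169, 171, 173,
          175, 184, 201, 222, -235, -247, -260, -284, -290, -291, -302, -304,
          -341, -345]

def load_rows():
    """Return (time, event, stage_group) rows; event is 1 for death, 0 for censored."""
    rows = [(abs(t), int(t > 0), 1) for t in STAGE3]
    rows += [(abs(t), int(t > 0), 2) for t in STAGE4]
    return rows

def breslow_loglik(rows, b):
    """Partial log likelihood for a single covariate with Breslow's tie handling."""
    total = 0.0
    event_times = sorted({t for t, event, _ in rows if event == 1})
    for t in event_times:
        # Risk set: everyone still under observation just before time t.
        risk_sum = sum(math.exp(b * x) for ti, _, x in rows if ti >= t)
        deaths = [x for ti, event, x in rows if ti == t and event == 1]
        # All tied deaths share the same risk-set denominator.
        total += b * sum(deaths) - len(deaths) * math.log(risk_sum)
    return total

rows = load_rows()
ll_null = breslow_loglik(rows, 0.0)      # no covariates
ll_fit = breslow_loglik(rows, 0.96102)   # coefficient reported in the example
print(f"{ll_null:.4f} {ll_fit:.4f}")
```

The two printed values should agree, to within rounding, with the "Log likelihood with no covariates" and "Log likelihood with all model covariates" lines of the example output.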

Cox-Snell residuals are calculated as specified by Cox and Oakes (1984). Cox-Snell, Martingale and deviance residuals are calculated as specified by Collett (1994).

Baseline survival and cumulative hazard rates are calculated at each time. Maximum likelihood methods are used, which are iterative when there is more than one death/event at an observed time (Kalbfleisch and Prentice, 1973). Other software may use the less precise Breslow estimates for these functions.

Example

Test workbook (Survival worksheet: Stage Group, Time, Censor).

The following data represent the survival in days since entry to the trial of patients with diffuse histiocytic lymphoma. Two different groups of patients, those with stage III and those with stage IV disease, are compared.

Stage 3: 6, 19, 32, 42, 42, 43*, 94, 126*, 169*, 207, 211*, 227*, 253, 255*, 270*, 310*, 316*, 335*, 346*

Stage 4: 4, 6, 10, 11, 11, 11, 13, 17, 20, 20, 21, 22, 24, 24, 29, 30, 30, 31, 33, 34, 35, 39, 40, 41*, 43*, 45, 46, 50, 56, 61*, 61*, 63, 68, 82, 85, 88, 89, 90, 93, 104, 110, 134, 137, 160*, 169, 171, 173, 175, 184, 201, 222, 235*, 247*, 260*, 284*, 290*, 291*, 302*, 304*, 341*, 345*

* = censored data (patient still alive or died from an unrelated cause)

To analyse these data in StatsDirect you must first prepare them in three workbook columns as shown below:

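| Stage group | Time | Censor |
|---|---|---|
| 1 | 6 | 1 |
| 1 | 19 | 1 |
| 1 | 32 | 1 |
| 1 | 42 | 1 |
| 1 | 42 | 1 |
| 1 | 43 | 0 |
| 1 | 94 | 1 |
| 1 | 126 | 0 |
| 1 | 169 | 0 |
| 1 | 207 | 1 |
| 1 | 211 | 0 |
| 1 | 227 | 0 |
| 1 | 253 | 1 |
| 1 | 255 | 0 |
| 1 | 270 | 0 |
| 1 | 310 | 0 |
| 1 | 316 | 0 |
| 1 | 335 | 0 |
| 1 | 346 | 0 |
| 2 | 4 | 1 |
| 2 | 6 | 1 |
| 2 | 10 | 1 |
| 2 | 11 | 1 |
| 2 | 11 | 1 |
| 2 | 11 | 1 |
| 2 | 13 | 1 |
| 2 | 17 | 1 |
| 2 | 20 | 1 |
| 2 | 20 | 1 |
| 2 | 21 | 1 |
| 2 | 22 | 1 |
| 2 | 24 | 1 |
| 2 | 24 | 1 |
| 2 | 29 | 1 |
| 2 | 30 | 1 |
| 2 | 30 | 1 |
| 2 | 31 | 1 |
| 2 | 33 | 1 |
| 2 | 34 | 1 |
| 2 | 35 | 1 |
| 2 | 39 | 1 |
| 2 | 40 | 1 |
| 2 | 41 | 0 |
| 2 | 43 | 0 |
| 2 | 45 | 1 |
| 2 | 46 | 1 |
| 2 | 50 | 1 |
| 2 | 56 | 1 |
| 2 | 61 | 0 |
| 2 | 61 | 0 |
| 2 | 63 | 1 |
| 2 | 68 | 1 |
| 2 | 82 | 1 |
| 2 | 85 | 1 |
| 2 | 88 | 1 |
| 2 | 89 | 1 |
| 2 | 90 | 1 |
| 2 | 93 | 1 |
| 2 | 104 | 1 |
| 2 | 110 | 1 |
| 2 | 134 | 1 |
| 2 | 137 | 1 |
| 2 | 160 | 0 |
| 2 | 169 | 1 |
| 2 | 171 | 1 |
| 2 | 173 | 1 |
| 2 | 175 | 1 |
| 2 | 184 | 1 |
| 2 | 201 | 1 |
| 2 | 222 | 1 |
| 2 | 235 | 0 |
| 2 | 247 | 0 |
| 2 | 260 | 0 |
| 2 | 284 | 0 |
| 2 | 290 | 0 |
| 2 | 291 | 0 |
| 2 | 302 | 0 |
| 2 | 304 | 0 |
| 2 | 341 | 0 |
| 2 | 345 | 0 |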

Alternatively, open the test workbook using the file open function of the file menu. Then select Cox regression from the survival analysis section of the analysis menu. Select the column marked "Time" when asked for the times, select "Censor" when asked for death/censorship, click on the cancel button when asked about strata, and select the column marked "Stage group" when asked about predictors.

For this example:

Cox (proportional hazards) regression

80 subjects with 54 events

Deviance (likelihood ratio) chi-square = 7.634383 df = 1 P = 0.0057

Stage group b1 = 0.96102 z = 2.492043 P = 0.0127

Cox regression - hazard ratios

| Parameter | Hazard ratio | 95% CI |
|---|---|---|
| Stage group | 2.614362 | 1.227756 to 5.566976 |

| Parameter | Coefficient | Standard Error |
|---|---|---|
| Stage group | 0.96102 | 0.385636 |

Cox regression - model analysis

Log likelihood with no covariates = -207.554801

Log likelihood with all model covariates = -203.737609

Deviance (likelihood ratio) chi-square = 7.634383 df = 1 P = 0.0057

The significance test for the coefficient b1 tests the null hypothesis that it equals zero and thus that its exponential equals one. The confidence interval for exp(b1) is therefore the confidence interval for the relative death rate or hazard ratio; we may therefore infer with 95% confidence that the death rate from stage 4 cancers is approximately 2.6 times, and at least 1.2 times, the death rate from stage 3 cancers.
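The z statistic and P value quoted for b1 can be reproduced approximately from the coefficient and its standard error. This sketch assumes a two-sided Wald test against the standard normal distribution. The function name is hypothetical, and small differences in the last decimal places are to be expected because the inputs are the rounded values printed above.

```python
import math

def wald_test(coef, se):
    """Two-sided Wald test of the null hypothesis that the coefficient is zero."""
    z = coef / se
    # Two-sided tail area of the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Coefficient and standard error for "Stage group" from the output above.
z, p_value = wald_test(0.96102, 0.385636)
print(f"z = {z:.3f}, P = {p_value:.4f}")
```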