***********************************************************************
* Code to implement correction for overfit in decision curve analysis.*
***********************************************************************

*Below is a step-by-step description of the correction for overfit, which is done using repeated 10-fold cross-validation:
* 1.	Randomly divide the reduced data set into 10 sets of approximately equal size, with approximately equal numbers of events in each set
* 2.	Fit the model leaving out the 1st set
* 3.	Apply the fitted model in (2) to the 1st set to obtain the predicted probability of the event.
* 4.	Repeat steps (2) to (3), leaving out and then applying the fitted model to the ith set, i = 2, 3, ..., 10. Every subject now has a predicted probability of the event.
* 5.	Using the predicted probabilities, compute the net benefit at various threshold probabilities.
* 6.	Repeat steps (1) to (5) 200 times. The corrected net benefit for each threshold probability is the mean across the 200 replications.
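*
* For reference, a sketch of the quantity computed in step 5, using the standard
* decision curve definition. At a threshold probability pt, with n subjects in total:
*     net benefit = (true positives / n) - (false positives / n) * (pt / (1 - pt))
* A subject counts as positive when the predicted probability reaches pt; whether
* that comparison is strict is an implementation detail of the dca command.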

*And here is the actual code

* First, write a loop that contains steps 1-5 and repeats them a total of 200 times.
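*
* Optional: fix the random-number seed before the loop so that the 200 random splits,
* and hence the corrected net benefits, can be reproduced exactly. This line is an
* addition for illustration; the seed value is an arbitrary assumption and is not
* part of the original analysis.
set seed 20100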

forvalues i=1(1)200 {

* Second, write the code for steps 1-4: obtaining predicted probabilities for each patient using cross-validation methods

*local event has the name of the variable 
*specifying the event of interest
*1 if event, 0 otherwise
local event="cancer"
*for this example we use two models
*code can easily be extended for more than two models
*local predictors1, local predictors2 are names of the 
*independent variables in the model
local predictors1 = "total_psa"
local predictors2 = "total_psa free_psa"
* give names to prediction models
local prediction1 = "base"
local prediction2 = "full"

*initialize variable to store probabilities from prediction models
g `prediction1'=.
g `prediction2'=.

* assign patients to one of 10 sets
quietly g u = uniform()
sort `event' u
g set = mod(_n, 10) + 1

* fit the model excluding the jth set, apply to the jth set
		forvalues j=1(1)10{
			* build model 
			quietly logit `event' `predictors1' if set~=`j'
			* apply to the j-th set
			quietly predict ptemp if set==`j'
			quietly replace `prediction1' = ptemp if set==`j'
			* drop the temporary prediction so it can be regenerated for the next set
			drop ptemp


			* and the same for the full model
			quietly logit `event' `predictors2' if set~=`j'
			quietly predict ptemp if set==`j'
			quietly replace `prediction2' = ptemp if set==`j'
			drop ptemp
			}

* Third, invoke the dca command for step 5: compute the decision curve from the cross-validated predictions

* invoke the dca command, saving the results to a tempfile
tempfile dca`i'
quietly dca `event' `prediction1' `prediction2', graphoff saving("`dca`i''")

	drop u set `prediction1' `prediction2'
}

* Fourth, using the saved tempfiles, take the mean over the 200 iterations to obtain the corrected net benefits.

	use "`dca1'", clear
	forvalues i=2(1)200 {
		append using "`dca`i''"
		}

	* collapse computes the mean by default, giving the average net benefit at each threshold
	collapse all none modelp1 modelp2, by(threshold)

* save out the results
save "cross validation dca output.dta", replace

* figure of net benefits
twoway(line none all modelp1 modelp2 threshold, sort)


