The innovation underpinning Datafactory is that the subjective priors of credit-risk experts can be represented in a dataset constructed from their input. Moreover, this dataset can be constructed so that it exactly matches the structure of the hard data that will build up as actual loan-performance data accumulates.
This representation of people's expertise ensures that the same mathematical methods used to fit a model to hard data can also be used to fit a model to the subjective priors of the users involved in developing the dataset.
An obvious question arises. The risk assessment input from users of Datafactory comes in the form of default probability assignments rather than an observation of a default or non-default. Hard data only reveals the actual outcome: default or non-default. How can this difference in the available information be eliminated?
A naive approach would generate a simulated dataset from the original information provided by Datafactory. The modeller could replicate each individual loan in the original dataset a large number of times, assigning each replication a default outcome with a probability equal to the subjectively determined default probability. In the limit, as the number of replications grows, this is equivalent to simply using two weighted observations for each loan in the dataset produced by Datafactory, one with a default outcome and one with a non-default outcome. The weight on the default outcome is set equal to the subjectively assigned default probability for the loan. The weight on the non-default outcome is one minus the subjectively assigned default probability for the loan.
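One way to see this limiting equivalence is sketched here, assuming the model is fitted by maximum likelihood (an assumption; the argument above does not name a fitting method). The symbols are introduced only for this sketch: p_i is the subjectively assigned default probability for loan i, \pi_i(\theta) is the model's predicted default probability, R is the number of replications, and y_{ir} is the simulated outcome of replication r.

```latex
\frac{1}{R}\sum_{i}\sum_{r=1}^{R}
  \Bigl[\, y_{ir}\log\pi_i(\theta) + (1-y_{ir})\log\bigl(1-\pi_i(\theta)\bigr) \Bigr]
\;\xrightarrow[R\to\infty]{}\;
\sum_{i}
  \Bigl[\, p_i\log\pi_i(\theta) + (1-p_i)\log\bigl(1-\pi_i(\theta)\bigr) \Bigr],
\qquad y_{ir}\sim\mathrm{Bernoulli}(p_i)
```

The right-hand side is the log-likelihood of a dataset with two observations per loan: a default weighted by p_i and a non-default weighted by 1 - p_i. The convergence follows from the law of large numbers applied to the average of the simulated outcomes for each loan.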
Applying this simple transformation to the original dataset ensures that the data produced by Datafactory is exactly equivalent in structure to the hard data that would be used if it were available.
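The two-observation transformation can be sketched in a few lines of Python. The field names (`features`, `default`, `weight`), the function name and the sample loans are illustrative assumptions, not part of Datafactory.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class WeightedObservation:
    features: Tuple[float, ...]  # risk-factor values for the loan
    default: int                 # 1 = default, 0 = non-default
    weight: float                # observation weight


def expand_to_weighted_observations(
    features: Sequence[Sequence[float]],
    default_probs: Sequence[float],
) -> List[WeightedObservation]:
    """Turn each (loan, subjective default probability) pair into two weighted rows."""
    rows: List[WeightedObservation] = []
    for x, p in zip(features, default_probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"default probability out of range: {p}")
        rows.append(WeightedObservation(tuple(x), 1, p))        # default outcome
        rows.append(WeightedObservation(tuple(x), 0, 1.0 - p))  # non-default outcome
    return rows


# Hypothetical example: two loans, each described by two risk factors.
loans = [(0.8, 2.0), (0.3, 5.0)]
expert_pds = [0.10, 0.02]

rows = expand_to_weighted_observations(loans, expert_pds)
for row in rows:
    print(row)
```

Each loan contributes a total weight of one, so a loan assessed by experts counts for as much as one observed loan unless the weights are rescaled later.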
What should banks do when some historical default/non-default data is available? This situation often arises when a model is first being developed and data exists but is not sufficient for model development purposes. It will also arise when a model has been developed and has been in use for some years, generating hard data over that time.
Such situations, where hard data and subjective data need to be combined, can be challenging to deal with. In traditional circumstances, subjective input comes in the form of functional form choices for the model and parameter constraints. Where the empirical evidence is at odds with the subjective constraints on parameters, or with the choice of functional form for the model, the usual statistical techniques will tend to minimise the influence of the affected risk factors.
By representing subjective input as an augmentation to the available historical data, these data-combination difficulties are easily dealt with. A single statistical method can be applied to the combined dataset in the usual way. The outcome, in terms of parameter choices, will depend on the relative weight assigned to the subjective and the objective data. This relative weight can be varied to ensure that the model is not producing counter-intuitive results. It can also be varied as the available objective information accumulates.
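A minimal sketch of combining the two data sources follows. It assumes a logistic model fitted by weighted maximum likelihood and a single `subjective_share` parameter that sets the fraction of total weight given to the subjective rows. The model choice, the function names and the sample figures are all assumptions made for illustration; the text above does not prescribe them.

```python
import math
from typing import List, Sequence, Tuple

# One row: (risk-factor values, default flag 1/0, observation weight).
Row = Tuple[Sequence[float], int, float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def combine(subjective: List[Row], hard: List[Row], subjective_share: float) -> List[Row]:
    """Rescale weights so subjective rows carry `subjective_share` of the total weight."""
    if not 0.0 <= subjective_share <= 1.0:
        raise ValueError("subjective_share must lie in [0, 1]")
    s_total = sum(w for _, _, w in subjective)
    h_total = sum(w for _, _, w in hard)
    rows = [(x, y, subjective_share * w / s_total) for x, y, w in subjective]
    rows += [(x, y, (1.0 - subjective_share) * w / h_total) for x, y, w in hard]
    return rows


def fit_weighted_logit(rows: List[Row], steps: int = 5000, lr: float = 0.5) -> List[float]:
    """Fit intercept and slopes by plain gradient ascent on the weighted log-likelihood.

    A fixed number of steps is used, so the result is approximate.
    """
    beta = [0.0] * (len(rows[0][0]) + 1)  # intercept first, then one slope per factor
    total = sum(w for _, _, w in rows)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, y, w in rows:
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
            err = w * (y - p)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / total for b, g in zip(beta, grad)]
    return beta


# Hypothetical inputs with one risk factor (say, a loan-to-value ratio).
# Subjective rows are already in two-observation form (weights p and 1 - p).
subjective = [((0.5,), 1, 0.02), ((0.5,), 0, 0.98),
              ((0.9,), 1, 0.10), ((0.9,), 0, 0.90)]
hard = [((0.6,), 0, 1.0), ((0.7,), 0, 1.0), ((0.8,), 1, 1.0), ((0.9,), 0, 1.0)]

for share in (0.9, 0.5, 0.1):
    beta = fit_weighted_logit(combine(subjective, hard, share))
    fitted_pd = sigmoid(beta[0] + beta[1] * 0.8)
    print(f"subjective share {share:.1f}: fitted PD at 0.8 = {fitted_pd:.3f}")
```

Lowering `subjective_share` over time is one simple way to let the hard data dominate as it accumulates. Normalising each source to a fixed share, rather than adding raw row counts, keeps the balance under the modeller's control regardless of how many loans each source contains.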
If you wish to discuss model development strategies using data from Datafactory, please contact Geoff Shuetrim.