Predicting first criminal justice contact before age 18 using a large linked administrative dataset

Summary

Background

Early contact with the criminal justice system is widely recognised as a key risk factor for future offending and can have profound, long-term consequences for young people. Policymakers have increasingly sought to identify the children most likely to benefit from early intervention programs, but previous research in this area has relied on a limited set of variables and on small samples drawn from non-representative populations.

This study uses the NSW Human Services Dataset (HSDS) to identify predictors of a person’s first contact with the criminal justice system before age 18 and to evaluate the performance of predictive models built on these predictors. The HSDS contains linked data from seven NSW government agencies, covering justice, police, health, education, child protection, housing and revenue.

We built machine learning models predicting first criminal justice contact at different ages, from birth to 17, using linked data for 259,160 children born in NSW between 1998 and 2000. We report how effective our models are at prediction, whether the models perform differently for subgroups of the data and which predictors are used in each model. We also examine the role of individual datasets in model performance.

 

Key findings

Our models predict that between 1% and 2% of the birth cohort will have contact with the criminal justice system before the age of 18, and these predictions are correct 32% to 55% of the time, depending on the age at which the prediction is made. Accuracy across the whole sample is approximately 95%, and the models achieve an area under the curve (AUC) of 0.74 to 0.83 across all age cutoffs, which suggests an “acceptable” level of performance.
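To make these three measures concrete, the sketch below computes them on invented numbers. It is a minimal illustration and not the study's method: the cohort of 1,000 children, the 60 who have contact and the 20 who are flagged are all hypothetical, as are the function names. It also shows why accuracy alone says little when the outcome is rare, since a model that flags no one reaches the same accuracy here.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of all predictions (positive and negative) that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true, y_pred):
    """Share of flagged cases that really are positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else float("nan")


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cohort: 60 of 1,000 children have contact (invented figures).
y_true = [1] * 60 + [0] * 940
# Hypothetical model: flags 20 children, 10 of them correctly.
y_pred = [1] * 10 + [0] * 50 + [1] * 10 + [0] * 930
# Trivial baseline: flag nobody.
flag_nobody = [0] * 1000

print(precision(y_true, y_pred))      # 0.5
print(accuracy(y_true, y_pred))       # 0.94
print(accuracy(y_true, flag_nobody))  # 0.94

# AUC works on risk scores rather than yes/no flags (toy example).
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

In this toy case half of the flagged children are correctly identified, yet the headline accuracy is no better than that of flagging nobody, which is why measures such as precision and AUC are reported alongside it.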

However, our models have much lower accuracy for Aboriginal young people, with accuracy dropping by as much as 22% (see Figure 1). This lower accuracy is driven by a higher number of false positives among Aboriginal individuals. We see a similar pattern when comparing males and females, but the differences are comparatively small. These results are a form of model bias and demonstrate that the aggregate performance of a model may be misleading as a justification for its use.
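The kind of subgroup check described above can be sketched as follows. This is an illustrative example under stated assumptions, not the study's code: the function name `rates_by_group`, the group labels and all of the data are hypothetical. It computes accuracy and the false positive rate separately for each group, so that a gap hidden by the pooled figure becomes visible.

```python
from collections import defaultdict


def rates_by_group(y_true, y_pred, groups):
    """Return accuracy and false positive rate per group (labels coded 0/1)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1 and p == 1:
            key = "tp"
        elif t == 0 and p == 1:
            key = "fp"
        elif t == 0 and p == 0:
            key = "tn"
        else:
            key = "fn"
        counts[g][key] += 1

    results = {}
    for g, c in counts.items():
        total = sum(c.values())
        negatives = c["fp"] + c["tn"]
        results[g] = {
            "accuracy": (c["tp"] + c["tn"]) / total,
            # Share of true negatives that were wrongly flagged.
            "false_positive_rate": (c["fp"] / negatives
                                    if negatives else float("nan")),
        }
    return results


# Invented toy data: two groups of ten people, one true positive in each.
y_true = [1] + [0] * 9 + [1] + [0] * 9
y_pred = [1, 1, 1] + [0] * 7 + [0] * 10
groups = ["group_a"] * 10 + ["group_b"] * 10

by_group = rates_by_group(y_true, y_pred, groups)
pooled = rates_by_group(y_true, y_pred, ["all"] * 20)["all"]

print(pooled["accuracy"])               # 0.85
print(by_group["group_a"]["accuracy"])  # 0.8
print(by_group["group_b"]["accuracy"])  # 0.9
```

Here the pooled accuracy of 0.85 conceals the fact that every false positive falls in `group_a`, which mirrors the pattern of a lower accuracy driven by false positives in one subgroup.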

Figure 1. Accuracy of prediction models for first criminal justice contact before age 18 by age cutoff, showing the proportion of correct predictions for the total sample, Aboriginal children, and non-Aboriginal children


We also find that the most important predictors change markedly across age cutoffs. Demographic characteristics remain predictive from birth to 10 years old, but from age 15 education data on suspensions becomes more important. Finally, we show that no single data source is critical for these predictions.

Conclusion

Linked NSW Government data can be used to predict first criminal justice contact before age 18 with reasonable accuracy. However, models perform poorly for Aboriginal children. Any practical use of these models must consider ethical risks such as misclassification, stigma and unintended consequences.



