How do you determine the accuracy of AI data mining models?
The majority of data miners, including geological data miners, cut their teeth on the test confusion matrix, a statistical QA/QC check on a model that should ultimately be reflected in real-world results. A confusion matrix is a table that describes the performance of a classification model (or "classifier") on a set of test data for which the true values are known. Each cell counts how many test samples of a given true class were assigned to a given predicted class, so correct classifications sit on the diagonal. As a QA/QC measure, it is relatively simple to understand.
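As a minimal sketch of how such a table is tallied, the Python below builds a confusion matrix from known and predicted classes and reads an overall accuracy off its diagonal. The class labels ("barren", "copper", "gold"), the eight test samples, and the function names are all hypothetical and made up for illustration; they are not our models, data, or results.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Tally (true, predicted) pairs into a table.

    Rows are the true classes and columns are the predicted classes,
    both in the order given by `labels`.
    """
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Hypothetical labels and test samples, invented for this illustration.
labels = ["barren", "copper", "gold"]
y_true = ["gold", "gold", "copper", "barren", "barren", "copper", "gold", "barren"]
y_pred = ["gold", "copper", "copper", "barren", "barren", "copper", "gold", "gold"]

matrix = confusion_matrix(y_true, y_pred, labels)

# Print the table with the true class at the start of each row.
print("true \\ predicted:", labels)
for label, row in zip(labels, matrix):
    print(f"{label:>8}", row)
print("accuracy:", accuracy(matrix))
```

In this made-up example, six of the eight samples land on the diagonal, so the accuracy is 0.75. The off-diagonal cells show where the errors went: one barren sample was called gold, and one gold sample was called copper.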
What do classification results look like?
Our geological data mining is accurate enough that we typically see known mine deposits classified correctly: the metal classifications overlie existing and historic open pits (or, when we have access to mine plans, underground operations appear as metal classification "coloration"). Every day, we see dozens to hundreds of examples of our geological data mining correctly identifying deposits, and potential deposits surrounding existing mines. This gives a direct visual indication of the success of our models, beyond the test confusion matrix. The confusion matrix is "correct" as a statistical QA/QC measure, but it doesn't bring home how accurate our algorithm-based methods are the way seeing the result does, whether as a classification overlaid on a map or as market indices that reveal which group of companies are placing themselves on the right mineral claims to find resources.
How do you get good at data mining?
When you count a box of matches, do you tend to count them in an instant, revealing a surprising party trick or savant-like ability? A person doesn't have to be a savant to be a good data miner. Actual day-to-day data mining on geological data is a set of skills: identifying clean, non-manipulated datasets; choosing the algorithms that work best on geologically faulted datasets; figuring out sometimes obscure geochemistry terminology and its bearing on models; cleaning data; and associating mine economics with power lines, road infrastructure, and mine power supply to derive an overall picture of a good or poor investment. One needs a solid foundation in data mining, supercomputer-level work (in our organization's case), GIS skills, and the persistence to reach accuracy levels between 95 and 99%. And it takes doing hundreds and hundreds of models to get it right.
Do data miners get overworked, or do you clock out every day at reasonable hours?
That is a good question! Covid-19 proved beneficial for us data miners: since we couldn't go out, we were chained to our computers for hours on end, rowing like galley slaves in a Roman fleet, with our leader, James Cormier-Chisholm, cracking a whip over us while a very rotund man, Mr. Hortator, beat on a floor drum to keep the pace. It's a fun place to work!
In reality, data mining is a 9-to-5 job. We work reasonable hours, and we believe in family values. We don't describe our work in fantastical terms that more or less mean you have no life, like "we are changing the world" or "we are industry disruptive" (psst... we are changing the world, and our technology is disruptive to the mining industry. Our work is simply better than others' in this new area called geological data mining).