@anonmod said in #76:
> > Therefore, if the fitting of the model worked at all, it doesn't matter that many of the decisions are not made with the model, it should still be able to predict those labels.
> Even if it was continuously re-trained in an online fashion, this would not happen. Neither Irwin nor Kaladin ever receive data that would enable them to detect browser plugins.
So are you saying that neither Irwin nor Kaladin has ever been trained on labels generated by methods other than their own predictions? Because you are assuming that there are absolutely no correlations between the other data that you say is used to automatically flag some players and the data that is fed to Kaladin and Irwin. That is likely a WRONG assumption, judging both from my experience and from my own research. If y ~ f(data fed to the system, data not fed to the system), where y is the likelihood that someone is cheating, then as long as COV(data fed to the system, data not fed to the system) != 0, the system should still be able to learn a pattern that reproduces the labels. Of course, how well it can do that depends on how much redundancy (e.g., the amount of covariance between the different features) there is in the data.
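To make the argument concrete, here is a toy simulation. It is only a sketch under my own assumptions and has nothing to do with the real lichess data or with how Irwin and Kaladin are built. The names `simulate_accuracy`, `x_fed` and `x_hidden` are hypothetical. The labels are generated from `x_hidden` alone, while a simple nearest-centroid classifier is trained on `x_fed` alone; `rho` is the correlation between the two.

```python
import math
import random


def simulate_accuracy(rho, n=20000, seed=0):
    """Toy setup: labels depend only on x_hidden, the model only sees x_fed."""
    rng = random.Random(seed)
    fed, labels = [], []
    for _ in range(n):
        x_fed = rng.gauss(0.0, 1.0)
        # By construction, x_hidden has correlation rho with x_fed.
        noise = rng.gauss(0.0, 1.0)
        x_hidden = rho * x_fed + math.sqrt(1.0 - rho * rho) * noise
        fed.append(x_fed)
        # The "flag" is decided from the hidden feature only.
        labels.append(1 if x_hidden > 0 else 0)

    # First half for training, second half for evaluation.
    half = n // 2
    train_x, train_y = fed[:half], labels[:half]
    test_x, test_y = fed[half:], labels[half:]

    # Nearest-centroid classifier fitted on the fed feature only.
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)

    def predict(x):
        return 1 if abs(x - mean_pos) < abs(x - mean_neg) else 0

    correct = sum(predict(x) == y for x, y in zip(test_x, test_y))
    return correct / len(test_x)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(f"rho={rho:.1f}  accuracy={simulate_accuracy(rho):.3f}")
```

In this toy setup the accuracy sits at chance level when `rho` is 0 and rises as `rho` grows, but it stays below 100% unless the two features are fully redundant. That is the dependence on redundancy I mentioned above.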
> Edit: By the way, have you figured out yet how that 99% overlap was achieved?
Yes. As I explained before, even if not all the data used by lichess is available to train Kaladin and Irwin, as long as the missing data is correlated (linearly or non-linearly) with the data fed into the system, it should be possible to recover a function that generates the likelihood of the labels "cheat" and "not cheat". The function does not have to be the same one that lichess uses, since most likely the function space that maps the set of features I previously described to the likelihood of cheating has a compact-open topology.
> The fundamental issue for me is that while you are calling for a technical discussion, you keep making incorrect assumptions and presenting them as facts. A technical discussion based on wrong assumptions will lead nowhere.
I do not think I am making incorrect assumptions, but I do have the feeling that this discussion is pointless. I am not sure you are managing to get across to me everything you want to say, since from my perspective I get the feeling that you don't fully understand ML, although you say you have a background in it. Likely you do understand, but because we are not having an online discussion, it is quite hard to fully explain yourself. I believe I might be having the same problem here, and what I want to say is also not coming across. On top of all this, you sometimes get stupid messages coming in the middle, like the one from @Jade-1. Therefore, what I think we need is a forum specifically for these things, where only people with a technical background and people with a strong chess background and an interest in helping improve cheat detection are allowed.