How well do we know the inner structure of protons? The answer depends not only on the precision of experimental measurements and theoretical predictions but also, as was pointed out recently, on how the allowed solutions for parton distribution functions (PDFs) are sampled over the multidimensional parameter space. With large data samples, phenomenological PDF fits are at risk of the big data paradox, which overrides the law of large numbers and implies that more experimental data do not automatically increase the accuracy of PDFs. Close attention to data quality and to the sampling of possible PDF solutions is equally essential. The big data paradox applies to uncertainty estimates in both multivariate analytical models and AI/ML methods. We discuss important implications for precision phenomenology at the LHC and the Tevatron, in particular the resulting differences among the NNLO PDF error estimates provided by various groups. In this light, we review the significance of the experimental evidence for the nonperturbative (intrinsic) mechanism of charm production in proton scattering.
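The claim that more data need not mean higher accuracy can be made concrete with a generic toy. The sketch below is not a PDF fit and not the authors' method. It assumes the common statistical formalization of the big data paradox due to X.-L. Meng, in which the error of a sample mean factorizes exactly as $\bar G_n - \bar G_N = \rho_{R,G}\,\sqrt{(1-f)/f}\,\sigma_G$. Here $\rho_{R,G}$ is the correlation between the recording indicator $R$ and the quantity $G$ (the "data defect correlation"), $f = n/N$ is the sampling fraction, and $\sigma_G$ is the population standard deviation. The function name `data_defect_decomposition`, the 2% selection bias, and all sample sizes are hypothetical choices made for illustration.

```python
import numpy as np

def data_defect_decomposition(population, selected):
    """Split the error of a sample mean into Meng's three factors (toy sketch).

    Returns (actual_error, rho, f, identity_error), where
    identity_error = rho * sqrt((1 - f) / f) * sigma equals actual_error
    up to floating-point rounding.
    """
    G = np.asarray(population, dtype=float)
    R = np.asarray(selected, dtype=float)      # 1 = recorded, 0 = missed
    f = R.mean()                               # sampling fraction n/N
    actual_error = G[R == 1].mean() - G.mean()
    rho = np.corrcoef(R, G)[0, 1]              # data defect correlation
    sigma = G.std()                            # population std (ddof=0)
    identity_error = rho * np.sqrt((1.0 - f) / f) * sigma
    return actual_error, rho, f, identity_error

rng = np.random.default_rng(seed=1)
N = 1_000_000
population = rng.standard_normal(N)

# Hypothetical "big" sample: about half the population, recorded with a weak
# (2%) tendency to favor larger values of G.
p_select = 0.5 + 0.02 * np.tanh(population)
big_mask = rng.random(N) < p_select

# Hypothetical small simple random sample, chosen independently of G.
small_mask = np.zeros(N, dtype=bool)
small_mask[rng.choice(N, size=40_000, replace=False)] = True

big_err, big_rho, big_f, big_identity = data_defect_decomposition(population, big_mask)
small_err, _, _, small_identity = data_defect_decomposition(population, small_mask)

# Approximate size of a simple random sample with the same squared error.
n_eff = big_f / ((1.0 - big_f) * big_rho**2)

print(f"biased sample: n={big_mask.sum()}, error={big_err:+.4f}, n_eff~{n_eff:.0f}")
print(f"random sample: n={small_mask.sum()}, error={small_err:+.4f}")
```

In this toy, the roughly 500,000-point biased sample carries a systematic error set by $\rho_{R,G}$ that does not shrink as $n$ grows. Its effective size is on the order of a couple of thousand random draws, so the much smaller random sample lands closer to the true mean. This is the sense in which data quality, and not only data quantity, controls the accuracy of the estimate.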