In the most recent edition of Physics World, there are two articles: one is about Jan Hendrik Schön, the physicist formerly famous for creating the first organic superconductor and the first single-molecule transistor, and now most famous for having simply made those results up out of thin air, in what is probably the greatest case of scientific fraud in physics. The other article, by Michael Nielsen, is about how the internet is transforming scientific communication. It looks at which new means of scientific communication have failed (such as Physics Comments and scientists contributing to Wikipedia -- although Scholarpedia is currently taking off quickly, probably because its signed and peer-reviewed authorship model is more in line with academic customs than Wikipedia's semi-anarchic one) and which have succeeded (the arXiv, of course) in making the dissemination of scientific results quicker and more transparent.
At first glance these two topics appear to have little to do with each other. At second glance, however, they are closely intertwined.
Schön's deception was only possible because the researchers who tried and failed to replicate his results didn't have access to his primary data. Once doubts had been raised by the appearance of two completely identical graphs supposedly representing two entirely different sets of experimental data, Schön's primary data were subjected to close scrutiny and found to be non-existent -- his lab books had been destroyed, and his samples were damaged beyond recovery. This raises the question of whether such a fraud could even have been contemplated in an environment where scientists are genuinely expected to hide nothing, and in particular to make their primary data publicly available after publication.
The more radically open schemes, in which raw data are made public even before publication, are unlikely to take off, largely because of concerns over their enormous plagiarism potential. But once results have been published, and priority has thus been established by the original authors, there is no immediately obvious reason not to allow other researchers to perform their own analyses of the primary data -- either to confirm (or possibly refute) the original analysis, or to use their own methods to obtain results from the data that the original authors didn't (either because they weren't interested or because they didn't have the relevant analysis methods at their disposal). Some access controls would of course be needed to ensure that later researchers duly acknowledge the use of the original group's datasets.
It is hard to see how a fraud like the Schön case could have occurred under such a scheme; the groups who wasted years vainly trying to replicate his results would likely have discovered the fraud much earlier if they had had access to Schön's lab books.
Just as with the arXiv (which, after all, started out as a specialised high-energy physics preprint server and has since revolutionised publishing in most of physics and mathematics, plus assorted other fields), particle physicists are pushing ahead with schemes to open up access to raw data, and lattice QCD is right at the forefront of the movement: since the most expensive step in unquenched simulations is the actual generation of the gauge configurations, using those just once for whatever analysis or analyses interest one specific group would be an irresponsible waste of computer resources, postdocs' lifetimes and taxpayers' money.
It has therefore been common for a long time now for lattice theorists to form larger collaborations that pool their resources to generate their configurations and then perform different analyses on them (policies differ: some collaborations publish all of their papers as a collaboration, some break up into smaller groups for most analyses). But with the huge effort needed for unquenched simulations on large ultrafine lattices with very light quarks, even that becomes inefficient; in particular, groups that don't belong to any of the major collaborations would be left out in the quenched darkness. Therefore, it is becoming an increasingly common policy to make gauge configurations available to the larger lattice community after performing some initial analyses that the collaboration generating the ensemble is particularly keen on doing (generally, that includes the hadron spectrum, plus some other stuff).
Configurations have been available for a while from NERSC's Gauge Connection, and are now also becoming available on the International Lattice Data Grid (ILDG). In this way, the many CPU cycles that have been invested in generating these ensembles are put to even better use, as other groups can run their own analyses on them.
Just as in the case of the arXiv, it may take a while for other disciplines to follow suit, but it appears likely that as more and more scientists choose to make their raw data public after publication (and those who don't thereby become increasingly subject to their peers' suspicion), a fraud case like that of Jan Hendrik Schön will at some point become quite impossible.