This is great: everyone so far has pointed out inherent flaws in the technique, which in essence agrees with my argument that it is flawed. I hope everyone has read Evidence-Based Technical Analysis by Aronson, because that is the context we are discussing.
Now, regarding some of your arguments about the simple t-test and its potential flaws: there is no problem with creating a system padded with don't-cares, because in real life that is what we do. We are not always in the day's action. In fact, that is exactly how the book approaches the Monte Carlo permutation test: randomly assign a position of -1, 0, or +1 to the underlying data and use that in a basic hypothesis test of the system (system A here) against a chance-based system, to determine whether the system is significant at the 5% level or better. According to the author, every system he tested failed this criterion. I have shown one example where it does not apply.
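A minimal sketch of that permutation test, as I understand the book's description. All data and names here are illustrative, not from the book: I draw fake daily index returns and a fake set of system positions in {-1, 0, +1}, then compare the system's mean daily return against a distribution of randomly assigned positions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily index returns and system positions (illustrative only).
index_returns = rng.normal(0.0, 0.01, size=500)
system_positions = rng.choice([-1, 0, 1], size=500)  # stand-in for system A's signals

# Observed performance of the system: mean of position * return.
observed = np.mean(system_positions * index_returns)

# Permutation step: re-draw random long/flat/short positions many times
# and record how well pure chance does on the same return series.
n_perm = 10_000
perm_means = np.empty(n_perm)
for i in range(n_perm):
    random_positions = rng.choice([-1, 0, 1], size=index_returns.size)
    perm_means[i] = np.mean(random_positions * index_returns)

# One-sided p-value: fraction of chance systems that do at least as well.
p_value = np.mean(perm_means >= observed)
```

If the system is no better than chance, `p_value` will tend to be large; the book's criterion, as I read it, is significance at 5% or less.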
How many others are there?
You can remove the padded zero pairs as well, and in that case the result will still not be significant, since both means are around zero. I did cheat a bit on variance (the +11% day), but that was to prove another point about what the market actually throws at us.
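To make the "with and without padded zeros" comparison concrete, here is a small sketch using a Welch two-sample t-statistic on made-up return series (the numbers, including the +11% day, are my own illustrative stand-ins, not the actual example data):

```python
import numpy as np

def welch_t(a, b):
    """Welch's two-sample t-statistic (unequal variances assumed)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1), b.var(ddof=1)
    return (a.mean() - b.mean()) / np.sqrt(va / a.size + vb / b.size)

# Hypothetical daily returns: system A is padded with 0s on flat days,
# index B trades every day and includes one outsized +11% day.
system_a = np.array([0.00, 0.00, 0.02, 0.00, -0.01, 0.00, 0.03, 0.00])
index_b  = np.array([0.01, -0.02, 0.02, 0.11, -0.01, 0.00, 0.03, -0.02])

# t-statistic with the padded "don't care" days left in:
t_padded = welch_t(system_a, index_b)

# Drop the day-pairs where the system was flat and re-test:
active = system_a != 0
t_active = welch_t(system_a[active], index_b[active])
```

With both means near zero, neither version of the statistic ends up anywhere near a significant value, which is the point being made above.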
We could take it a step further and generate lots of data via bootstrapping or Monte Carlo, then build a sampling distribution to test the means, but I doubt it would change the results much. It is clear from my premise that hypothetical system A must be significant given hypothetical index B, one simple example in a universe of many.
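For completeness, the bootstrap version of that test might look like the sketch below: resample the daily return differences with replacement, build a sampling distribution of the mean difference, and read off a confidence interval. Again, the return series are synthetic placeholders, not real data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily returns for system A and index B (illustrative only).
system_a = rng.normal(0.0005, 0.010, size=250)
index_b  = rng.normal(0.0004, 0.012, size=250)

# Bootstrap the sampling distribution of the mean daily difference.
diff = system_a - index_b
n_boot = 5_000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    sample = rng.choice(diff, size=diff.size, replace=True)
    boot_means[i] = sample.mean()

# 95% percentile confidence interval for the mean difference; if it
# straddles zero, the bootstrap agrees with the simple t-test.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
```

The expectation stated above is exactly this: the bootstrap interval should tell the same story as the plain test of means, not rescue or condemn the system on its own.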
Hopefully we can stay on topic, and rather than pointing out the pitfalls inherent in the example, point out specifically how the book we are discussing addresses these issues, or why it is not pertinent to testing them. Maybe this deserves a spin-off thread of its own, since I know this started as a must-read-book thread.
BTW, system A is not punished; it is actually rewarded by skipping the random signs. But I get your point.