Would a proof of P=NP have philosophical implications? If so, what?

Does it make sense to discuss “HumanP,” loosely defined as the class of decision problems solvable in polynomial time by human beings? Is “HumanP” meaningless because we can’t give human beings arbitrarily-long inputs, in contrast to idealized computers? Or does this problem simply underscore that the asymptotic limit isn’t what we “really” cared about in the first place, but is just a theoretically-convenient approximation?

What can be said about the relationship of “HumanP” to P and to NP (in either direction)? Do we need to distinguish different versions of “HumanP”, depending on what resources the human has available (e.g., paper and pencil, a classical computer, a quantum computer), or whether we’re considering subconscious processes or conscious processes only?

How would a chess program need to be designed so that it chose the right move in the “wall of pawns” position? How would a SAT-solver need to be designed so that it quickly determined the unsatisfiability of a propositional formula representing PHP_{n} (the Pigeonhole Principle with n pigeons)?

Is the problem with existing SAT-solvers merely that they’re missing relevant “abstract knowledge” (e.g., that the Pigeonhole Principle is true)? Or is the problem that, even if they had that knowledge, they wouldn’t know how to use it, or to recognize when it *should* be used? Or is the problem that they don’t have the capacity to generate such abstract knowledge themselves? Are these distinctions even meaningful?

Does it make sense to speak about the “largest known prime number”? By what criterion, if any, can we say that **p** := 2^{43,112,609} − 1 is a “known” prime, whereas **q** := the first prime larger than **p** is not?

Are the above questions all “pseudoproblems” that evaporate as soon as we fix the right formal definitions for notions such as *knowledge*? If so, then what *are* the right formal definitions? And what questions should we be asking instead?


Just to test the wall of pawns example, I created a .pgn file of that chess position and played it with Shredder, a cheap but decent chess engine that works on Linux. Even in blitz mode (much less than 1 second of compute time per move), Shredder never captures the rook as white. I can send out the .pgn if others want to test on different chess engines. I do not think that this particular chess position would stump a computer even as the board size increases. It’s easy for a computer to see that breaking the pawn wall is immediately a bad move in just the same way that it’s easy for a human to see it. If you made a more complicated position, then a human’s ability would suffer as the board size increased because the human would not be able to rely on a fast visual heuristic to make a move believed to be correct. It would be neat to use complexity arguments to see just how large you’d need to make a minimally complex position to fool the computer. I believe the limitation for the human in that case would actually be visually examining the extent of the board to even recognize what the position was.

There is a flip side to all of this too. Because human chess heuristics are encapsulations of common experiences, it is easy to engineer examples where any given heuristic fails immediately. Much of modern chess theory is focused on how to avoid heuristics altogether. The move procedures of many top players are reminiscent of trying to consciously execute a simulated-annealing algorithm. The primary use of heuristics in chess is to help a player manage time resources to avoid decision fatigue. The use of heuristics like the pawn wall is entirely a byproduct of resource constraints in human computing; it’s a mischaracterization to claim that it is more elegant or more advanced to think of chess in terms of pins, x-rays, forks, pawn chains, or sacrifices than it is to think of it in terms of enumerated move lists. The former terms are just compressed English descriptions of patterns in searching local move lists, all of which are included in modern chess AI.

In class, someone mentioned that we could encode certain structure-laden problems as chess positions. What exactly was this claim? I don’t see any class of problems that humans can quickly solve with guaranteed accuracy, that computers cannot, and that can be encoded as chess positions. If the claim is just that humans have heuristics that work very fast in a variety of situations where computer methods are conceptually slower, I don’t see what conclusion this leads us to. Isn’t this like saying that humans have a small look-up table that has been shuffled around, rewritten, and tested over hundreds of years, so that queries to the look-up table are accurate with high probability, relevant, and cheap? Isn’t this just evolution hacking together different heuristics? I don’t see how that indicates abstract knowledge, unless ‘abstract knowledge’ is just taken to mean hacked-together heuristics selected by evolution, a definition I would probably agree is good and realistic in the case of humans.

Regarding the PHP, the inductive correspondence between mathematical objects and real objects has nothing to do with whether mathematics is “real” in a platonic sense. It’s just that we have different ways of believing. I don’t believe that all intelligent people must come to the same point of view just because of the Aumann Agreement Theorem and Robin Hanson’s paper on pre-priors and origin disputes. Those are nice theorems, but they don’t correspond to the way human beings actually reason, so I have no problem believing that they fail. That’s heuristic, inductive inference based on interacting with people and relating that to mathematical principles. If I ever saw two pigeons fit in a hole where only one pigeon can fit, I’d be happy to believe that the mathematics we use to describe the PHP just doesn’t correspond to reality. The physical concept of quantity and capacity in that idealized pigeon/hole situation would break down, and I would seek a new mathematical theory for the arithmetic of quantities. I’ve never experienced the relationship between numbers and objects behaving that way, so I readily use the PHP as quick, heuristic reasoning. If you want to divorce the PHP from real-life pigeons and holes and just talk about the purely logical abstraction, then all you’ve really said is: “because I know n+1 > n in this axiomatic system, and a rule for mapping problems to integers was hardcoded into my system, I can quickly conclude the PHP; a computer that specifically does not have the hardcoded correspondence between problems and integers cannot do this.” If we try to think of it from the SAT solver’s point of view, why can’t we view the pigeonhole principle as a proof that n+1 > n as much as anything else in this situation? The SAT solver, after all, doesn’t have experiential data about quantities and objects with which to use ‘abstract common sense’ to generalize over all cases.

I just want to make sure we pose this problem in a way that does not concede too much ‘abstraction’ to human thinking, or else points us to sources that suggest we really do have that abstract thinking rather than just a bag of experiential heuristics that relate real objects to mathematics. Most mathematicians I know tend to prove theorems/locate counter-examples only after they go experience some chunk of reality that convinces them that the theorem “should” be true/false. That seems like an indication that human mathematics is not such a genuinely abstract activity.

First, on the factual questions: another student emailed me to say that he investigated, and found out that Fritz, the currently-best chess program, would NOT make the correct move in the wall-of-pawns position. I’d be very interested to understand what it is about Shredder’s evaluation function that causes it to make the correct move.

Second, I was the one who pointed out in class that chess is EXP-complete, as shown by Fraenkel and Lichtenstein in 1981. This result implies that, if you believe there are ANY puzzle problems where humans do much better than any computer programs we know how to write today by recognizing global patterns, then you can efficiently map those puzzle problems onto chess positions, in such a way that any human who understood the mapping could figure out the right move to make, but (by assumption) the sorts of computer programs we know how to write today wouldn’t make the right move even if given a description of the mapping.

On the other hand, if you don’t accept that any such puzzle problems exist, then obviously you can reject the above argument. And this brings us to what I think is the central point. Before Wednesday’s class, I took the following as obvious, well-known facts—accepted not only by AI critics like Penrose, but also by any AI proponent you care to name (Minsky, Dennett, McCarthy…)—that

(1) humans have some sort of capacity for abstraction and generalization endowed by millions of years of evolution,

(2) this capacity far exceeds any such capacity in any *existing* (repeat: existing! 🙂 ) computer program, and

(3) figuring out how to give computer programs a similar capacity for abstraction is one of the great challenges for AI.

After illustrating that capacity through a few examples, I wanted to spark a discussion about what exactly the capacity consists of, and whether and how we can understand it in terms of computational complexity.

However, unless I’m misunderstanding, you and several other students in class did something extremely interesting that I wasn’t expecting: namely, you deny that the human capacity for abstraction and generalization exists at all!

For whatever it’s worth, I disagree sharply with this denial. Even if you dislike the chess or pigeonhole-principle examples for whatever reasons, it seems to me that examples where humans solve instances of NP- or PSPACE-complete problems faster than any computer program we know how to write today can be multiplied endlessly—i.e., that this is a game you’re going to lose. What about inverting CAPTCHA functions? Or folding proteins (a task for which biochemists have been recruiting people over the Internet in recent years)? Or deciding whether a chessboard can be covered with dominoes (each of which covers two adjacent squares), after the top-left and bottom-right squares of the chessboard have been removed? (If you haven’t seen this one before, try it!)

Incidentally, the view that you (bobthebayesian) are now defending seems to me to be in tension with the view you expressed in the comments on the first post: namely, that it’s crucial to aim not merely at building an AI that passes the Turing Test, but at *understanding how the AI works*. If there’s no interesting human capacity for generalization or abstraction—if the only difference between the AIs we have today and AIs that could pass the Turing Test is the number of hardcoded hacks and heuristics—then how much could there possibly be to understand about the latter sort of AI?

These are extremely interesting points, and I want to explore the EXP-completeness idea further. FWIW, none of the example problems you mention seem compelling to me. They would count as evidence of the sort of abstraction you mean only if humans could solve them with both 100% accuracy and efficient extensibility. What I mean is, even if a human can solve some particular instances of an EXP problem very fast, that doesn’t seem to imply they have any abstraction capability to solve all instances in that class, or that their ability scales efficiently as the input size grows. They might just have a hacked-together look-up table (collection of heuristics) that works in some limited cases. How does human CAPTCHA-solving ability change if the language of the CAPTCHA is switched from the solver’s native language to some random set of symbols shown to them ten seconds before the CAPTCHA? The time it takes to build the heuristic should surely count against you. Are CAPTCHAs known to be an EXP problem? I work in computer vision, and it seems one could not classify these problems without first inverting the visual process to make claims about what steps are required to solve a CAPTCHA. In other cases, though, where you can train linear SVMs to break CAPTCHAs with near-human success, this might suggest that that class of CAPTCHAs is actually in P (training the SVMs is a polynomial-time algorithm). One of my main research goals is to explore these ideas and see if precise complexity statements can be made about human perception problems.
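As an aside, the mutilated-chessboard domino puzzle mentioned a few comments up has a one-line parity answer: each domino covers one square of each color, so removing two same-colored corners makes a perfect tiling impossible. A minimal sketch of that coloring argument (function name and interface are my own, purely illustrative):

```python
def domino_tileable(rows, cols, removed):
    """Necessary-condition check via the classic two-coloring argument:
    every domino covers one 'black' and one 'white' square, so a region
    with unequal color counts admits no perfect tiling. For the mutilated
    chessboard this necessary condition is also decisive."""
    black = white = 0
    for r in range(rows):
        for c in range(cols):
            if (r, c) in removed:
                continue
            if (r + c) % 2 == 0:
                black += 1
            else:
                white += 1
    return black == white

# Removing opposite corners of an 8x8 board deletes two squares of the
# SAME color, leaving 30 vs. 32 -- hence no tiling exists.
```

Note the check is only necessary in general (equal counts don’t guarantee a tiling for arbitrary removed sets), which is itself a nice example of a heuristic that a human must know when to trust.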

If we know the evaluation function of a chess AI, then we can engineer a case where we can fool that evaluation function. Why isn’t the same standard applied to humans? As I mentioned, there are easy examples (in fact most chess training centers on such examples) where human heuristics fail. To be good at chess you really need metaheuristics that tell you when to ignore the heuristics. That really strikes me as just hacked together heuristics, rather than any generalized process for abstracting a situation.

Also, with regards to my first post, I think the task of building a good AI is precisely the task of understanding the hacked together heuristics. There may be a more elegant description of AI, but I don’t see any evidence for that at all. I think given what we know about intelligence, our best guess is that intelligence is just ever more complicated tapestries woven together with heuristics. I side with Robin Hanson’s position about this heuristic nature of (human) intelligence in this debate: (http://www.youtube.com/watch?v=m_R5Z4_khNw&noredirect=1)

How about this candidate for what (current) AIs are bad at: “AIs are bad at analogies.” Or maybe even: “AIs are bad at finding reductions.” Although this wouldn’t settle the heuristics-vs.-universal-capacity question, it would narrow it down to the question of whether there’s something about the domains humans managed to find their way around—the physics that humans managed, the math that humans managed, etc.—that explains why heuristics developed for hunter-gatherer problems happen to work great as analogies to structures encountered in these domains. If the structures in these domains aren’t correlated in some non-trivial way with the structure of hunter-gatherer problems (that is, if we can’t show that we select our challenges in a biased way, or that our evolution was somehow able to select for heuristics that would work well in the math and physics that come up in human lives), we might have to conclude that humans evolved a “deep” capacity for turning unfamiliar problems into familiar problems.

The existence of logical fallacies and optical illusions points to flaws in our mental heuristics. But the fact that we can think around them, noticing and correcting for the mistake, suggests that we also have higher-order mechanisms, or metaheuristics, that let us control our thought processes in ways that much of today’s AI cannot. The question, then, is whether these are 1) actually unique to biological humans, 2) a necessary component for any computer to act “intelligently”, 3) an emergent property of some level of complexity or some overall structure, 4) still limited but we haven’t hit the limits yet, or 5) flawed in ways that are impossible for humans to discover by their very nature, but that have been present for all of human history in some form or another.

BobTheBayesian: I’m sympathetic to almost everything you said, but I do think the distinction between mathematical beliefs and beliefs about correspondences between math and the real world can get tricky. Is my belief that if there are n+1 real-world pigeons and n real-world pigeonholes then I won’t count n pigeon-pigeonhole pairs without counting some pigeonhole twice just a defeasible belief about the correspondence between math and the real world? (I don’t have a satisfying way to categorize this belief myself. It “smuggles in” implicit math into the description of the real-world predicament, but it’s unclear whether there is any way to separate the belief into a real-world element and a math element.)

Make that n+1 pigeon-pigeonhole pairs.

This is a good point, Peli, which I believe goes back to Wittgenstein’s idea of logical primitive names. Or at least you have inspired me to think a little bit about whether there is a similar ontological connection between real world objects and the standard constructions of set theory that humanity has converged on.

(Not in the class, apologies if I’m not supposed to post.)

When I read your paper, I was skeptical about your point regarding humans, SAT, and PHP. Can most humans really (and easily) identify violations of PHP expressed as 3SAT problems? I don’t think they would, even if they knew PHP. I thought about this problem and figured that a PHP-violating 3SAT expression would need a lot of clauses. I found this example of a 3SAT PHP expression with four pigeons and three holes:

(a+b+c)(d+e+f)(g+h+i)(j+k+l)

(~a+~d)(~a+~g)(~a+~j)(~d+~g)(~d+~j)(~g+~j)

(~b+~e)(~b+~h)(~b+~k)(~e+~h)(~e+~k)(~h+~k)

(~c+~f)(~c+~i)(~c+~l)(~f+~i)(~f+~l)(~i+~l)

Sorry, but I don’t think humans would instantly “see the light” on such a problem unless you specifically reminded them of PHP and suggested they read the problem as an instance of it. But then, that would no longer prove it to be in HumanP, but rather, in “HumanP augmented by a good-heuristic oracle”. (HumanP^GHO) And of course having access to such an oracle is going to give any Turing machine a lot more power!

(In other news, in the real world, it does matter which order you stack objects in when determining the total height: stiffness and weight matter. Compare stacking an anvil on a spring vs. a spring on an anvil.)
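The four-pigeon/three-hole encoding above generalizes mechanically to any number of pigeons and holes. A short sketch (my own illustrative names, not from any SAT library) that builds the clause set and, purely for toy sizes, brute-forces satisfiability:

```python
from itertools import combinations, product

def php_cnf(pigeons, holes):
    """Clauses for 'every pigeon gets a hole, no hole gets two pigeons'.
    Variable (p, h) means 'pigeon p sits in hole h'; a negated literal
    is written as the triple ('~', p, h)."""
    # Every pigeon occupies at least one hole.
    clauses = [[(p, h) for h in range(holes)] for p in range(pigeons)]
    # No two pigeons share a hole.
    for h in range(holes):
        for p1, p2 in combinations(range(pigeons), 2):
            clauses.append([('~', p1, h), ('~', p2, h)])
    return clauses

def brute_force_sat(clauses, pigeons, holes):
    """Exhaustive check over all 2^(pigeons*holes) assignments --
    exponential, so usable only on tiny instances."""
    variables = list(product(range(pigeons), range(holes)))
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        sat = lambda lit: (not val[lit[1:]]) if lit[0] == '~' else val[lit]
        if all(any(sat(lit) for lit in clause) for clause in clauses):
            return True
    return False
```

For four pigeons and three holes this produces exactly the 22 clauses in Silas’s formula (4 “at least one hole” clauses plus 3 × 6 pairwise clauses), with a..l playing the roles of the 12 (pigeon, hole) variables. Resolution-based solvers are known to need exponentially long refutations of these instances (Haken, 1985), which is what makes the constant-effort human “aha” interesting.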

Hi Silas,

The crucial point you’re missing is that the effort needed to recognize a collection of SAT instances as encoding PHP_{n} is only a *constant* amount of effort, independent of n. Once you’ve paid that up-front intellectual cost, you can then quickly decide that the instance corresponding to PHP_{1,000,000,000} or whatever is unsatisfiable, after investing only a linear amount of effort to make sure that the instance has the correct form. (And here, by “you,” I mean either a human *or* a computer program with a suitable capacity for generalization, although I don’t think the latter exists as yet.)

Also, I was talking about books, not about springs. 🙂

–Scott
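The “constant effort up front, linear effort per instance” split can be made concrete—under the large assumption that the instance arrives with its pigeon/hole variable labels intact, which (as the ensuing exchange makes clear) is exactly the contested part. A hypothetical sketch, with illustrative names:

```python
from itertools import combinations

def matches_php_form(clauses, pigeons, holes):
    """Check, in time roughly linear in the instance size (up to
    hashing), that a clause set is exactly the standard PHP encoding:
    variables are (pigeon, hole) pairs, negated literals are triples
    ('~', pigeon, hole). Once this passes and pigeons > holes,
    unsatisfiability follows with no search at all.

    ASSUMPTION: the labeling is given; recovering it from an arbitrary
    obfuscated CNF is the hard, unautomated step."""
    expected = {frozenset((p, h) for h in range(holes))
                for p in range(pigeons)}
    for h in range(holes):
        for p1, p2 in combinations(range(pigeons), 2):
            expected.add(frozenset([('~', p1, h), ('~', p2, h)]))
    return {frozenset(c) for c in clauses} == expected
```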

Okay, but if it only takes an n-independent amount of time to recognize PHP_n within any SAT problem, then what exactly is the deficiency you’re attributing to computers? That is, why can’t you just augment any SAT-solver with the PHP-detector, eliminating the supposed advantage humans have in recognizing global symmetries (in this respect)?

Furthermore, I don’t see how the effort in detecting it could be constant. To recognize a PHP violation, you have to first identify a similarity (that justifies the check), and then grind through and verify the isomorphism (for some n), effectively doing the work of a graph isomorphism problem. Yes, once you’ve found this isomorphism you can instantly conclude it’s unsatisfiable, but the difference between a literal PHP question and a SAT expression of it seems like the difference between “Is 91 prime?” and “Is 7*13 prime?”

Silas, it’s obvious that you can write a program to recognize a *specific* class of tautologies like PHP_{n}, once you’re told what they are. At the risk of repeating myself hoarse, the challenge is to write an AI that can find regularities like that one *without* your having to hard-code descriptions of those regularities in advance.

Also, I don’t claim that either a human *or* a computer program can recognize all tautologies that are unsatisfiable “ultimately because of the pigeonhole principle,” no matter how obfuscated those tautologies might be. The claim was only about the PHP_{n} tautologies, which one can recognize as having a very specific form.

I’m frankly blown away by how many people are disputing what I thought were obvious points. Maybe it would be best to add an explicit proviso: *if you think I’m saying something controversial, then you’re misunderstanding me!* In particular, there’s no claim whatsoever here about an inherent limitation on any AI programs, only about the sorts of AI programs that actually exist today.

I think that my claim, which I am by no means totally convinced of, is that humans also have to have this advantage hard-coded into them. I’m not sure to what degree I buy the idea that we ‘abstractly recognize’ such things. This of course raises the question “where did the human ability to generalize PHP_{n} originate?”, and I don’t have a well-formed answer for that. But I also don’t believe that it’s obviously due to human abstraction capabilities, though I could be convinced that it is, given more specific evidence of the manner in which we develop such ideas and use them in practice.

Scott, if you’re going to rule out AIs that have PHP-violation detection hard-coded, you have to rule out *humans* that have PHP detection “hard-coded” too. In our case, that would mean someone giving us the hint to check for PHP violation.

That’s why I raised the point about a “good heuristic oracle”: if a human notices that a 400-variable 3SAT expression is unsatisfiable simply because you whispered, “psst! How would PHP look as a 400-variable expression? See any similarities?”, then you’re not talking about HumanP anymore, but HumanP^GHO (which is quite similar to a nondeterministic Turing machine’s capabilities). And in that case, it’s only fair to compare to AIs that can get similar advice.

It’s also why I disputed (above) that your average human is going to have that *click* that makes them see the mapping from the problem they’re currently looking at, to the PHP—even if they know the PHP. (Perhaps they would if you phrased it as “can 5 objects fit into 4 boxes …”, but not in its scrambled form as a SAT expression, hence my comparison to primality checking in different forms.) This relates back to the logical omniscience problem: from the fact that someone knows, or can quickly discover, some general rule, it does not follow that they can quickly recognize that it would be useful on any *particular* problem.

(Is there experimental evidence of average humans being able to solve large SAT problems that violate PHP *without* the mapping being specifically suggested to them? That would very much surprise me.)

Like what bobthebayesian said, if you’re going to claim some kind of *general* capability on the part of humans, you would have to claim that they always invoke every useful heuristic—not just PHP—when looking at problems. Otherwise, it is indeed fair to just hand AIs the short list of deep insights humans have privileged, non-hard-coded access to.

No, I wasn’t talking about someone being given a hint that a collection of SAT instances encoded PHP. I was talking about a reasonably-intelligent person figuring out the pattern, possibly after staring at the instances for many hours or even days. Yes, that’s right—it could take days. But that’s still nothing compared to the quadrillions of years it would take if you only knew how to resolve pairs of clauses, and lacked the ability ever to discover anything better!

If by “reasonably-intelligent person”, you mean the kind that’s likely to participate in a Putnam contest … maybe. (Normally, in any thread, *I’m* the one overestimating average human intelligence!) You’re still claiming that, with nothing else to bring it to their attention, without intending to look for PHP violations (which would be a “hard-coding”), with no special problem formatting (e.g., the pieces of the PHP pattern are scattered across several pages), they will (relatively quickly) notice that an arbitrary SAT problem containing a PHP violation is unsatisfiable on that basis. *And* that the same holds for every other simple heuristic phrased in a foreign context, without having been specifically taught those heuristics. Color me skeptical.

To clarify, I don’t dispute that there are problems humans currently solve exponentially faster than computers, which is indeed uncontroversial. I don’t even claim humans have difficulty on problems with trivial isomorphisms to PHP—but SAT is not one such instance.

Also, forgive a stupid question: if you had a SAT-solver check for whether permutations of the variables preserve the form of the expression, wouldn’t that save it from having to check a number of possibilities as large as the number of symmetries in PHP_n? (It takes relatively few checks to detect such an extensive symmetry, right?)

It would seem this would achieve about as good an average speedup as having PHP hard-coded, but then, achieving this is still an open problem, so perhaps I’m missing something.

Silas: You’re right, it wouldn’t take much time to detect such a large symmetry, once you had some idea what you were looking for.

Look, I freely admit that much of humanity, much of the time, would do no better at this task than a simple SAT-solver, or even a lot worse.

But imagine that you were imprisoned by a madman, who wouldn’t free you until you had proved the unsatisfiability of PHP_{n} for all 1 ≤ n ≤ 1000. After (say) a month or a year of drudgery, do you think that light would eventually dawn?

(And yes, I know I said hours or days in the last comment. The great thing about making an asymptotic claim is that, if whatever specific amount of time I offer to illustrate my point seems implausible, I can just keep upping the time bound until I win! 😀 )
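Silas’s symmetry idea from a few comments up is easy to prototype: given a *candidate* variable permutation, checking whether it maps the clause set onto itself takes one linear pass. A sketch in DIMACS-style integer literals (helper names are my own, illustrative only):

```python
from itertools import combinations

def php_clauses(pigeons, holes):
    """PHP in DIMACS-style literals: variable 1 + p*holes + h means
    'pigeon p in hole h'; negation is the negative integer."""
    var = lambda p, h: 1 + p * holes + h
    clauses = [[var(p, h) for h in range(holes)] for p in range(pigeons)]
    for h in range(holes):
        for p1, p2 in combinations(range(pigeons), 2):
            clauses.append([-var(p1, h), -var(p2, h)])
    return clauses

def fixed_by(clauses, perm):
    """True iff the variable permutation `perm` (a dict var -> var)
    maps the clause set onto itself -- one pass plus hashing."""
    rename = lambda lit: perm[abs(lit)] if lit > 0 else -perm[abs(lit)]
    original = {frozenset(c) for c in clauses}
    return {frozenset(rename(l) for l in c) for c in clauses} == original

def swap_pigeons(p1, p2, pigeons, holes):
    """Permutation exchanging the variable rows of two pigeons."""
    perm = {1 + p * holes + h: 1 + p * holes + h
            for p in range(pigeons) for h in range(holes)}
    for h in range(holes):
        a, b = 1 + p1 * holes + h, 1 + p2 * holes + h
        perm[a], perm[b] = b, a
    return perm
```

The verification is cheap; *finding* which permutations to try without being told is the expensive step—in practice, symmetry detection for CNF is done by reducing to graph automorphism, which is exactly the “grind through the isomorphism” cost Silas flags.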

A colleague linked me to this article by Peter Norvig: (http://norvig.com/chomsky.html) which was an interesting read. I am curious what others think of the section called “The two cultures” and how it relates to our talk about heuristics.

I read that piece a few months ago, and considered it the most brilliant takedown of Chomskyan linguistic theology I’d ever seen. Even though it’s only slightly related to 6.893, I’d recommend Norvig’s piece in the strongest terms to anyone interested in the logical vs. statistical AI debate.

To reach human intelligence from formal systems as we currently understand them, we need to modify our notion of a formal system. We can’t demand soundness: our cognitive biases can cause us to make consistently wrong decisions on large classes of problems. We can’t demand completeness: we will happily answer “I don’t know” to ambiguous or complex situations. But, and perhaps more importantly, we need to modify our axioms and inference rules. This can be seen in the case of the pigeonhole principle, whose propositional proofs require exponential size in weak systems such as resolution, though stronger systems admitting cut prove it succinctly. I would like to suggest that this encapsulates what it means for humans to have the capacity to abstract and generalize: scientific induction leaves us with unjustified, but useful, universally quantified rules of inference.

The ability of computer programs to modify axioms and inference rules was not given serious credence by Hilbert, who thought that while computers could automate the task of proving theorems, it would still be up to mathematicians to devise suitable axiomatic systems for those computers to operate with. Why did Hilbert think that? For one thing, a formal system rarely lives in a vacuum: formal systems are often motivated by some aspect of the real world. For example, Euclidean geometry was based on an (erroneous) impression of the way physical space worked; arithmetic seems to stem from some innate ability of humans to count. Our investigations of infinity stemmed from its convenience as a formal construct and its appeal as a concept to our intuitions, although propositions like the axiom of choice have required us to adjust our intuitions in the face of axiomatic systems of our own devising.

It would seem, then, that the primary difficulty of devising artificial intelligence, as we define intelligence, is that our current systems lack the ability to use physical experience to motivate choices of axioms and inference rules. And as such, these physical experiences define which problem spaces we care about. It is true that humans can solve some instances of NP/PSPACE problems faster than any computer can, but the reverse is true as well. The fact that the former happens more often than the latter is a testament to the power of millions of years of evolution, as well as the many years of “learning” that all human beings undertake.

While it would no doubt be very inefficient, I could imagine a computer that would be able to “spawn” subprocesses in a given simulation with different axioms than its own; and since a program is able to modify its own code, it could change its own axioms. The initial set of axioms would probably affect which simulations it would run and thus which other sets of axioms it could use. However, I don’t believe this is inherently different from what humans do: the subprocesses are what we would call “thought experiments.” But even in these thought experiments, I highly doubt that we can imagine anything completely removed from our own world; instead we are limited by some set of constraints that come from our previous beliefs/axioms.

Many of the NP problems for which we seem able to find solutions, or close approximations to solutions, are ones where we have some physical representation of the problem. PHP (or even Sudoku) encoded into SAT is a good example of this, but others I can think of are TSP and Hamiltonian Path, if we restrict them to Euclidean graphs. However, it seems that as soon as we remove this restriction and allow general graphs, much of our advantage over computers is removed.

I am curious:

What research programs (if any) have explicitly investigated the “abstract reasoning” vs. “heuristic reasoning” question? (See post above by bobthebayesian, and its replies.)

Any interesting data and/or conclusions?

I am myself not even sure what informal meaning one could give to “abstract thinking”; for one thing, whatever it is, it must be allowed to err in its conclusions, for humans certainly do at times err in, say, conjectured generalizations. (On the other hand, it seems that the notion of heuristic has at least an intuitive meaning, and even several formal instantiations.)

I do agree that it does seem plausible to posit that the “generalization powers” of “abstract thinking” that we tend to experience could be provided by a sufficiently rich use of meta-heuristics, an appreciation for self-consistency, and being able to spot contradictions.

For example, the process of learning could be phrased as internalizing more and more sophisticated heuristics, and the process of learning to learn could be phrased as internalizing meta-heuristics (such as the ability to recognize when a heuristic is better than another one, combining multiple heuristics into a better one, etc.). Vaguely, one could say that “learning to learn” could account for much of our intelligence.

The plausibility is increased further if we consider that individuals are embedded in a society that allows for discussion with other individuals, where the repertoire of heuristics can quickly increase in complexity and sophistication by discussion and exchange of ideas.

Lastly, all of this discussion seems to be inextricably tied with language and physical intuition, both of which many would agree are important elements to the process of thinking, and would need to be taken into account if one wants to argue an inherent gap between “abstract thinking” and what can be achieved by mere “heuristic thinking”…

I think the concrete underlying distinction is between domain-specific mechanisms (= heuristics) and domain-general mechanisms (= abstract thinking). Any sufficiently subject-matter-independent heuristic just *is* abstract thinking.

I would define abstract thinking as our set of heuristics for pattern recognition.

One thing to note is that there’s no reason to expect this to be infallible or even logical. We do have the ability to start with axioms and reason deductively, but we’re so inefficient at it that we need the heuristics as shortcuts sometimes. That’s where errors creep in.

I claim that there is no difference between a heuristic and an unsound axiom.

A heuristic has to have some degree of accuracy, though. There needs to be a distinction between heuristics and just any old unsound axiom.

@Harry: if we choose to define human abstract reasoning as the convex hull of the capabilities of our heuristics, then what are we asking exactly when we ask for the difference between SAT solvers and humans? Are we asking for a description of exactly what our heuristic programs are, how they work, and what kind of brain OS we have that facilitates their synchronized usage? I’m not saying this is a bad thing at all, and a move to define human abstract reasoning in terms of our evolved heuristics seems good to me. It just seems like we are saying “humans do something that SAT solvers categorically don’t do… but we cannot give a description of what that something is.” If we could give a description of it, then presumably we’d have already made SAT solvers always do that… and if we’re trying to describe the difference between current humans and current SAT solvers, I think this just reduces to the difficult task of human brain emulation in general. We don’t know how to write programs that go from no-hard-coded heuristics to human-level-hard-coded-heuristics. It’s definitely interesting to consider *why* we don’t know how to do that, but the mere fact that we don’t doesn’t seem to make human reasoning anything special.

My concern is that there are other definitions out there for what we might mean by abstraction besides just whatever it is that humans do. And some of these potential kinds of abstraction seem to be more powerful to me than an evolutionary bag of heuristics.

That’s what I’m saying, yes. Human brains possess heuristics that we don’t understand yet. We want to figure out what those are so we can incorporate them into our tools. If we also discover other kinds of heuristics or reasoning or thought, then sure, we should incorporate those too. But the human brain is a natural example to work off of, and I don’t know of anything that’s more productive as an object of study for AI.

(The SAT solver example is rather misleading, IMO, because a SAT solver is specifically designed to solve one problem, and anything that doesn’t help it do that isn’t worth putting in. There’s no reason for the added complexity if PHP isn’t common enough in actual instances to justify it.)

Maybe it could be useful to distinguish between HumanP and HumanityP? It seems to me that much of humanity’s intellectual productivity stems from the fact that different top-quality human reasoners have different blind spots (kinds of instances that they make false judgements on) and different strong suits (kinds of instances they solve correctly and quickly), and we — the collective “we” — have reasonably good social mechanisms for sorting out who’s reliable about what and weighing people’s testimonies accordingly. To use a toy example: maybe the very heuristics that make Penrose so efficient at physics (in his particular way of being efficient at physics) make him produce errors in philosophy of AI, but luckily we can figure this out about him and learn to listen to him about physics but to other people about philosophy of AI.

(Of course “HumanityP” will be probabilistic all the way down, unlike HumanP as formalized by BobTB.)

In trying to be more specific about HumanP, and considering that human heuristics generally have some fallibility to them, I have an idea for a candidate formal definition. I’m sorry there’s no TeX on WordPress (is there?), but hopefully the notation is good enough. It seems like when a human solves difficult problems efficiently, it is because there are quick ways to check candidate proofs (the problem starts out in NP, or there is a very useful NP sub-problem that helps guide us in the right direction). So maybe we can modify the definition of NP a little and get something useful out of it. For the sake of starting a conversation about potential deficiencies in trying to formally define HumanP, I propose the following def:

HumanP_eps = the class of problems for which a human possesses a polynomial-time algorithm that generates a finite list of candidate proofs W = { w_1, w_2, …, w_N } such that the empirical probability Pr( there exists a w in W s.t. there exists a Turing machine M, input x, s.t. M(x,w) accepts ) > 1 – eps.

The Turing machine M would be deterministic and able to run in polynomial time in the length of x, and by assumption the candidates w_i are polynomial in this length too, even if humans cannot give an explicit account of the eventual input x.
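As a sanity check on the shape of this definition, here is a minimal sketch of estimating the empirical acceptance probability for a toy problem. Everything concrete here is an assumption I chose for illustration: the “human heuristic” is a random small-number guesser, and the verifier M just checks divisibility.

```python
import random

def humanp_eps_accepts(x, generate_candidates, verify, trials=1000):
    """Empirically estimate the probability that the candidate list W
    produced for input x contains a witness the verifier accepts.
    `generate_candidates` plays the role of the human's poly-time
    heuristic; `verify` plays the role of the poly-time machine M."""
    successes = 0
    for _ in range(trials):
        candidates = generate_candidates(x)        # finite list W = {w_1, ..., w_N}
        if any(verify(x, w) for w in candidates):  # exists w in W with M(x, w) accepting
            successes += 1
    return successes / trials

# Toy instance: "find a nontrivial factor of x", with a heuristic that
# guesses k candidates below sqrt(x) (illustrative choices, not a proposal).
def guess_factors(x, k=20):
    return [random.randrange(2, max(3, int(x ** 0.5) + 2)) for _ in range(k)]

def is_factor(x, w):
    return x % w == 0

rate = humanp_eps_accepts(91, guess_factors, is_factor)
# 91 = 7 * 13, so the empirical rate is high, and 91 would sit in
# HumanP_eps for modest eps under this toy heuristic.
```

The point of the sketch is only that the definition is operational: fix a heuristic, fix a verifier, and the quantity Pr(…) > 1 − eps becomes something you can actually measure.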

Some places where I could use help: (1) how to turn the statement about empirical problem classes that we see humans work with into a problem about formal languages, or some conceptual problems that this might bring about. (2) What are the rules about how humans can generate the potential proofs (e.g. are these always computed by inspiration drawn from the physical world… is this how evolution imbued us with such an ability?) (3) To what extent does probability play a role? I include the fudge factor epsilon because we want our heuristics to be allowed to fail, so we only require humans to target a set containing a valid proof with some probability, and we can tune epsilon based on the difficulty of the problem, etc. (4) All the other inadequacies I am failing to see.

Another thought: it also seems that even when humans solve “easy” problems, we often do it inefficiently by virtue of this kind of finite-list-of-proof-generation mechanism, where we first equate the problem to some NP problem (an easy-to-check problem) and then busy ourselves generating things we can check. For example, when I try to factor numbers (without being explicitly guided to use the faster known algorithms), I tend to make plausible guesses and rule them out, find the guesses that work (e.g. was it divisible by two?), and repeat this list-of-guesses heuristic. Not that factoring is an easy problem, but as we mentioned in class it seems to occupy this nebulous space between P and NP-complete. Maybe the same ability that makes us inexplicably fast at instances of hard problems also makes us inexplicably slow at easier problems?

“Maybe the same ability that makes us inexplicably fast at instances of hard problems also makes us inexplicably slow at easier problems?”

How about this: humans are (somewhat) optimized for transforming novel problems into familiar problems. So humans do well when the (proof of a) solution to a novel problem has low conditional Kolmogorov complexity given the solution to a problem that the human already has an efficient way to solve. But because this is the thing humans are optimized for, they don’t do well when it isn’t a viable option. (I would speculate that autistic savants are different from normal humans in exactly this way. They have no “creativity” to waste all their resources on, so they don’t make for great mathematicians but are super efficient on easy problems that other humans are slow at.)

I mean low K-complexity *and* low time complexity and space complexity.

@Peli Another way of putting your idea, which seems quite interesting, is that humans do well when a proof can be drawn from some heuristic distribution over possible proofs. When facing a new problem, we first try to fit it into all the templates we know and draw sample proofs from current distributions. So when the novel problem’s distribution over its solution-space is close to some currently held heuristic distribution, we do well. I think you could even quantify the closeness with KL Divergence, or essentially the Bayesian surprise of the novel problem type compared to our heuristics: (http://ilab.usc.edu/surprise/)
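To make the KL-divergence suggestion concrete, here is a minimal sketch. The two distributions over “proof templates”, and the template names themselves, are made-up numbers purely for illustration:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as dicts mapping
    outcomes (here: proof templates) to probabilities."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)

# Hypothetical heuristic distribution over proof templates a reasoner holds,
# vs. the distribution actually useful for a novel problem.
heuristic = {"case_split": 0.5, "induction": 0.3, "pigeonhole": 0.2}
novel     = {"case_split": 0.1, "induction": 0.2, "pigeonhole": 0.7}

# Bayesian surprise of the novel problem relative to our current heuristics:
# the larger it is, the worse the problem fits our templates, and the slower
# we would expect the human to be.
surprise = kl_divergence(novel, heuristic)
```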

It does seem to me that humans solve problems by emulating a form of nondeterministic Turing machine / depth-first search, where the shape of the tree, the paths we explore, and when we backtrack are all the product of various heuristics. For example, “if I take the knight, the bishop will take me… all right, this doesn’t leave me in a good position, what if I move my pawn up instead?… wait, I forgot about that rook…” On certain classes of problems this works pretty well most of the time. But on easy problems where we would be better off being a deterministic Turing machine, our brains don’t like doing that. They want to skip ahead and jump around instead of being methodical, and we waste time and get stuck at local maxima and such. We do that for hard problems too, but everyone else is also wasting time and getting stuck, so it doesn’t matter as much.
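The backtracking monologue above can be sketched as a depth-first search whose branch ordering and cutoff are supplied by heuristics. Everything concrete here (the toy arithmetic puzzle, the scoring rule, the function names) is an illustrative stand-in for the chess example, not a model of it:

```python
def heuristic_dfs(state, moves, is_goal, score, depth=0, max_depth=10):
    """Depth-first search where a heuristic `score` decides which branches
    to explore first, and a depth cutoff decides when to give up on a line
    and backtrack -- a cartoon of the 'take the knight... no wait, the
    rook' inner monologue."""
    if is_goal(state):
        return state
    if depth >= max_depth:
        return None  # abandon this line of play and backtrack
    # explore the most promising-looking moves first
    for m in sorted(moves(state), key=score, reverse=True):
        result = heuristic_dfs(m, moves, is_goal, score, depth + 1, max_depth)
        if result is not None:
            return result
    return None  # dead end: backtrack

# Toy use: reach 24 from 3 using the moves +5 and *2, greedily preferring
# larger intermediate values (a deliberately imperfect heuristic, so the
# search wanders before it backtracks onto the 3 -> 6 -> 12 -> 24 line).
found = heuristic_dfs(
    3,
    moves=lambda n: [n + 5, n * 2] if n < 24 else [],
    is_goal=lambda n: n == 24,
    score=lambda n: n,
)
```

The greedy ordering sends the search down a fruitless subtree first, which is exactly the “waste time and get stuck at local maxima” behavior described above.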

The word ‘knowledge’ is used and abused in everyday speech. When we ask ‘what is the oldest person you know?’, we don’t mean the same as when we ask ‘what is the largest prime you know?’. In fact, when we talk about ‘knowing’ something in the real world, we don’t expect the knowledge to actually encompass the essence of the thing. But then, what makes mathematical objects ‘knowable’? Isn’t it true that we can only know the properties of mathematical objects?

As pointed out by Scott, it is not about ‘knowing something’ but more about ‘knowing that’ and even more ‘knowing how’ (probably, give a proof of the fact that you know). But at this point the paradox of the largest prime becomes even worse! Think about the two following questions:

-Do you know how to prove that p:=2^(43,112,609) – 1 is prime?

-Do you know how to prove that q := ‘the first prime larger than p’ is prime?

Clearly we know the primality of q, and yet I like to think we do not know q. The reason this happens is that there is no essence of q (or at least that is not what we are interested in). When we use the word ‘know’ we actually imply some property about the object (e.g. its decimal expansion, how to compute it, how to compute it efficiently, its existence…). For example, we probably say someone ‘knows’ the number 6 if they know it’s the successor of 5. We don’t ask anyone to know that 6 is a perfect number, or the number of strings on a guitar. But this doesn’t tell us anything about ‘how’ we know things! For example, when we ask ‘what is the largest prime you know?’ we probably ask about (say) the decimal expansion (and maybe a proof of primality). Yet 2^(43,112,609) – 1 is a perfectly good substitute for the decimal expansion. So, why is it enough to know a reduced description of p? If we claim that both descriptions of p are equivalent, and knowing one is as good as knowing the other, we would have the problem of logical omniscience.

To avoid the problem of logical omniscience, we could consider constructing knowledge backwards instead of forwards, using complexity. That is, I know the answer to a question if I can compute it efficiently from all true sentences that I “know” (in the stricter sense of ‘knowing’). This definition seems reasonable because it solves the logical-omniscience problem and also accounts for the idea of acquiring knowledge while sitting at home (mainly because it increases the number of true sentences known, which makes other computations easier). I especially like the idea of bringing computational complexity to the table, because computing efficiently seems tightly related to the idea of understanding (or having insight), which tells us more about ‘how’ we know and not only what we store in our brains. Nevertheless, for the moment, I don’t have a reasonable definition of what efficient means in this context and would appreciate suggestions. Obviously it cannot be stated in terms of the size of the input, because some hard questions are easily stated and vice-versa. So there must be a way to account for the fact that we know the decimal expansion of 2^(43,112,609) – 1, even though writing it out would take very long, and at the same time account for the fact that we don’t know how to prove (or disprove) the Riemann hypothesis even though it’s doable in finite time.

A candidate I thought about is to compare how long the proof takes with the amount of knowledge you acquire in the process. Deciding the Riemann hypothesis will probably also solve an enormous number of questions, while expanding 2^(43,112,609) – 1 will not make you much smarter. I can see some flaws in this definition, but I thought it was worth mentioning. I would also like to hear other people’s definitions, if there are any.

I think this idea captures an interesting point: what happens during a computation (or during a proof process) seems to matter in the case of human knowledge. Along with your definition, we might also want to consider some kind of metric that measures how relevant an algorithm is to the intended goal. If we could draw a “relevance vs. time” curve for an algorithm, then two algorithms competing to produce similar outputs could be scored according to which is “on task” most often. If algorithm A spends more of its time being on task than algorithm B, we might regard A as having more knowledge than B, even if B’s algorithm is more efficient.

What inspired me to think of this are the classic finance interviews where you are asked things like “how many gallons are in the ocean?” or “how many gas stations are in America?” and the interviewer just wants to see if you can make an efficient and plausible chain of inference and that you are aware of all the assumptions you make that could later be revisited. The accuracy of the final answer is not as important as the relevance and precision of the reasoning process. Presumably, this is like a Turing test checking for a kind of intelligence or aptitude.

On the flip side, Penrose talks a bit at the end of his book about mathematical epiphanies and it does seem like great mathematicians seem to just “pull something from thin air” from time to time. In popular culture we regard such epiphany as a kind of genius (think Dr. House) but behind the scenes I feel it is less common. Scientists tend to be respected for a coherent direction of research rather than isolated singleton ideas. That seems to suggest that your idea to also focus on ‘what is learned in the process’ is definitely applicable in practice.

I believe the complicating factor in discussing the “HumanP” complexity class is the difficulty in classifying problems into those that can be solved by an average human in polynomial time and those that can’t – what can an average human do, anyway?

Traditional complexity classes are fairly easy to explain – if we can write an algorithm that solves all instances of a problem in polynomial time, then the problem is in P. The algorithm’s asymptotic time complexity doesn’t depend on the sort of computer we decide to run the algorithm on, or what language we choose to implement the algorithm – these choices are often wrapped up in the constant factor that we end up seeing. And, of course, all algorithms can theoretically be expressed in terms of the allowable movements of a Turing machine – our head can write a symbol, erase a symbol, move left, or move right, which is a pretty limited range of movements.

Back to the topic of HumanP – we may want to instead discuss what would constitute the basic set of functionalities of a human. Given nothing, perhaps a human can only remember a few symbols, but given paper, a human may be able to remember arbitrary symbols. A human may also be able to efficiently compare two numbers, to see if one is greater than the other. In this manner, we can build up a set of allowable building blocks so that we can actually discuss algorithms being in HumanP.

I think this might be a fun thought experiment to carry out – much as computational complexity classes have enlightened us about some of the fundamental capabilities and limitations of computers, discussing human complexity classes may enlighten us about our own capabilities, strengths, and weaknesses as well.

We talked a lot in class (and now a lot in the blog comments) about P vs. HumanP. The definition given in class was, more or less, “HumanP is the class of all problems that humans can feasibly/efficiently solve”. Since then, there have been some pretty interesting comment threads (see above) about formalizing it in terms of polynomial-time algorithms, etc., like we tend to do.

One thing I’ve been wondering about, though, is how to define what humans can do in polynomial time. With “normal” algorithms (run on Turing Machines), it’s very clear how to define polynomial time because it’s clear what the individual steps of an algorithm are (and so we can easily count them and see whether the number of steps is polynomial in the input length).

Are these “steps” the same for humans, or can we perform certain tasks in one step that computers need multiple steps to do, i.e., do we have oracles to certain functions? Since there seems to be a consensus that HumanP contains more problems than P (for instance, the ability to recognize great art or music, the ability to recognize certain chess strategies), I’d lean towards the oracle case. So I’d propose a definition of HumanP as being problems that humans can solve in polynomial-time with access to particular oracles; the question of determining what “meta” principles humans have is then the question of determining what questions the oracles can answer.

It’s not clear that these oracles would be akin to the ones we’re familiar with. Even if they answer the questions like “is x in A?”, A might be completely absurd, or impossible to define. As a result, I don’t claim to have advanced our knowledge of HumanP with this comment :). It just might be another way to frame the question.

Hi Katrina, just a quick (but important) correction:

> Since there seems to be a consensus that HumanP contains more problems than P (for instance, the ability to recognize great art or music, the ability to recognize certain chess strategies)

No, there’s no such consensus. What I think is clear is that there are problems in “HumanP” (however you choose to define it) for which we don’t currently know algorithms that would put them in P. But someday we might discover those algorithms (and if you believe in the computational theory of mind, then presumably those algorithms have to exist, whether we’ve discovered them or not).

Ah, good point — consider me corrected!

I see the difference in whether p and q are “known” as simply a matter of how they are expressed. While it’s true that q is precisely defined and is a specific number, q is expressed without regard to its numerical value, whereas p is expressed as an explicit mathematical calculation: “if you were to perform this calculation, you would get the value of p in decimal notation.”

If we alter q’s definition slightly, we can come up with an algorithm for finding its decimal representation (“test each number greater than p in succession for primality; q is the first one that passes the test”). However, this algorithm is qualitatively different from that used to find p, for several interrelated reasons:

-The calculation used to find p’s decimal representation only finds p, whereas the computation used to find q calculates many numbers, and must test whether each of the numbers is q or not.

-The operations used to calculate p involve “basic” mathematical operations. Depending on how basic we go, we can start with incrementation, define addition in terms of a fixed finite number of incrementations, multiplication as a fixed finite number of additions, and exponentiation as a fixed finite number of multiplications. (If further definitions of prime numbers involve other notation such as up-arrow notation, those can be similarly defined.) By contrast, the (algorithmic) definition of q is qualitatively different; it involves a primality test and then an if/then branch based on the result of that test.

-As a result, if we know exactly what algorithms we are using for exponentiation etc., we can calculate the exact number of steps a Turing machine would take to output the decimal representation of p. However, we don’t know the number of steps a Turing machine would take to output q, because we don’t know how many numbers the machine would test in order to find q.
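The asymmetry in those three points can be made concrete at small scale. Here I substitute the Mersenne prime 2^13 − 1 for p, purely for illustration: the decimal expansion of 2^n − 1 follows from a fixed, predictable computation, while finding “the first prime larger than p” requires a search whose length we don’t know until it halts.

```python
def is_prime(n):
    """Naive trial division; fine for this small-scale analogy."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Analog of p = 2^43,112,609 - 1: a Mersenne prime whose decimal
# expansion follows from a fixed number of "basic" operations
# (repeated squaring plus one subtraction).
p = 2 ** 13 - 1
digits_of_p = str(p)

# Analog of q = "the first prime larger than p": we must test candidates
# one by one, and the number of tests is not known in advance.
q = p + 1
tests = 0
while not is_prime(q):
    q += 1
    tests += 1
# the loop happens to run 17 times here, but nothing in the definition
# of q told us that number ahead of time
```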

There are two matters that arise when trying to define knowledge:

What is the nature of knowledge? What does it mean to know something? Where does the term knowledge seem to arise?

I claim that the word knowledge seems to arise from the apparent ability we have to “understand the world that surrounds us”. In ordinary speech it has then also come to refer to any entity that we can speak of as being “understood”, such as mathematics.

An observation I believe is important when talking about knowledge is that we tend to assume that we humans use defined inference rules to “understand” the world. This is not necessarily the case. And even if it is, the question comes to mind: if what we know are “knowledge bits” and rules of inference on those bits, do we already know all the implications of those rules acting on those bits, or not?

Furthermore, if we ask about the origin of knowledge we find the following issues:

There seems to be knowledge that is inferred from previous knowledge:

To know A we must know B

B → A

Then how could we know B in the first place?

Empirical knowledge is obtained because we know something else. When, then, can we know anything at all? Is knowledge an infinite spiral down? Could there be cycles? It is interesting to think about this, because from a complexity-theory perspective it seems that the only possibility is for the chain to be finite, unless we were to accept the existence of an infinite amount of information and computation “existing within the fibers of reality”.

If we focus on the question of where non-inferential knowledge comes from, then it seems we are asking whether we are ever able to really understand reality. I claim that the answer is a matter of coming up with models. Since it seems we do not have direct access to reality (debatable), we come up with models of it. Our knowledge is made up of models of the real. In such a framing, we can then ask about the amount of knowledge an object o produces by asking for the most succinct algorithm that can produce an “acceptable model” of that object. There remains the question of what “understanding” means. Can we actually understand something? If understanding is modeling, wouldn’t knowledge then be a model? And wouldn’t an acceptable measure of knowledge then be the complexity of “reaching” a model, whatever it might be?

Today’s lecture reminded me of a quotation by Banach in a letter to Ulam: “Good mathematicians see analogies between theorems. Great mathematicians see analogies between analogies.”

Considering the argument that “HumanP” is meaningless because we can’t give human beings arbitrarily-long inputs in contrast to idealized computers, I would argue in support of the perspective that the asymptotic limit is not what we are “really” concerned about in the first place. Furthermore, I would assert that human society is more concerned with capability than efficiency – something rooted in the fact that human beings do not really think about handling arbitrarily-long inputs. In human society, progress has classically been felt in terms of what someone can do – achieving some function for himself and for the people and environment around him. This is capability, not efficiency. As the world grows, more and more humans do want to accomplish tasks at larger scale and more efficiently – incidentally at a time when machines are on the rise. However, I would argue that except for certain fields of knowledge, we try to just figure out ways to do things – i.e. it is more about the computability of specific instances of decision problems than about time-complexity.

Perhaps “HumanP” comes up in discussions as a result of the way we tend to look at the functionality of computers nowadays. Computers are thought of less like oracles and more like slaves. Unfortunately, slaves are thought of as just plain workers… and the more efficiently they are put to work the better. However, when one considers the intelligence and power of the human mind, he or she does not speak of how better people work faster [even though that is implied at times], but rather speaks of how better people [in terms of skill or intelligence] are able to behave like better oracles – i.e. be able to give answers to more questions [interestingly similar to the foundations of the Turing test]. What do others think of this perspective on the problem? Is it just because to us, computers are less like oracles and more like slaves?

When we consider the complexity class “HumanP” we have to think carefully about what basic operation we are measuring. To sharpen this point, I’d like to consider the case of factoring an integer r = p x q, where p and q are primes.

It seems pretty clear that humans are not very good at factoring r, at least symbolically. But perhaps we could think about recasting factoring as a problem that humans actually are good at: namely, something we could call “color gradient completion”. To illustrate this, I made a simple matrix R, whose element r_{i,j} is the product of the ith prime with the jth prime. The diagonal of this matrix is 4, 9, 25, … Here is a visualization of this matrix R for the 1229 primes less than 10,000:

[fig1: color-gradient rendering of the matrix R]

I propose that we think of factoring r as the task of matching a given color to a location in the matrix visualization. That is, we show the human subject a blank white matrix and also show them the color of r as it was rendered in fig1 above. Instead of computing the matrix as in fig1, we compute nothing ahead of time other than the least and greatest elements of the diagonal (to set the bounds of the color gradient), and instead give the human an interactive interface that allows her to click on an element of the blank matrix. In response to a click we actually compute the product of the two primes corresponding to the location she selected and color the location accordingly.

I propose that humans could very quickly home in on the region of the matrix containing the color of r. When the human locates the exact color by clicking, she need only read off the i,j coordinates of the point to determine that r is the product of the ith prime with the jth prime. Note that in this interface the product of two primes is only computed in response to a mouse click, so the number of multiplications is strictly bounded by the number of mouse clicks.

The progress of the human might look something like this (note that this would require a large monitor with accurate color reproduction):

[fig2: the human’s successive clicks homing in on the matching color]

Guessing the matrix color gradient might require an exponential number of operations in the human’s visual cortex, but might require only a polynomial number of mouse clicks. This “color gradient completion” game is not likely to seriously challenge modern factorization methods, but hopefully it forces us to consider what exactly we mean by the basic operation of the “HumanP” complexity class.
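One way to see why polynomially many clicks should suffice: the matrix r_{i,j} = p_i x p_j is monotone along every row and column, so even a simple deterministic “staircase” search (a crude stand-in for the human’s color-guided homing-in; the function names are mine) needs at most one probe per row or column.

```python
def primes_up_to(limit):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            sieve[n * n :: n] = [False] * len(sieve[n * n :: n])
    return [n for n, flag in enumerate(sieve) if flag]

def staircase_factor(r, primes):
    """Find (p_i, p_j) with p_i * p_j == r, exploiting the monotone
    gradient of the matrix. Each comparison corresponds to one mouse
    click in the proposed interface."""
    i, j, clicks = 0, len(primes) - 1, 0
    while i < len(primes) and j >= 0:
        clicks += 1
        prod = primes[i] * primes[j]
        if prod == r:
            return primes[i], primes[j], clicks
        if prod < r:
            i += 1   # too small a product (too "cool" a color): move down
        else:
            j -= 1   # too large a product (too "hot" a color): move left
    return None

ps = primes_up_to(10_000)             # the 1229 primes less than 10,000
result = staircase_factor(9973 * 101, ps)
# the click count is linear in the matrix dimension, not in r itself
```

This doesn’t capture the visual-cortex shortcut at all, of course; it only shows that the interface’s click budget is modest even without it.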

This is extremely interesting. One could imagine drawing a similar “structure plot” for the game of chess, albeit a much higher-dimensional one. In fact, giving a precise account of why a human chooses which points to click, and in which order, would directly yield a sampling procedure for factoring a number using Gibbs sampling (the second picture you have drawn is essentially human Gibbs sampling over the space of possible two-prime factors… in fact, because all the motions are orthogonal, it is exactly Gibbs sampling).

I think that the human ability to solve these kinds of problems quickly and heuristically must involve their being expressible in a structural way (i.e. your first gradient plot) *AND* whatever that structural description is, it must be amenable to the things our brains evolved to be good at already. In a sense, certain mathematical problems are just “problems of opportunity” for the kinds of thinking structures we have evolved.

I don’t know if sampling from a distribution can give a fully adequate account of knowledge and human efficiency — my belief is that by using stochastic set theory it can and that this is likely the best way to understand cognitive physics — but I don’t have a lot of evidence for this. In particular, the closer one gets to logical knowledge, the harder it is to see that sampling from a distribution accounts for ‘knowing’ something. But I do think that almost all practical examples of human problem solving are better phrased as sampling from a distribution, and then measuring efficiency really depends on what prior one starts with. If someone started out knowing your Plot #1, the gradient structure of prime multiplication, then factor would be much simpler to solve with Gibbs sampling than by following a particular algorithm.

When we see humans ‘magically’ cutting through whole swaths of a problem domain and homing in on a good solution, it can seem really amazing. I am biased against this, and whenever humans seem to have amazing thinking capabilities, I tend to believe we are just not looking at the right problem domain yet.

Sipser points out in his paper a great point with which I strongly agree: that Church’s and Turing’s proofs depend on a contrived phenomenon of self-reference that “never appears spontaneously” – or at least, in my opinion, there is not enough evidence that this phenomenon is common at all, or relevant enough to affect our reasoning about the intractability of the instances of the Entscheidungsproblem that interest us. We can see this dependence on self-reference in arguments made by people other than Church and Turing (Gödel’s incompleteness theorem being one of them), and I do not question the logic derived from it, but rather its relevance.

A quick note on using the Pigeonhole Principle to attribute pattern-recognition to humans: why should we assume that a machine can only solve this problem as a SAT instance? A human solving it as a SAT instance would have just as hard a time. It seems to make sense that once humans found out they can’t fit two pigeons into one hole, three into two, four into three, etc., they used induction to deduce that they can’t fit n+1 pigeons into n holes. This can be proved by induction (the inductive step is adding 1 to the number of pigeons and the number of holes; the invariant is that the number of holes is always less than the number of pigeons; the predicate is that if the invariant holds, then you can’t fit the pigeons in the holes). A computer can use this proof to solve questions pertaining to the PHP. But can the computer come up with the proof? A question for another day – I feel this is out of scope.
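For concreteness, here is a sketch of the SAT encoding in question: the CNF for PHP_n, together with the brute-force assignment search that a proof-free solver is essentially reduced to (by Haken’s theorem, resolution-based solvers need exponential-size refutations of these formulas, whereas the inductive argument above is a few lines).

```python
from itertools import combinations, product

def php_clauses(n):
    """CNF for PHP_n: n+1 pigeons into n holes. Variable (p, h) is true
    iff pigeon p sits in hole h; a literal is (variable, sign)."""
    clauses = []
    for p in range(n + 1):                       # every pigeon gets some hole
        clauses.append([((p, h), True) for h in range(n)])
    for h in range(n):                           # no hole holds two pigeons
        for p, q in combinations(range(n + 1), 2):
            clauses.append([((p, h), False), ((q, h), False)])
    return clauses

def brute_force_sat(clauses):
    """Exhaustive check over all assignments -- the blind search that
    knows nothing of the pigeonhole *principle*, only the formula."""
    variables = sorted({v for clause in clauses for v, _ in clause})
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if all(any(assignment[v] == sign for v, sign in clause)
               for clause in clauses):
            return True
    return False

# Unsatisfiable for every n, but the search space blows up quickly:
# PHP_3 already has 12 variables and 4 + 18 clauses.
assert not brute_force_sat(php_clauses(3))
```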

On the problem of logical omniscience (WPSCACC 5.1): when considering the question “whether 43 x 37 = 1591”, is that the same as asking “what are the prime factors of 1591”? I believe the answer is no. The question is not being asked in a different way; it is a completely different question. Each question serves a different purpose – the first to test whether the questionee can multiply 43 and 37, the second to test whether the questionee knows how to factor 1591. Hence, I disagree that the content of both answers is the same. Unless we claim that a human will always try to factor 1591 rather than multiply 43 and 37 to answer the first question, the two answers have different content: one implies knowledge of multiplying (I claim humans will always use multiplication to solve the first problem) and the other implies knowledge of factoring. If you could prove those two contents to be equivalent (in computational complexity), then you could prove factoring to be in P. I can make a similar argument about asking a four-year-old about the commutative property of arithmetic – her answers to the questions asked have different content (one content might overlap with the other, but neither contains the other). As pointed out in section 5.2, I largely agree that we are making progress by pushing the problem somewhere else, or reposing the question, because asking the right question about a problem is a first step to solving it, and certainly the “Infinity cure” makes some progress in clarifying what we need to tackle to solve the problem of logical omniscience.

At the risk of sounding pedestrian I’d like to concede Scott’s uncontroversial point that AI has not, as yet, achieved human-level performance in all arenas :). Instead, I’d like to answer

> How would a SAT-solver need to be designed so that it quickly determined the unsatisfiability of a propositional formula representing PHPn (the Pigeonhole Principle with n pigeons)?

There are a range of reasonable directions in which statistical inference could be leveraged to this end. In an empirical risk minimisation setting, one could define a family of SAT solvers and a loss function evaluated on a set of “training sentences”. The learning problem would be to identify a minimum-loss solver from among the hypothesis class. If the training set included instances in which the Pigeonhole Principle was useful (i.e. leads to efficient identification of proofs), and the hypothesis class was sufficiently expressive (could encode the heuristic “try the Pigeonhole Principle at some point”), and the learning algorithm was sufficiently powerful (able to find an element of the hypothesis class encoding the aforementioned heuristic), then this learning algorithm would both discover and use the Pigeonhole Principle. There are many practical issues here but no conceptual problems.
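A toy version of this empirical-risk-minimisation setup might look like the following. Everything here is a deliberately tiny stand-in: the “hypothesis class” is just a few fixed variable orderings, the “loss” is the node count of a naive backtracking search, and the training formulas are made up.

```python
def search_cost(clauses, order):
    """Nodes explored by a backtracking search that assigns variables in
    the given order; the node count serves as the loss. A clause is a
    list of (variable, sign) literals."""
    count = 0

    def consistent(assignment):
        # a clause can only be judged violated once all its variables are set
        for clause in clauses:
            if all(v in assignment for v, _ in clause) and \
               not any(assignment[v] == sign for v, sign in clause):
                return False
        return True

    def rec(i, assignment):
        nonlocal count
        count += 1
        if not consistent(assignment):
            return              # prune this branch
        if i == len(order):
            return
        for val in (False, True):
            assignment[order[i]] = val
            rec(i + 1, assignment)
            del assignment[order[i]]

    rec(0, {})
    return count

# Hypothetical "training sentences": two tiny CNFs over 'a', 'b', 'c'.
train = [
    [[('a', True)], [('a', False), ('b', True)], [('b', False), ('c', True)]],
    [[('c', True)], [('c', False), ('b', True)]],
]

# Hypothesis class: candidate branching orders. ERM picks the order with
# minimum total search cost over the training set.
hypothesis_class = [['a', 'b', 'c'], ['c', 'b', 'a'], ['b', 'c', 'a']]
best = min(hypothesis_class,
           key=lambda h: sum(search_cost(f, h) for f in train))
```

In the real proposal the hypothesis class would be far richer (e.g. solvers that may or may not invoke a pigeonhole-style subroutine), but the structure of the learning problem is the same.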

Now that I think about it, why doesn’t the SAT community do more with statistical learning?