Firstly I'm not necessarily looking for a complete algorithm I can just copy and paste, then call it a day. Any "general approach" solutions would be fine for me!
This entire post was spurred by a slow day at work, and stumbling upon this site and not being able to figure out how they implemented their generator.
The Problem
For those of you who don't know, the "Zebra Puzzle" or "Einstein's Puzzle" is a famous logic puzzle that you've probably ran into before.
The full wiki article is here, but I'll post the relevent bits.
There are five houses.
The Englishman lives in the red house.
The Spaniard owns the dog.
Coffee is drunk in the green house.
The Ukrainian drinks tea.
The green house is immediately to the right of the ivory house.
The Old Gold smoker owns snails.
Kools are smoked in the yellow house.
Milk is drunk in the middle house.
The Norwegian lives in the first house.
The man who smokes Chesterfields lives in the house next to the man with the fox.
Kools are smoked in the house next to the house where the horse is kept.
The Lucky Strike smoker drinks orange juice.
The Japanese smokes Parliaments.
The Norwegian lives next to the blue house.
Now, who drinks water? Who owns the zebra? In the interest of clarity, it must be
added that each of the five houses is painted a different color, and their inhabitants
are of different national extractions, own different pets, drink different beverages
and smoke different brands of American cigarets [sic]. One other thing: in statement
6, right means your right.
This is all well and good. I've found several concise and neat ways online to solve this problem, especially using constraint programming. However, what interests me is making more of these types of puzzles.
Making More
Obviously, a matrix representation is a logical way to think about this. With each column containing a person, house, what they drink, what type of car they drive, etc.
My initial thought was to start with a randomly generated grid that is complete (ie, solved) then (somehow) create hints from the solved version that uniquely identify it. Every time something can be determined, it's removed from the grid.
Ripping off the site I listed at the beginning, the following "hints" that can be used to solve the grid can be of the following type:
The person/animal/plant lives/grows in a given house.
The person/animal/plant does not live/grow in a given house.
The person/animal/plant lives in the same house as the other person/animal/plant.
The person/animal/plant is a direct neighbor of the other person/animal/plant.
The person/animal/plant is the left or right neighbor of other person/animal/plant.
There is one house between the person/animal/plant and the other person/animal/plant.
There is one house between the person/animal/plan and the other person/animal/plant on the left or right.
There are two houses between the person/animal/plant and the other person/animal/plant.
There are two houses between the person/animal/plan and the other person/animal/plant on the left or right.
The person/animal/plant lives left or right from the other person/animal/plant.
You can see how these could be generalized, extended, etc;
The difficulty is, using my approach (starting with a complete grid and generating these hints), I'm not sure how to make sure the set of hints I create would absolutely result in the target grid.
For example, if you say "The Englishman does not own a pine tree" you cannot decisively pair two things together at any given time in the puzzle. Yet if there were only two trees remaining to solve for, this could in fact be a decisive piece of evidence.
Am I thinking about this the entirely wrong way? Would a better approach be to create a grid with some randomized, pre-defined known elements (ie, the red house is in the middle) and then build up the grid using these hints as rules for building?
Any advice, articles to read, programming techniques to learn about, etc. would be greatly appreciated!