Consider a Rubik’s cube. If someone rotates one side of the cube and hands the cube to you, you can see how to return the cube to its original form.
However, if someone rotates several sides one after another, returning the cube to its original form becomes a complicated logical puzzle, and solving it might require advice from a mathematician.
In the same way, consider a tangled rope. If it loops over itself only once or twice, we can see how to untangle it. However, if the rope is tangled in a more complicated way, untangling it becomes very difficult.
This is a surprising mismatch: our species evolved to live in a three-dimensional world, and we have an excellent intuition for space, yet we are remarkably bad at untangling. This is why, in human culture, knots and tangles have become a standard metaphor for any complicated problem.
Now think about computers. In one sense, computers can already untangle better than humans: mathematicians and computer scientists have developed several excellent algorithms specifically for untangling knotted ropes. But those algorithms demand substantial mathematical input. Could a computer instead “just look” at a knot and untangle it, with only minimal guidance from a mathematician?
At the moment, the most successful technology for computer vision is the deep neural network. A deep neural network processes an image as a stack of pictures, one behind another; each successive picture accentuates details from the previous one, until the computer has enough information to make a decision about the image.
For example, if the task given to the computer is to decide whether there is a cat in the picture, the computer will accentuate details like cat ears or cat eyes. In our project, the computer will learn to accentuate the parts of the rope that should be pulled in order to untangle it.
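As a toy illustration of how one layer of such a network “accentuates details”, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: the 3×3 edge-detecting filter is fixed by hand, whereas a real deep neural network learns its filters from data, and stacks many such layers.

```python
# Toy illustration of one layer of a convolutional neural network:
# a small filter slides over the picture and accentuates details.
# Assumption for illustration: the filter is hand-picked (a
# Laplacian-style edge detector); real networks learn filters from data.

def convolve(image, kernel):
    """Slide a 3x3 kernel over the image (no padding) and
    return the map of accentuated details."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(3) for dj in range(3))
            row.append(s)
        out.append(row)
    return out

# A simple picture: a bright square (the "object") on a dark background.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

# This filter responds strongly at edges and corners of the object
# and gives zero inside uniform regions.
edge_filter = [
    [0, -1, 0],
    [-1, 4, -1],
    [0, -1, 0],
]

feature_map = convolve(image, edge_filter)
for row in feature_map:
    print(row)
# prints:
# [2, 1, 2]
# [1, 0, 1]
# [2, 1, 2]
```

The output picture is bright exactly where the original picture has edges and corners, and dark in the uniform interior: the outline of the object has been accentuated. In our setting, the analogous learned filters would highlight the strands of the rope that should be pulled.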
This will be an important step towards teaching computers to understand our three-dimensional space, and it could have useful applications in the future.
This three-year project has been funded by the Leverhulme Trust.