I will mainly talk about our recent paper on deep learning: Provable bound for learning some deep representations. Deep learning, a modern variant of artificial neural net, has been a very powerful tool in recent years for many machine learning tasks such as image recognition, speech recognition and natural language processing. However, theoretically, it still remains mysterious why these modern heuristics and tricks together with millions of data result in big improvements for so many important questions. With the motivation to resolve these mysteries, this paper proposes an efficient algorithm that provably learns a class of deep neural networks under a generative model.
In more detail, the network we study has constant number of layers, each consecutive two layers are connected via a random degree-d bipartite graph with random weights in [-1,1]. The observed data at bottom is generated as follows: First a sparse hidden vector at the top layer is chosen, then the top layer activates some of the second layer nodes by multiplying the connection matrix and applying another point-wise threshold function, and then adding some small noise. Then the second layer activates the third using the same rules with the new connection matrix, so on and so forth. Our algorithms use layer-wise learning and for each layer we exploits the old Hebbian rule: "Things fire together wire together", that is, if two nodes in the same layer are correlated, then they are likely to share a neighbor in the layer above them. After observing the common neighbor information, we could apply a graph recovery procedure and get the support of the network, from which we could get the hidden variables and the weights. The running time is polynomial and the sample complexity is quadratic or cubic in the number of nodes, depending on the details of the model.
The talk will be self contained and this is the joint work with Sanjeev Arora, Aditya Bhaskara and Rong Ge.