Example projects from previous classes
- History-dependent Adaptive SGD Methods under Asynchronized Distributed Setting: History-dependent adaptive learning rate SGD methods are analyzed and implemented in an asynchronous distributed setting. Given the delayed updates, AdaGrad and AdaDelta are reformulated accordingly, and a theoretical proof establishes both the O(1/sqrt(T)) convergence rate and the sub-linear regret bound of AdaGrad. Further, the proposed methods are tested on MNIST digit classification using softmax regression and a feedforward neural network, and on sequence-to-sequence regression using a recurrent neural network. Experiments show the general effectiveness of the different methods, while AdaGrad is empirically more stable and robust on complex problems. (A minimal delayed-update AdaGrad loop is sketched after this list.)
- Epidemics Control: Distributed ADMM for Optimal Resource Allocation in Complex Networks: Epidemic modeling is a fundamental problem in network science. In this work, we solve an optimal resource allocation problem for controlling an epidemic outbreak under the SIS epidemic model. Semidefinite and geometric programming approaches are considered for undirected and directed graphs, respectively, to determine the minimum cost of intervention that causes an exponential decay of the outbreak. Our semidefinite programming implementation confirms prior results that immunization cost is strongly correlated with node degree, suggesting that higher-degree nodes should be targeted first for vaccine distribution. A distributed ADMM approach is considered for efficiently solving the GP formulation, leveraging local consensus to arrive at a globally optimal solution. (The linearized SIS decay condition is sketched after this list.)
- Maximum Margin Clustering: Direct Optimization with Proximal Alternating Linearized Minimization: Maximum margin clustering is an extension of the support vector machine to unsupervised clustering. It separates unlabeled data into two approximately equally sized groups by finding a hyperplane with the largest margin. Although many existing works have shown promising results in computer vision applications, these methods carry no convergence guarantee because of their non-convex formulations. In this project, we attack the problem directly in its non-convex formulation with proximal alternating linearized minimization (PALM), which has been proven to converge globally to stationary points for a class of non-convex and non-smooth problems. We propose an algorithm for the maximum margin clustering problem and prove that it enjoys global convergence. Experimental results show that our algorithm runs faster than a recent state-of-the-art implementation of maximum margin clustering. (A generic PALM iteration is sketched after this list.)
- Convex Minimum Cost Flow: from Linear Programming to Accelerated Dual Descent Method: For the linear programming formulation of minimum cost flow, where the per-unit cost on each network edge is constant, we explored and implemented two classical algorithms: the negative cycle canceling algorithm and the successive shortest path algorithm. To compare their time efficiency, we tested their performance on different randomly generated graphs. For the more general minimum convex cost flow problem, which these classical algorithms cannot solve, we studied the accelerated dual descent (ADD) algorithm proposed by Zargham et al., which approximates the Hessian inverse using matrix splitting techniques. We implemented this algorithm and applied it to a physical problem, a resistor network with quadratic costs. The results of the ADD algorithm agree with brute-force calculation on small test networks, and the algorithm is much more efficient and remains applicable to larger resistor networks where physics-based brute-force calculation is practically impossible. (The negative-cycle-detection subroutine behind cycle canceling is sketched after this list.)
- Total-Variation Based Optimization for Tomography: Tomography and gas reconstruction problems aim at recovering the density field of an object from measurements taken at different angles, where each measurement is typically an integral of the density along the light/x-ray path. Since most objects have continuous density in nature, we formulate the task as a total-variation regularized regression problem, which is related to the generalized lasso. However, our problem is more difficult because there are additional constraints, and the matrices in the regression and penalty terms are often rank deficient. In this project, we solve the problem by (1) adding an L2 regularization term and (2) developing a method that avoids matrix inversion. Also, since the mapping from dual to primal solutions is not one-to-one, we use a path-following algorithm to keep the solutions on both sides feasible. We apply several solvers to synthetic data and discuss the convergence rate and performance of each method. (The underlying generalized lasso formulation is sketched after this list.)
- Fast Hinge-Loss Markov Random Field Inference: Probabilistic Soft Logic (PSL) is a highly flexible formulation that, like Markov Logic Networks, combines probabilistic graphical models and predicate logic to allow collective reasoning about random variables connected by logical rules. PSL relaxes Markov Logic Networks by using soft truth values (continuous values in the interval [0,1]), so that finding the most probable configuration becomes a convex optimization problem. Although convexity guarantees convergence to the global minimum, convergence can still be slow on the enormous logical graphs that PSL can tackle. The current state-of-the-art method for solving the PSL optimization problem is a consensus optimization method, both fast and distributed, based on the alternating direction method of multipliers (ADMM). In this project we derive and implement that consensus optimization method, extend it with stochastic optimization ideas, and demonstrate improved empirical performance. (The consensus ADMM pattern is sketched after this list.)
- A Solver for Sparse KKT Matrices: Many solutions to constrained optimization problems require solving a linear system Ax = b. Solving this system is commonly the most costly step in optimizing the objective function, and consequently much work has gone into speeding up solvers by exploiting properties or structure of the matrix. For example, one can use the LU or QR decomposition to solve general linear systems of the form Ax = b. However, if the matrix A satisfies certain properties, the computation can often be sped up with a different factorization: for positive definite matrices there are solvers based on the Cholesky decomposition, and for more general symmetric matrices there are solvers based on the LDL^T decomposition. Furthermore, more recent work has focused on solvers for Ax = b where A is a sparse matrix, resulting in sparse versions of the aforementioned factorizations. (A sparse KKT solve is sketched after this list.)
- Hessian-free Second-order Optimization on Deep Neural Network: Stochastic Gradient Descent (SGD) is the most commonly used method for training deep neural networks (DNNs). However, SGD suffers from vanishing gradients in the lower layers and is sensitive to local curvature. Second-order methods can use local curvature information to scale the gradient update, but computing and storing the Hessian of a DNN is prohibitively expensive. Hessian-free methods solve the second-order update step without estimating the Hessian explicitly. In this project, we aim to implement Hessian-free training for DNNs and compare it with SGD on a classification task. (A Hessian-free step built from Hessian-vector products is sketched after this list.)
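The delayed-update behaviour referenced in the first project can be illustrated with a minimal AdaGrad loop in which each gradient is applied a fixed number of steps after it was computed. This is only a sketch under simplifying assumptions: grad_fn, the fixed delay, and the step size eta are placeholders, not the project's actual asynchronous implementation.

```python
import numpy as np

def delayed_adagrad(grad_fn, w0, T, eta=0.1, eps=1e-8, delay=2):
    """Minimal AdaGrad sketch with a fixed gradient delay, mimicking an
    asynchronous worker whose gradients arrive `delay` steps late.
    grad_fn(w, t) is assumed to return a (possibly stochastic) gradient."""
    w = w0.copy()
    G = np.zeros_like(w)              # per-coordinate sum of squared gradients
    queue = []                        # gradients waiting to be applied
    for t in range(T):
        queue.append(grad_fn(w, t))   # computed at the current iterate ...
        if len(queue) <= delay:
            continue                  # ... but applied only `delay` steps later
        g = queue.pop(0)              # stale gradient
        G += g ** 2
        w -= eta * g / (np.sqrt(G) + eps)   # AdaGrad per-coordinate step
    return w
```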
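For the epidemics control project, the exponential-decay requirement on the SIS outbreak can be checked, for the linearized dynamics, via the sign of the largest real part among the eigenvalues of diag(beta) A - diag(delta). A minimal sketch, assuming illustrative per-node infection and recovery rates rather than the project's data:

```python
import numpy as np

def sis_decay_margin(A, beta, delta):
    """Largest real part of the eigenvalues of diag(beta) @ A - diag(delta).
    For the linearized SIS dynamics, a negative value implies exponential
    decay of the infection."""
    M = np.diag(beta) @ A - np.diag(delta)
    return float(np.max(np.linalg.eigvals(M).real))

# Toy check on a triangle graph with hypothetical rates.
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
beta = np.full(3, 0.2)    # infection rates
delta = np.full(3, 0.6)   # recovery rates, e.g. after allocating cure resources
print("decay margin:", sis_decay_margin(A, beta, delta))  # negative => dies out
```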
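The maximum margin clustering project builds on proximal alternating linearized minimization (PALM). A generic PALM iteration for minimizing f(x) + g(y) + H(x, y) is sketched below; the proximal operators, the partial gradients of the smooth coupling H, and the block Lipschitz constants are assumed to be supplied by the caller, and the clustering-specific objective is not reproduced here.

```python
import numpy as np

def palm(prox_f, prox_g, grad_H_x, grad_H_y, x0, y0, Lx, Ly, iters=500):
    """Generic PALM sketch for min_{x,y} f(x) + g(y) + H(x, y) with H smooth.
    prox_f(v, t) / prox_g(v, t) are proximal operators with step t; Lx / Ly
    are (assumed) Lipschitz constants of the block gradients of H."""
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        # proximal linearized step in x with step size 1/Lx
        x = prox_f(x - grad_H_x(x, y) / Lx, 1.0 / Lx)
        # proximal linearized step in y, using the freshly updated x
        y = prox_g(y - grad_H_y(x, y) / Ly, 1.0 / Ly)
    return x, y
```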
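The negative cycle canceling algorithm from the minimum cost flow project repeatedly pushes flow around a negative-cost cycle of the residual network until none remains. Its core subroutine, Bellman-Ford based negative-cycle detection, is sketched below on a plain edge-list representation; the residual-network bookkeeping is omitted.

```python
def find_negative_cycle(n, edges):
    """Return a list of nodes forming a negative-cost cycle, or None.
    edges is a list of (u, v, cost) tuples over nodes 0..n-1."""
    dist = [0.0] * n           # all-zero start acts as an implicit super-source
    parent = [-1] * n
    x = -1
    for _ in range(n):         # a relaxation in the n-th round signals a cycle
        x = -1
        for u, v, cost in edges:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                parent[v] = u
                x = v
        if x == -1:
            return None        # no relaxation at all: no negative cycle
    for _ in range(n):         # walk back n steps to land inside the cycle
        x = parent[x]
    cycle, cur = [], x
    while True:
        cycle.append(cur)
        cur = parent[cur]
        if cur == x:
            break
    return cycle[::-1]

# Toy usage: the cycle 1 -> 2 -> 0 -> 1 has total cost -0.5.
print(find_negative_cycle(3, [(0, 1, 1.0), (1, 2, -2.0), (2, 0, 0.5)]))
```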
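The tomography project's formulation is essentially a generalized lasso with an added L2 term and sign constraints. A small synthetic instance can be written down directly with a generic convex modeling tool; the sketch below assumes cvxpy is available, and the sizes, weights, and 1-D difference operator are illustrative placeholders rather than the project's setup.

```python
import numpy as np
import cvxpy as cp   # assumed available; any generic convex solver would do

# Hypothetical sizes: m line-integral measurements of an n-pixel density field.
m, n = 80, 100
rng = np.random.default_rng(0)
A = rng.random((m, n))                               # projection matrix
b = A @ np.clip(rng.normal(0.5, 0.2, n), 0, None)    # synthetic measurements
D = np.eye(n) - np.eye(n, k=1)                       # 1-D finite differences (TV)

x = cp.Variable(n)
lam, mu = 0.1, 1e-3          # TV weight and a small L2 term for rank deficiency
objective = cp.Minimize(cp.sum_squares(A @ x - b)
                        + lam * cp.norm1(D @ x)
                        + mu * cp.sum_squares(x))
problem = cp.Problem(objective, [x >= 0])            # densities are nonnegative
problem.solve()
print("recovered density range:", float(x.value.min()), float(x.value.max()))
```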
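The PSL project's solver follows the consensus ADMM pattern: independent local subproblem solves, an averaging (consensus) step, and a dual update. The sketch below substitutes simple quadratic local terms, which admit closed-form x-updates, for the actual hinge-loss potentials.

```python
import numpy as np

def consensus_admm(As, bs, rho=1.0, iters=200):
    """Consensus ADMM sketch for min_x sum_i 0.5*||A_i x - b_i||^2.
    Each term gets its own copy x_i; the copies are tied together through
    the consensus variable z and scaled dual variables u_i."""
    n = As[0].shape[1]
    z = np.zeros(n)
    us = [np.zeros(n) for _ in As]
    for _ in range(iters):
        xs = [np.linalg.solve(A.T @ A + rho * np.eye(n),       # local x-update
                              A.T @ b + rho * (z - u))
              for A, b, u in zip(As, bs, us)]
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)   # consensus update
        us = [u + x - z for x, u in zip(xs, us)]               # dual update
    return z

# Toy usage: three local least-squares terms sharing one consensus variable.
rng = np.random.default_rng(0)
As = [rng.normal(size=(5, 3)) for _ in range(3)]
bs = [rng.normal(size=5) for _ in range(3)]
print("consensus solution:", consensus_admm(As, bs))
```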
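For the sparse KKT solver project, the system in question is the symmetric indefinite KKT matrix [[H, A^T], [A, 0]] arising, for instance, from an equality-constrained quadratic program. The sketch below assembles such a system with SciPy sparse matrices and solves it with a general sparse LU factorization; a sparse LDL^T routine would exploit the symmetry further, and all problem data here are random placeholders.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Hypothetical equality-constrained QP: min 0.5 x'Hx + c'x  s.t.  Ax = b.
n, p = 200, 20
rng = np.random.default_rng(0)
H = sp.eye(n, format="csc") * 2.0      # sparse positive definite Hessian
A = sp.random(p, n, density=0.05, format="csc", random_state=0)
c = rng.normal(size=n)
b = rng.normal(size=p)

# Assemble the sparse KKT matrix [[H, A'], [A, 0]] and the right-hand side.
K = sp.bmat([[H, A.T], [A, None]], format="csc")
rhs = np.concatenate([-c, b])

lu = spla.splu(K)              # general-purpose sparse LU factorization
sol = lu.solve(rhs)
x, nu = sol[:n], sol[n:]       # primal solution and equality multipliers
print("primal feasibility:", np.linalg.norm(A @ x - b))
```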
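The Hessian-free project relies on the fact that Hessian-vector products can be computed without ever forming the Hessian; feeding them to conjugate gradient yields an approximate Newton step. The finite-difference product, the damping value, and the toy quadratic below are illustrative choices, not the project's DNN setup.

```python
import numpy as np

def hvp(grad_fn, w, v, eps=1e-4):
    """Finite-difference Hessian-vector product: H(w) v ~ (g(w+eps*v) - g(w)) / eps."""
    return (grad_fn(w + eps * v) - grad_fn(w)) / eps

def hessian_free_step(grad_fn, w, cg_iters=10, damping=1e-2, tol=1e-10):
    """One Hessian-free (truncated Newton) step: approximately solve
    (H + damping*I) p = -g with conjugate gradient, using only H-vector products."""
    g = grad_fn(w)
    p = np.zeros_like(w)
    r = -g.copy()                     # residual of the system at p = 0
    d = r.copy()
    rs = r @ r
    for _ in range(cg_iters):
        if rs < tol:
            break                     # residual already small enough
        Hd = hvp(grad_fn, w, d) + damping * d
        alpha = rs / (d @ Hd)
        p += alpha * d
        r = r - alpha * Hd
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return w + p                      # in practice combined with a line search

# Toy usage on a quadratic f(w) = 0.5 w'Qw - b'w, whose gradient is Qw - b.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w = hessian_free_step(lambda w: Q @ w - b, np.zeros(2), cg_iters=20)
print("approximate minimizer:", w)    # should approach the solution of Qw = b
```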