Auditorium
Wednesday, October 27
 

10:00am CDT

Welcome
Wednesday October 27, 2021 10:00am - 10:05am CDT
Auditorium

10:05am CDT

Automatic Machine Learning with AutoGluon - Algorithms, Domains, Applications - Auditorium
AutoML is the ultimate challenge for machine learning algorithms. After all, design choices need to be made automatically, and tools need to work reliably all the time, within a given budget for computation and time. This poses exciting (and many still unsolved) problems in model selection, calibration, optimization, adaptive design of priors, and data detection. In this talk I give an overview of the associated scientific problems and the current state of the art in terms of what goes into AutoGluon.
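For readers who want to try it, a minimal sketch of AutoGluon's tabular API is below; the CSV file names and the "target" label column are placeholders for your own data:

    from autogluon.tabular import TabularDataset, TabularPredictor

    # load training data (placeholder file name) and fit within a time budget
    train_data = TabularDataset("train.csv")
    predictor = TabularPredictor(label="target").fit(train_data, time_limit=600)

    # evaluate: predictions plus a leaderboard of the models AutoGluon tried
    test_data = TabularDataset("test.csv")
    predictions = predictor.predict(test_data)
    print(predictor.leaderboard(test_data))

The time_limit argument is the "given budget" the abstract refers to: AutoGluon selects, trains, and ensembles models within it.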

Speakers
avatar for Alex Smola

Alex Smola

VP and Distinguished Scientist, Amazon Web Services
Alex Smola studied physics at the University of Technology, Munich, and at AT&T Research in Holmdel. He received a doctoral degree in computer science at the University of Technology Berlin in 1998. He worked at the Fraunhofer Gesellschaft (1996-1999) and NICTA (2004-2008…


Wednesday October 27, 2021 10:05am - 10:50am CDT
Auditorium

10:50am CDT

Scalable and Sustainable AI Acceleration for Everyone: Hashing Algorithms Train Billion-parameter AI Models on a Commodity CPU faster than Hardware Accelerators - Auditorium
Current Deep Learning (DL) architectures are growing larger to learn from complex datasets. Training and tuning these astronomically sized models is time- and energy-consuming and stalls progress in AI. Industries are increasingly investing in specialized hardware and deep learning accelerators like TPUs and GPUs to scale up the process. It is taken for granted that commodity CPU hardware is incapable of outperforming powerful accelerators such as GPUs in a head-to-head comparison on training large DL models. However, GPUs come with additional concerns: expensive infrastructure changes that few can afford, difficulty of virtualization, main-memory limitations, and chip shortages. Furthermore, the energy consumption of current AI training is prohibitively expensive. An article from MIT Technology Review noted that training one deep learning model generates a larger carbon footprint than five cars over their lifetimes.

In this talk, I will demonstrate the first algorithmic progress that exponentially reduces the computation cost of training neural networks by mimicking the brain's sparsity. We will show how data structures, particularly hash tables, can be used to design an efficient "associative memory" that reduces the number of multiplications associated with training neural networks. The implementation of this algorithm challenges the common wisdom prevailing in the community that specialized processors like GPUs are significantly superior to CPUs for training large neural networks. The resulting algorithm is orders of magnitude cheaper and more energy-efficient. Our careful implementations can train billion-parameter recommendation models on refurbished, older-generation CPUs significantly faster than top-of-the-line TensorFlow alternatives on the most potent A100 GPU clusters. I will end by discussing the current and future state of this line of work, along with a brief discussion of planned extensions.
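As a toy illustration of the hash-table idea (a sketch of the SimHash principle, not ThirdAI's actual implementation), the snippet below buckets neurons by random-hyperplane sign patterns, so a forward pass only touches the neurons whose hash collides with the input's; all sizes and weights are made up:

    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(0)
    d, n_neurons, n_bits = 128, 4096, 12
    W = rng.standard_normal((n_neurons, d))    # layer weights, one row per neuron
    planes = rng.standard_normal((n_bits, d))  # SimHash hyperplanes

    def simhash(v):
        # sign pattern of v against the hyperplanes, packed into a bucket id
        bits = (planes @ v) > 0
        return int(bits @ (1 << np.arange(n_bits)))

    # index every neuron's weight vector into its bucket (the "associative memory")
    buckets = defaultdict(list)
    for j in range(n_neurons):
        buckets[simhash(W[j])].append(j)

    def sparse_forward(x):
        # neurons colliding with x tend to have large inner products with it,
        # so we compute only those few dot products instead of all n_neurons
        active = buckets[simhash(x)]
        return active, W[active] @ x

    active, out = sparse_forward(rng.standard_normal(d))

On average each bucket holds n_neurons / 2^n_bits neurons, which is where the savings in multiplications come from.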

Speakers
avatar for Anshumali Shrivastava

Anshumali Shrivastava

Professor, Rice University; Founder, ThirdAI Corp
Anshumali Shrivastava's research focuses on Large-Scale Machine Learning, Scalable and Sustainable Deep Learning, and Randomized Algorithms for Big Data and Graph Mining.


Wednesday October 27, 2021 10:50am - 11:35am CDT
Auditorium

12:30pm CDT

Democratizing Deep Learning with Commodity Hardware: How to Train Large Deep Learning Models on CPU Efficiently with Sparsity - Auditorium
GPUs are expensive, require premium infrastructure, and are hard to virtualize. Furthermore, our models and data are growing faster than GPU memory. The communication cost of distributing the models over GPUs is prohibitively expensive for most workloads.

Wouldn't it be nice if we could train large models on commodity CPUs faster than on GPUs? CPUs are cheap, well-understood, and ubiquitous hardware. The main memory of a CPU server can easily run into terabytes (TB) with minimal investment. For large models, we can fit both the model and the data in CPU RAM.

This tutorial will focus on a newly emerging paradigm for deep learning training that uses sparsity and hash tables. We will introduce the idea of selectively identifying parameters and sparsity patterns during training, and will demonstrate how to integrate these algorithms into existing Python code. As a result, we demonstrate significantly superior deep learning capabilities on CPUs, making them competitive with (or even better than) state-of-the-art packages on some of the best GPUs. If time permits, we will briefly discuss a multi-node implementation and some thoughts on how to train outrageously large models (tens of billions of parameters or more) on small commodity clusters.
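To make the "selectively identifying parameters during training" idea concrete, here is a hedged NumPy sketch of one sparse training step (an illustration of the paradigm, not the tutorial's actual package); the top-k lookup stands in for the hash-table retrieval, and all shapes and data are toy values:

    import numpy as np

    rng = np.random.default_rng(1)
    d, n_out, k, lr = 64, 1024, 32, 0.1
    W = rng.standard_normal((n_out, d)) * 0.01

    def active_set(x):
        # stand-in for the hash-table lookup: the buckets approximate the
        # neurons with the largest inner products against x
        return np.argpartition(W @ x, -k)[-k:]

    for step in range(100):
        x = rng.standard_normal(d)
        y = rng.integers(n_out)                  # toy class label
        act = np.union1d(active_set(x), [y])     # always keep the true class
        logits = W[act] @ x                      # sparse forward pass
        p = np.exp(logits - logits.max()); p /= p.sum()
        grad = p.copy()
        grad[np.searchsorted(act, y)] -= 1.0     # softmax cross-entropy gradient
        W[act] -= lr * np.outer(grad, x)         # update only the active rows

Only about k of the 1024 output neurons are touched per step, which is where the CPU-side savings come from.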

Speakers
avatar for Anshumali Shrivastava

Anshumali Shrivastava

Professor, Rice University; Founder, ThirdAI Corp
Anshumali Shrivastava's research focuses on Large-Scale Machine Learning, Scalable and Sustainable Deep Learning, and Randomized Algorithms for Big Data and Graph Mining.
NM

Nicholas Meisburger

Rice University
SD

Shabnam Daghaghi

Rice University
MY

Minghao Yan

Rice University


Wednesday October 27, 2021 12:30pm - 2:30pm CDT
Auditorium

3:00pm CDT

SeqScreen: Accurate and Sensitive Functional Screening of Pathogenic Sequences via Ensemble Learning - Auditorium
Modern benchtop DNA synthesis techniques and increased concern about emerging pathogens have elevated the importance of screening oligonucleotides for pathogens of concern. However, accurate and sensitive characterization of oligonucleotides is an open challenge for many of the current techniques and ontology-based tools. To address this gap, we have developed a novel software tool, SeqScreen, that can accurately and sensitively characterize short DNA sequences using a set of curated Functions of Sequences of Concern (FunSoCs), novel functional labels specific to microbial pathogenesis that describe the pathogenic potential of individual proteins. SeqScreen uses ensemble machine learning models encompassing multi-stage neural networks and support vector classifiers, which can label query sequences with FunSoCs via an imbalanced multi-class, multi-label classification task with high accuracy. In summary, SeqScreen represents a first step towards a novel paradigm of functionally informed pathogen characterization from genomic and metagenomic datasets. SeqScreen is open-source and freely available for download at: www.gitlab.com/treangenlab/seqscreen
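As an illustration of the imbalanced multi-label setup (a generic scikit-learn sketch, not SeqScreen's actual pipeline), the snippet below fits one balanced linear SVC per FunSoC-style label; the feature matrix and label matrix are random placeholders:

    import numpy as np
    from sklearn.multioutput import MultiOutputClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))        # per-sequence feature vectors (placeholder)
    Y = rng.integers(0, 2, size=(200, 4))     # binary matrix, one column per label

    clf = MultiOutputClassifier(
        make_pipeline(StandardScaler(),
                      LinearSVC(class_weight="balanced")))  # counters label imbalance
    clf.fit(X, Y)
    print(clf.predict(X[:5]))                 # one 0/1 prediction per label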

Authors: Advait Balaji, Bryce Kille, Anthony Kappell, Gene Godbold, Madeline Diep, R. A Leo Elworth, Zhiqin Qian, Dreycey Albin, Daniel Nasko, Nidhi Shah, Mihai Pop, Santiago Segarra, Krista Ternus, and Todd Treangen

Speakers
TT

Todd Treangen

Rice University


Wednesday October 27, 2021 3:00pm - 3:15pm CDT
Auditorium

3:15pm CDT

Parallel RRT Algorithm for Robotic Motion Planning
The advent of autonomous technology ranging from self-driving cars to robotic surgery has propelled motion planning algorithms to the forefront of research. The Rapidly-exploring Random Tree (RRT) algorithm is one such example, used by robots to find a suitable path between two points while avoiding obstacles. It does this by building a search tree rooted at the start point and then growing the tree by randomly generating and connecting nodes in the search space, verifying each connection to ensure no collision has taken place. The algorithm terminates when the goal region is reached and returns a valid path through the tree.
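A minimal sequential sketch of that loop is shown below (illustrative only, not the paper's code); the sampling bounds, step size, and the user-supplied collision checker is_free are placeholders:

    import math, random

    def rrt(start, goal, is_free, step=0.5, goal_radius=0.5, max_iters=5000):
        # grow a tree rooted at `start`; `is_free(p, q)` must report whether
        # the straight segment from p to q avoids all obstacles
        nodes, parent = [start], {0: None}
        for _ in range(max_iters):
            sample = (random.uniform(0, 10), random.uniform(0, 10))
            i = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], sample))
            p, d = nodes[i], math.dist(nodes[i], sample)
            q = sample if d <= step else (p[0] + step * (sample[0] - p[0]) / d,
                                          p[1] + step * (sample[1] - p[1]) / d)
            if not is_free(p, q):                  # verify the new connection
                continue
            nodes.append(q)
            parent[len(nodes) - 1] = i
            if math.dist(q, goal) <= goal_radius:  # reached the goal region
                path, k = [], len(nodes) - 1
                while k is not None:
                    path.append(nodes[k])
                    k = parent[k]
                return path[::-1]
        return None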


Traditionally, RRT runs sequentially on a single thread. Increasing the speed and efficiency of the algorithm would facilitate its use in highly complex, realistic scenarios. With the advent of powerful computing machines, it is an opportune time to enhance the performance of these algorithms. This paper presents a novel parallel RRT motion planning algorithm that performs computationally intensive steps in batches, simultaneously on multiple threads. This increases the number of nodes created and collision-checked per second, and hence finds paths faster.
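One way to picture the batching (a hedged sketch of the general pattern, not the paper's implementation; candidate_edges, is_free, and tree.add_edge are hypothetical) is to collision-check many candidate edges concurrently:

    from concurrent.futures import ThreadPoolExecutor

    def batched_extend(tree, candidate_edges, is_free, pool):
        # check a whole batch of candidate edges concurrently, instead of
        # one edge per iteration; only collision-free edges join the tree
        results = pool.map(lambda pq: is_free(*pq), candidate_edges)
        for (p, q), ok in zip(candidate_edges, results):
            if ok:
                tree.add_edge(p, q)

    pool = ThreadPoolExecutor(max_workers=3)     # mirrors the 3-thread results below

Note that in CPython the collision checker must release the GIL (e.g., run in native code) for threads to yield a real speedup; otherwise a process pool is the better fit.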


To test the novel algorithm, we recorded the time taken for a car in a two-dimensional space to navigate from a start point to a goal point while avoiding obstacles in unknown environments. The results showed that the algorithm successfully utilized the additional threads to calculate paths more quickly and efficiently. In terms of speed, the algorithm achieved a 2x speedup with 2 threads and a 2.35x speedup with 3 threads. In terms of efficiency, reflected by the number of connections added to the search tree per second, the algorithm showed a 2.25x increase with 2 threads and a 3x increase with 3 threads.


These preliminary results show promise for leveraging parallel implementations of motion planning algorithms. The use of novel parallel algorithms such as the one presented in this paper heralds a new era of motion planning capabilities and will invigorate current development efforts in robotics and automation.

Authors: Mantej Singh, Rahul Shome, and Lydia Kavraki

Speakers
MS

Mantej Singh

Rice University


Wednesday October 27, 2021 3:15pm - 3:30pm CDT
Auditorium

3:30pm CDT

MaGNET: Uniform Sampling from Deep Generative Network Manifolds without Retraining - Auditorium
Deep Generative Networks (DGNs) are extensively employed in Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and their variants to approximate the manifold structure and the distribution of a training dataset. However, the samples from the data manifold used to train a DGN are often obtained based on preferences, costs, or convenience, such that they favor certain modes (cf. the large fraction of smiling faces in the CelebA dataset or the large fraction of dark-haired individuals in FFHQ). These inconsistencies will be reproduced in any data sampled from the trained DGN, which has far-reaching potential implications for fairness, data augmentation, anomaly detection, domain adaptation, and beyond. In response, we develop a differential-geometry-based technique that, given a trained DGN, adapts its generative process so that the distribution on the data-generating manifold is uniform. We prove theoretically and validate experimentally that our technique can be used to produce a uniform distribution on the manifold regardless of the training set distribution.
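A minimal PyTorch sketch of the underlying idea, i.e. importance-resampling latent draws by the generator's local volume element so that the induced distribution on the manifold flattens out (an illustration of the principle, not the paper's actual algorithm; G is any differentiable generator mapping a 1-D latent vector to a 1-D output, so flatten image outputs first):

    import torch

    def log_weight(G, z):
        # log sqrt(det(J^T J)) minus the Gaussian latent log-density (up to a
        # constant): the importance weight that flattens the induced distribution
        J = torch.autograd.functional.jacobian(G, z)   # (out_dim, latent_dim)
        return 0.5 * torch.logdet(J.T @ J) + 0.5 * (z * z).sum()

    def magnet_style_sample(G, n, latent_dim, oversample=4):
        # oversample latents, then resample proportionally to the weights
        zs = torch.randn(oversample * n, latent_dim)
        logw = torch.stack([log_weight(G, z) for z in zs])
        idx = torch.multinomial(torch.softmax(logw, 0), n, replacement=True)
        return torch.stack([G(zs[i]) for i in idx])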

Authors: Ahmed Imtiaz Humayun, Randall Balestriero, and Richard Baraniuk

Speakers
AI

Ahmed Imtiaz Humayun

Rice University


Wednesday October 27, 2021 3:30pm - 3:45pm CDT
Auditorium

3:45pm CDT

Magnified Convolutional Enrichment Representation Model - Auditorium
Feature representation mathematically characterizes domain entities, which is crucial in machine learning. We designed a dynamic deep model that represents human diseases by evaluating the over-representation of diseases and genes as a controlled vocabulary, leveraging contextual information from word embeddings together with global enrichment information. The model has been evaluated and demonstrated good fitness for predicting associations of complex diseases.

Authors: Guocai Chen, Herbert Chen, Yuntao Yang, Abhisek Mukherjee, Shervin Assassi, Claudio Soto, and Wenjin Zheng

Speakers

Wednesday October 27, 2021 3:45pm - 4:00pm CDT
Auditorium

4:00pm CDT

PipeGCN: Efficient Full-Graph Training of Graph Convolutional Networks with Pipelined Feature Communication - Auditorium
Graph Convolutional Networks (GCNs) are the state-of-the-art method for learning from graph-structured data. Training large-scale GCNs requires distributed training across multiple accelerators, such that each accelerator holds a partitioned subgraph. However, distributed GCN training incurs the prohibitive overhead of communicating node features and gradients among partitions for every GCN layer in each training iteration, limiting the achievable training efficiency and model scalability. To this end, we propose PipeGCN, a simple yet effective scheme that hides the communication overhead by pipelining inter-partition communication with intra-partition computation. Pipelining for efficient GCN training is non-trivial, as the communicated node features/gradients become stale and can thus harm convergence, negating the pipeline benefit. Notably, little is known about the convergence rate of GCN training with stale features. This work not only provides a theoretical convergence guarantee but also finds the convergence rate of PipeGCN to be close to that of vanilla distributed GCN training without pipelining. Furthermore, we develop a smoothing method to further improve PipeGCN's convergence. Extensive experiments show that PipeGCN can largely boost training throughput (up to 2.2×) while achieving the same accuracy as its vanilla counterpart, and that PipeGCN also outperforms existing full-graph training methods.
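The overlap pattern can be sketched as follows (a conceptual sketch only, not PipeGCN's code; it assumes an initialized torch.distributed process group, pre-allocated send/receive buffers per neighbor partition, and a hypothetical layer callable that consumes the stale boundary features):

    import torch.distributed as dist

    def pipelined_layer(local_feats, stale_boundary_feats, send_bufs, recv_bufs, layer):
        # launch the boundary-feature exchange for the next iteration
        # (non-blocking), then compute this layer with the stale features
        # received during the previous iteration
        reqs  = [dist.isend(buf, dst) for dst, buf in send_bufs.items()]
        reqs += [dist.irecv(buf, src) for src, buf in recv_bufs.items()]
        out = layer(local_feats, stale_boundary_feats)  # communication/computation overlap
        for r in reqs:
            r.wait()                                    # finish before the next iteration
        return out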

Authors: Cheng Wan, Youjie Li, Cameron Wolfe, Anastasios Kyrillidis, Nam Kim, and Yingyan Lin

Speakers
CW

Cheng Wan

Rice University


Wednesday October 27, 2021 4:00pm - 4:15pm CDT
Auditorium

4:15pm CDT

Quantification of Myxococcus Xanthus Aggregation and Rippling Behaviors: Deep-Learning Transformation of Phase-Contrast into Fluorescence Microscopy Images - Auditorium
Myxococcus xanthus bacteria are a model system for understanding pattern formation and collective cell behaviors. When starving, cells aggregate into fruiting bodies to form metabolically inert spores. During predation, cells self-organize into traveling cell-density waves termed ripples. Both phase-contrast and fluorescence microscopy are used to observe these patterns but each has its limitations. Phase-contrast images have higher contrast, but the resulting image intensities lose their correlation with cell density. The intensities of fluorescence microscopy images, on the other hand, are well-correlated with cell density, enabling better segmentation of aggregates and better visualization of streaming patterns in between aggregates; however, fluorescence microscopy requires the engineering of cells to express fluorescent proteins and can be phototoxic to cells. To combine the advantages of both imaging methodologies, we develop a generative adversarial network that converts phase-contrast into synthesized fluorescent images. By including an additional histogram-equalized output to the state-of-the-art pix2pixHD algorithm, our model generates accurate images of aggregates and streams, enabling the estimation of aggregate positions and sizes, but with small shifts of their boundaries. Further training on ripple patterns enables accurate estimation of the rippling wavelength. Our methods are thus applicable for many other phenotypic behaviors and pattern formation studies.
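For instance, preparing the extra histogram-equalized target channel described above could look like the following (a data-preparation sketch under our own assumptions, not the authors' pipeline):

    import numpy as np
    from skimage import exposure

    def make_training_pair(phase_img, fluor_img):
        # input: phase-contrast frame; targets: the fluorescence frame plus a
        # histogram-equalized copy, mirroring the additional output head
        fluor_eq = exposure.equalize_hist(fluor_img)        # rescaled to [0, 1]
        fluor = fluor_img.astype(np.float32) / fluor_img.max()
        target = np.stack([fluor, fluor_eq.astype(np.float32)], axis=0)
        return phase_img.astype(np.float32), target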

Authors: Jiangguo Zhang, Jessica Comstock, Christopher Cotter, Patrick Murphy, Weili Nie, Roy Welch, Ankit Patel, and Oleg Igoshin

Speakers
JZ

Jiangguo Zhang

Rice University


Wednesday October 27, 2021 4:15pm - 4:30pm CDT
Auditorium
 