
Wednesday, October 20
 

8:00am CDT

Opening: Pre-Recorded Poster Presentations
Hyperspectral Image Decomposition and Material Identification Through Autoencoders
Authors: Mira Welner (University of California Davis) and Aswin Sankaranarayanan (Carnegie Mellon University)

Using Next Generation Sequencing and Bioinformatics to Explore Global Host Responses of Human Lung Epithelial Cells to SARS-CoV-2 Infection
Authors: Vivian Tat (The University of Texas Medical Branch), Aleksandra Drelich (The University of Texas Medical Branch), Kempaiah Kempaiah (Southern Research Institute), Jason Hsu (The University of Texas Medical Branch) and Chien-Te Tseng (The University of Texas Medical Branch)

SeqScreen: Accurate and Sensitive Functional Screening of Pathogenic Sequences via Ensemble Learning
Authors: Advait Balaji (Department of Computer Science, Rice University), Bryce Kille (Department of Computer Science, Rice University), Anthony Kappell (Signature Science LLC), Gene Godbold (Signature Science LLC), Madeline Diep (Fraunhofer USA Center Mid-Atlantic CMA), R. A Leo Elworth (Department of Computer Science, Rice University), Zhiqin Qian (Department of Computer Science, Rice University), Dreycey Albin (Department of Computer Science, Rice University), Daniel Nasko (Department of Computer Science, University of Maryland, College Park), Nidhi Shah (Department of Computer Science, University of Maryland, College Park), Mihai Pop (Department of Computer Science, University of Maryland, College Park), Santiago Segarra (Department of Electrical and Computer Engineering, Rice University), Krista Ternus (Signature Science LLC) and Todd Treangen (Department of Computer Science, Rice University)

DLPacker: Deep Learning for Prediction of Amino Acid Side Chain Conformations in Proteins
Authors: Mikita Misiura (Rice University), Raghav Shroff (CCDC Army Research Lab—South), Ross Thyer (Rice University) and Anatoly Kolomeisky (Rice University)

Log2NS: Enhancing Deep Learning Based Analysis of Logs with Formal to Prevent Survivorship Bias
Authors: Charanraj Thimmisetty (Palo Alto Networks), Praveen Tiwari (Palo Alto Networks), Didac Gil de la Iglesia (Palo Alto Networks), Nandini Ramanan (Palo Alto Networks), Marjorie Sayer (Palo Alto Networks), Viswesh Ananthakrishnan (Palo Alto Networks) and Claudionor Coelho Jr (Palo Alto Networks)

A Binary Classification Model to Diagnose Diabetes Mellitus in an Intensive Care Unit
Author: Bolanle Adesanya (University of Texas Health Science Center at Houston)

Optimizing a Nanoparticle-Enhanced Membrane Distillation Setup Excited with LEDs
Authors: Bryant Jerome (Rice University, Department of Electrical and Computer Engineering), Pratiksha Dongare (Rice University, Department of Electrical and Computer Engineering), Oara Neumann (Rice University, Department of Electrical and Computer Engineering), Naomi Halas (Rice University, Department of Electrical and Computer Engineering) and Alessandro Alabastri (Rice University, Department of Electrical and Computer Engineering)

Severe Slugging Control Using Deep Reinforcement Learning
Authors: Jayanth Nair (Wood), Shirley Ike (Wood) and Yue Hu (Wood)

Training and Optimizing an Ensemble of Deep Learning Models to Recognize Structural Rearrangements in Hi-C Data
Authors: Mahdi Sadr (University of Texas at Austin), Muhammad Shamim (BCM/Rice), Sharon Sun (Rice/BCM), Alyssa Blackburn (BCM), Olga Dudchenko (BCM/Rice), Neva Durand (BCM/Rice/Harvard and MIT Broad Institute) and Erez Aiden (BCM/Rice/Harvard and MIT Broad Institute)

Constructing Descriptors in Catalysis with Iterative Bayesian Additive Regression Trees (iBART)
Authors: Chun-Yen Liu (Rice University), Shengbin Ye (Rice University), Meng Li (Rice University) and Thomas Senftle (Rice University)

Machine learning regression model for prediction and analysis of potential SARS-CoV-2 inhibitors
Authors: Lilian Biasi (University of Campinas), Opalina Vetrichelvan (Massachusetts Institute of Technology) and Luís Franco (University of Campinas)

Modeling Subsurface Porous Media Flow Using Nvidia SimNet
Authors: Spatika Iyengar (AquaNRG Consulting Inc.), Shaina Kelly (AquaNRG) and Babak Shafei (AquaNRG)

False Discovery Rate and Exceedance Control in High-Dimensional Linear Models
Authors: Huiming Lin (Rice University) and Meng Li (Rice University)

Domain Independent Linguistic Markers for Deception
Authors: Xuting Liu (University of California, Berkeley) and Rakesh Verma (University of Houston)

Neo4j Graph Databases Visualized in Virtual Reality
Author: James Mireles (Aviar Technology, LLC)

Scalable Average Consensus with Compressed Communications
Authors: Mohammad Taha Toghani (Rice University) and César A. Uribe (Rice University)


Wednesday October 20, 2021 8:00am - 5:00pm CDT

8:00am CDT

Opening: Pre-Recorded Sponsor Presentations
Sponsored Presentation by Bright Data
  • Title: Making Public Web Data Accessible to All, Everywhere
Sponsored Presentation by DDN
Sponsored Presentation by PROS
  • Title: What Can Artificial Intelligence Do For Your Business?
Sponsored Presentation by SambaNova Systems
  • Title: Unlocking What's Possible at the Cutting Edge of AI
Sponsored Presentation by Two Sigma
  • Title: A 24x Speedup for Reinforcement Learning with RLlib + Ray

Wednesday October 20, 2021 8:00am - 5:00pm CDT
 
Monday, October 25
 

8:00am CDT

Welcome
Welcome to the 5th annual Ken Kennedy AI and Data Science Conference. I am Lydia Kavraki, the director of the Ken Kennedy Institute.

This year the conference presentations and sponsored booths have gone virtual due to COVID-19; we will, however, have an outdoor networking reception on Tuesday evening. The virtual format of the conference has given us the opportunity to invite broad participation: our speakers and attendees come from all over the country and around the world. Thank you for joining us for an exciting few days of talks, posters, and discussions on AI and Data Science.

AI is touching our lives in myriad ways. It is changing the way we work, but also the way we learn, monitor our health, shop, and relax. It is accelerating biological discoveries. It is helping us understand our planet and the universe. Through automation, it is transforming several industries. The use of AI is also raising concerns about privacy and equity that we must confront sooner rather than later.

There is no doubt that AI and Data Science research are key to solving impactful problems in the world. Meetings such as this foster innovation by encouraging new ideas, research, and conversation. Data scientists will be future leaders and major contributors, from developing new medicines and treatments, to using energy sources effectively, to creating safe and smart cities, to helping us understand and mitigate our changing climate. We look forward to the conversations that will occur over the next few days and the partnerships those conversations will spark.

Monday's keynote speakers will discuss the future of NASA missions, how to teach AI to speak multiple languages, scalable deep learning, AI for good, and AI ethics. On Tuesday, we have an exciting lineup of technical talks selected from submitted abstracts, covering AI for good, algorithms and foundations, business impact, and healthcare. During the lunch break on Tuesday, there will be a workshop on ML for Energy Transition.

The conference offers opportunities to meet fellow attendees, speakers, and sponsors through chat messages and by scheduling meetings directly within the virtual conference platform. In hopes of improving connections in these virtual times, the platform prioritizes networking matches based on each attendee's interests and goals. There are also opportunities to engage in the breakout rooms during networking breaks.

We encourage you to view and connect with the presenters of the pre-recorded poster videos as well as the virtual booths of our conference sponsors: Bright Data – The Bright Initiative, DDN, Microsoft, PROS, SambaNova Systems, Two Sigma, and VAST Data.

On Tuesday afternoon, from 4:30 to 6:30 PM, there will be an outdoor networking reception at Holman Draft Hall in Houston. The venue is covered with open sides for continuous airflow, and the reception is included in the conference registration. Join us for a great opportunity to network with industry professionals and our sponsors. A special thank you to DDN for being the Bar Sponsor.

Wednesday will be an in-person add-on day at the Rice University campus in Houston with a technical workshop titled Scalable and Sustainable AI: Overcoming the Wide Gap Between AI in Theory and AI in Production. If you are interested and have not registered yet, visit the conference website for more details.

Finally, there will be a virtual post-conference workshop, available to all attendees, on November 30 at 10:00 AM: Data4Good & Responsible AI – Automated Data Collection: How Do You Ensure Data4Good?, presented by Bright Data – The Bright Initiative.

The Ken Kennedy Institute at Rice University is committed to supporting cutting-edge research, educating innovators, and connecting across industries by bringing together thought leaders from around the world with expertise in AI, data, and computing. We are thrilled to put this capability at the service of our regional and global AI and data science community.

We are grateful to all those who have contributed talks to this conference. We are grateful to our sponsors, partners, and attendees, who share our enthusiasm and who seek the opportunity to support and engage with the community. We want to recognize our sponsors, whose support has allowed us to offer complimentary registration to all attendees. Finally, we would like to say a special thank you to our Conference Committee for their many contributions to this year’s conference.

On behalf of the Conference Committee, our fellow constituents and sponsors, Rice University, and the amazing Ken Kennedy Institute team led by our executive director Dr. Angela Wilkins, I want to thank you for being here.

We hope you will thoroughly enjoy this year’s virtual conference, and we look forward to seeing many of you at the outdoor networking reception on Tuesday evening!

Lydia Kavraki
Director, Ken Kennedy Institute

Speakers

Lydia Kavraki

Director, The Ken Kennedy Institute
Lydia E. Kavraki is the Director of the Ken Kennedy Institute. She is the Noah Harding Professor of Computer Science, professor of Bioengineering, professor of Electrical and Computer Engineering, and professor of Mechanical Engineering at Rice University. Kavraki received her B.A. in Computer Science from... Read More →


Monday October 25, 2021 8:00am - 8:05am CDT

8:00am CDT

Opening: Pre-Recorded Poster Presentations
Hyperspectral Image Decomposition and Material Identification Through Autoencoders
Authors: Mira Welner (University of California Davis) and Aswin Sankaranarayanan (Carnegie Mellon University)

Using Next Generation Sequencing and Bioinformatics to Explore Global Host Responses of Human Lung Epithelial Cells to SARS-CoV-2 Infection
Authors: Vivian Tat (The University of Texas Medical Branch), Aleksandra Drelich (The University of Texas Medical Branch), Kempaiah Kempaiah (Southern Research Institute), Jason Hsu (The University of Texas Medical Branch) and Chien-Te Tseng (The University of Texas Medical Branch)

SeqScreen: Accurate and Sensitive Functional Screening of Pathogenic Sequences via Ensemble Learning
Authors: Advait Balaji (Department of Computer Science, Rice University), Bryce Kille (Department of Computer Science, Rice University), Anthony Kappell (Signature Science LLC), Gene Godbold (Signature Science LLC), Madeline Diep (Fraunhofer USA Center Mid-Atlantic CMA), R. A Leo Elworth (Department of Computer Science, Rice University), Zhiqin Qian (Department of Computer Science, Rice University), Dreycey Albin (Department of Computer Science, Rice University), Daniel Nasko (Department of Computer Science, University of Maryland, College Park), Nidhi Shah (Department of Computer Science, University of Maryland, College Park), Mihai Pop (Department of Computer Science, University of Maryland, College Park), Santiago Segarra (Department of Electrical and Computer Engineering, Rice University), Krista Ternus (Signature Science LLC) and Todd Treangen (Department of Computer Science, Rice University)

DLPacker: Deep Learning for Prediction of Amino Acid Side Chain Conformations in Proteins
Authors: Mikita Misiura (Rice University), Raghav Shroff (CCDC Army Research Lab—South), Ross Thyer (Rice University) and Anatoly Kolomeisky (Rice University)

Log2NS: Enhancing Deep Learning Based Analysis of Logs with Formal to Prevent Survivorship Bias
Authors: Charanraj Thimmisetty (Palo Alto Networks), Praveen Tiwari (Palo Alto Networks), Didac Gil de la Iglesia (Palo Alto Networks), Nandini Ramanan (Palo Alto Networks), Marjorie Sayer (Palo Alto Networks), Viswesh Ananthakrishnan (Palo Alto Networks) and Claudionor Coelho Jr (Palo Alto Networks)

A Binary Classification Model to Diagnose Diabetes Mellitus in an Intensive Care Unit
Author: Bolanle Adesanya (University of Texas Health Science Center at Houston)

Optimizing a Nanoparticle-Enhanced Membrane Distillation Setup Excited with LEDs
Authors: Bryant Jerome (Rice University, Department of Electrical and Computer Engineering), Pratiksha Dongare (Rice University, Department of Electrical and Computer Engineering), Oara Neumann (Rice University, Department of Electrical and Computer Engineering), Naomi Halas (Rice University, Department of Electrical and Computer Engineering) and Alessandro Alabastri (Rice University, Department of Electrical and Computer Engineering)

Severe Slugging Control Using Deep Reinforcement Learning
Authors: Jayanth Nair (Wood), Shirley Ike (Wood) and Yue Hu (Wood)

Training and Optimizing an Ensemble of Deep Learning Models to Recognize Structural Rearrangements in Hi-C Data
Authors: Mahdi Sadr (University of Texas at Austin), Muhammad Shamim (BCM/Rice), Sharon Sun (Rice/BCM), Alyssa Blackburn (BCM), Olga Dudchenko (BCM/Rice), Neva Durand (BCM/Rice/Harvard and MIT Broad Institute) and Erez Aiden (BCM/Rice/Harvard and MIT Broad Institute)

Constructing Descriptors in Catalysis with Iterative Bayesian Additive Regression Trees (iBART)
Authors: Chun-Yen Liu (Rice University), Shengbin Ye (Rice University), Meng Li (Rice University) and Thomas Senftle (Rice University)

Machine learning regression model for prediction and analysis of potential SARS-CoV-2 inhibitors
Authors: Lilian Biasi (University of Campinas), Opalina Vetrichelvan (Massachusetts Institute of Technology) and Luís Franco (University of Campinas)

Modeling Subsurface Porous Media Flow Using Nvidia SimNet
Authors: Spatika Iyengar (AquaNRG Consulting Inc.), Shaina Kelly (AquaNRG) and Babak Shafei (AquaNRG)

False Discovery Rate and Exceedance Control in High-Dimensional Linear Models
Authors: Huiming Lin (Rice University) and Meng Li (Rice University)

Domain Independent Linguistic Markers for Deception
Authors: Xuting Liu (University of California, Berkeley) and Rakesh Verma (University of Houston)

Neo4j Graph Databases Visualized in Virtual Reality
Author: James Mireles (Aviar Technology, LLC)

Scalable Average Consensus with Compressed Communications
Authors: Mohammad Taha Toghani (Rice University) and César A. Uribe (Rice University)


Monday October 25, 2021 8:00am - 5:00pm CDT

8:00am CDT

Opening: Pre-Recorded Sponsor Presentations
Sponsored Presentation by Bright Data
  • Title: Making Public Web Data Accessible to All, Everywhere
Sponsored Presentation by DDN
Sponsored Presentation by PROS
  • Title: What Can Artificial Intelligence Do For Your Business?
Sponsored Presentation by SambaNova Systems
  • Title: Unlocking What's Possible at the Cutting Edge of AI
Sponsored Presentation by Two Sigma
  • Title: A 24x Speedup for Reinforcement Learning with RLlib + Ray

Monday October 25, 2021 8:00am - 5:00pm CDT

10:00am CDT

Opening: Intro to Morning

Speakers

Angela Wilkins

Executive Director, The Ken Kennedy Institute
Angela Wilkins is the Executive Director of the Ken Kennedy Institute. Angela is responsible for the development and implementation of Ken Kennedy Institute’s programs in the computational sciences. After earning a Ph.D. in theoretical physics from Lehigh University, she shifted... Read More →


Monday October 25, 2021 10:00am - 10:05am CDT

10:05am CDT

Integrated Data - Driving the Future of NASA Missions and Technology
Sponsored by DDN

An overview of NASA and its current plans to return to the Moon and beyond, and the driving technologies required to achieve the next steps in exploration. The presentation will describe the mission and technology needs and how those align with the need for a clear data architecture and related data science initiatives.

Speakers

Ronnie Clayton

Acting Chief Technologist, NASA Johnson Space Center
Ronnie Clayton is currently the Deputy Chief Technologist at NASA's Johnson Space Center where he advises Center leadership on matters concerning research and technology development. He has an extensive background in systems engineering and integration.


Monday October 25, 2021 10:05am - 10:45am CDT

10:45am CDT

How to Teach AI Multiple Languages?
Sponsored by PROS


AI applications are proliferating in consumer and business domains around the world. Have you ever wondered how Siri, Google Home, Google Maps, or Amazon Echo speak to users in different countries in their local languages? How does an automated customer support chatbot that you are speaking or texting with understand your local language to resolve your problems? The AI models that power these applications have to speak the language of the user and the language of the business to be useful and relevant. Polyglot AI is not magic, just as AI itself is not magic! It takes a lot of hard work to teach AI to understand and speak new languages. In this talk, I’ll take you through some of the behind-the-scenes work of building multilingual natural language processing systems that enable AI to speak multiple languages.

Speakers

Rama Akkiraju

IBM Fellow, IBM
Rama Akkiraju is an IBM Fellow, Master Inventor, and IBM Academy Member at IBM’s Automation Division where she is the CTO of AI Operations. AI Operations is about optimizing information technology (IT) operations management using Artificial Intelligence (AI). Prior to this role... Read More →


Monday October 25, 2021 10:45am - 11:25am CDT

11:25am CDT

Break
Monday October 25, 2021 11:25am - 11:40am CDT

11:40am CDT

Scalable Deep Learning for Large Scale Scientific Machine Learning: Challenges and Opportunities
Sponsored by SambaNova Systems

Scientific Machine Learning at HPC scale presents many challenges related to massive data sets, applications with large sample sizes, and large neural network models. Examples of these challenges are showcased in application areas ranging from small molecule drug design, to cosmology, to sequence-based transformer language models. In this talk we discuss the opportunities that these challenges create and present our ongoing work in developing and composing multiple methods of parallel training within the LBANN scalable deep learning toolkit for these SciML applications. Furthermore, we showcase how these methods have enabled the use of leadership-class HPC systems, such as Sierra, to accelerate the training of neural network architectures.

Speakers

Brian Van Essen

Informatics Group leader and Computer Scientist, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory (LLNL)
Brian is the Informatics Group leader and a Computer Scientist in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory (LLNL). He is actively pursuing research in large-scale deep learning for scientific domains and training deep neural networks using... Read More →


Monday October 25, 2021 11:40am - 12:20pm CDT

12:20pm CDT

Lunch + Virtual Booths
Monday October 25, 2021 12:20pm - 1:20pm CDT

1:20pm CDT

Welcome: Intro to Afternoon
Speakers

Angela Wilkins

Executive Director, The Ken Kennedy Institute
Angela Wilkins is the Executive Director of the Ken Kennedy Institute. Angela is responsible for the development and implementation of Ken Kennedy Institute’s programs in the computational sciences. After earning a Ph.D. in theoretical physics from Lehigh University, she shifted... Read More →


Monday October 25, 2021 1:20pm - 1:25pm CDT

1:25pm CDT

How Do We Know if Data Science is "For Good"?
Sponsored by Two Sigma

We interact with the outputs from quantitative models multiple times a day. As methods from statistics, machine learning, and artificial intelligence become more ubiquitous, so too do calls to ensure that these methods are used “for good” or at the very least, ethically. But how do we know if we are achieving “good”?

This question will frame a presentation of case studies from the Human Rights Data Analysis Group (HRDAG), a Bay Area nonprofit that uses data science to analyze patterns of violence. Examples will include collaborations with US-based organizations investigating police misconduct and partnerships with international truth commissions and war crimes prosecutors. HRDAG projects will be used to illustrate challenges of real-world data, including incomplete and unrepresentative samples, and adversarial political and/or legal climates.

The potential harm that can be done when inappropriately analyzing and interpreting incomplete and imperfect data will be especially highlighted, including questions such as: How can we develop approaches to help us identify the cases where analytical tools can do the most good, and avoid or mitigate the most harm? We propose starting with two simple questions: What is the cost of being wrong? And who bears that cost?

Speakers

Megan Price

Executive Director, Human Rights Data Analysis Group
As the Executive Director of the Human Rights Data Analysis Group, Megan Price drives the organization’s overarching strategy, leads scientific projects, and presents HRDAG’s work to diverse audiences. Her scientific work includes analyzing documents from the National Police Archive... Read More →


Monday October 25, 2021 1:25pm - 2:05pm CDT

2:20pm CDT

What Do We Mean by “AI Ethics”?: Conceptual Tools and Examples for a Common Future
Sponsored by: Bright Data - The Bright Initiative

As concerns about the inequitable social impacts of AI have widened over the last few years, AI ethics researchers have increasingly steered their focus away from the normative dimension of AI design – whether an engineering decision is “right” or “wrong” – toward issues more traditionally grounded in the realm of society and politics such as historical social inequalities and systematic discrimination. Throughout this transition several different concepts and initiatives have emerged that aim to capture the spirit of this conceptual change, yet unwittingly have also helped create a degree of discursive ambiguity and division. Is the purpose of “ethical AI” ultimately to help promote “social good”? To serve the “public interest”? To create more “justice”? As this presentation will explain, the terms and examples that we use to describe the purpose of “ethical AI” are critical because each one carries a particular set of conceptual affordances and limitations that can hinder opportunities for cooperation between social actors. For this reason, it is critical that we develop a set of common examples and conceptual tools to help academic researchers, technology developers, and social activists more effectively work together toward a common future.

Speakers

Rodrigo Ferreira

Assistant Teaching Professor, Computer Science, Rice University
Rodrigo Ferreira is an Assistant Teaching Professor in Computer Science at Rice University. In this role, Rodrigo teaches numerous courses in technology and ethics and is responsible for developing ethics and social justice curricula across the computer science department. In addition... Read More →


Monday October 25, 2021 2:20pm - 3:00pm CDT

3:00pm CDT

Break
Monday October 25, 2021 3:00pm - 3:15pm CDT

3:15pm CDT

Discovery and Action Through Geospatial Visualization and Urban Analytics
Sponsored by Microsoft

This talk will cover projects that range from urban history to urban analytics for the purpose of discovery and action. We will explore a range of projects including the collaborative effort with Houston Health Department and TMC partners to understand and forecast SARS CoV-2 prevalence through wastewater.

Speakers

Katherine Ensor

Noah G. Harding Professor of Statistics, Director of CoFES, Rice University
Katherine Bennett Ensor, PhD, is the Noah G. Harding Professor of Statistics in the George R. Brown School of Engineering at Rice University where she serves as director of the Center for Computational Finance and Economic Systems (CoFES). She also oversees the Kinder Institute Urban... Read More →

Farès El-Dahdah

Professor of Humanities, Director of the Humanities Research Center, Rice University
Farès el-Dahdah has written extensively on Brazil's modern architecture and has been involved in a number of collaborative projects with Casa de Lucio Costa and Fundação Oscar Niemeyer, two Brazilian cultural foundations on the boards of which he serves. He is currently leading... Read More →


Monday October 25, 2021 3:15pm - 3:55pm CDT

3:55pm CDT

Closing Remarks
Speakers

Angela Wilkins

Executive Director, The Ken Kennedy Institute
Angela Wilkins is the Executive Director of the Ken Kennedy Institute. Angela is responsible for the development and implementation of Ken Kennedy Institute’s programs in the computational sciences. After earning a Ph.D. in theoretical physics from Lehigh University, she shifted... Read More →


Monday October 25, 2021 3:55pm - 4:00pm CDT
 
Tuesday, October 26
 

10:00am CDT

Emu: Species-Level Microbial Community Profiling for Full-Length Nanopore 16S Reads
Technical Presentations Group 1: Algorithms, Foundations, Visualizations, and Engineering Applications

16S rRNA based analysis is the established standard for elucidating microbial community composition. However, with short-read data delivering only a portion of the 16S gene, this analysis is limited to genus-level results at best. Obtaining species-level accuracy is imperative, since two bacterial species within the same genus can have drastically different effects on their community and on human health. Full-length 16S sequences have the potential to provide species-level resolution. Yet, taxonomic identification algorithms designed for previous generation sequencers are not optimized for the increased read length and error rate of Oxford Nanopore Technologies (ONT). Here, we present Emu, a novel approach that employs an Expectation-Maximization (EM) algorithm to generate a taxonomic abundance profile from full-length 16S rRNA reads. We demonstrate accurate sample composition estimates by our new software through analysis on two mock communities and one simulated data set. We also show Emu to elicit fewer false positives and false negatives than previous methods on both short and long read data. Finally, we illustrate a real-world application of Emu by processing vaginal microbiome samples from women with and without vaginosis, where we observe distinct species-level differences in the microbial composition between the two groups that are fully concordant with prior research in this important area. In summary, full-length 16S ONT sequences, paired with Emu, open a new realm of microbiome research possibilities. Emu proves that, with the appropriate method, increased accuracy can be obtained with nanopore long reads despite the increased error rate. Our novel software tool Emu allows researchers to further leverage portable, real-time sequencing provided by ONT for accurate, efficient, and low-cost characterization of microbial communities.
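
To make the core idea concrete, the following is a minimal sketch (not Emu's actual implementation) of the expectation-maximization abundance update that such tools build on; the read-to-species likelihood matrix here is a toy assumption, whereas Emu derives its probabilities from alignments of noisy ONT reads.

```python
import numpy as np

def em_abundance(likelihood, n_iter=100, tol=1e-8):
    """likelihood[i, j] = P(read i | species j); returns relative abundances."""
    n_reads, n_species = likelihood.shape
    abundance = np.full(n_species, 1.0 / n_species)
    for _ in range(n_iter):
        # E-step: probability that each read originates from each species
        weighted = likelihood * abundance
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: new abundances are the average responsibilities
        new_abundance = resp.mean(axis=0)
        if np.abs(new_abundance - abundance).max() < tol:
            return new_abundance
        abundance = new_abundance
    return abundance

# Toy example: 4 reads and 2 candidate species
likelihood = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]])
print(em_abundance(likelihood))  # estimated relative abundance of each species
```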

Author: Kristen Curry (Rice University)


Speakers

Kristen Curry

Rice University


Tuesday October 26, 2021 10:00am - 10:15am CDT

10:15am CDT

Co-Manifold Learning
Technical Presentations Group 1: Algorithms, Foundations, Visualizations, and Engineering Applications

Representation learning is typically applied to only one mode of a data matrix, either its rows or columns. Yet in many applications, there is an underlying geometry to both the rows and the columns. We propose utilizing this coupled structure to perform co-manifold learning: uncovering the underlying geometry of both the rows and the columns of a given matrix. Our framework is based on computing a multiresolution view of the data at different combinations of row and column smoothness by solving a collection of continuous optimization problems. We demonstrate our method’s ability to recover the underlying row and column geometry in simulated examples and real cheminformatics data.

Authors: Eric Chi (Rice University), Gal Mishne (University of California, San Diego), and Ronald Coifman (Yale University)

Speakers

Eric Chi

Rice University


Tuesday October 26, 2021 10:15am - 10:30am CDT

10:30am CDT

Random-Walk Based Graph Representation Learning Revisited
Technical Presentations Group 1: Algorithms, Foundations, Visualizations, and Engineering Applications

Representation learning is a powerful framework for enabling the application of machine learning to complex data via vector representations. Here, we focus on representation learning for vertices of a graph using random walks. We introduce a framework for node embedding based on three dimensions: type of process, similarity metric, and embedding algorithm. Our framework not only covers many existing approaches but also motivates new ones. In particular, we apply it to produce new state-of-the-art results on link prediction.
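
As a point of reference, the sketch below shows a basic DeepWalk-style instance of this framework (uniform random walks as the process, co-occurrence as the similarity, skip-gram as the embedding algorithm); it is illustrative only, and the use of networkx and the gensim v4+ API is an assumption.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(G, walks_per_node=10, walk_length=40):
    """Generate uniform random walks, returned as 'sentences' of node ids."""
    walks = []
    for _ in range(walks_per_node):
        for node in G.nodes():
            walk = [node]
            while len(walk) < walk_length:
                neighbors = list(G.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(random.choice(neighbors))
            walks.append([str(n) for n in walk])
    return walks

G = nx.karate_club_graph()
model = Word2Vec(random_walks(G), vector_size=64, window=5, min_count=0, sg=1)
print(model.wv[str(0)][:5])  # first few dimensions of node 0's embedding
```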

Authors: Zexi Huang (UCSB), Arlei Silva (Rice University), and Ambuj Singh (UCSB)

Speakers

Arlei Silva

Rice University


Tuesday October 26, 2021 10:30am - 10:45am CDT

10:45am CDT

Break
Tuesday October 26, 2021 10:45am - 11:00am CDT

11:00am CDT

Multi-Task Deep Learning Framework for Sales Forecasting and Product Recommendation
Technical Presentations Group 2: AI for Good + Business Impact/Industry

Sales forecasting and product recommendation are important tasks for Business-to-Business (B2B) companies, particularly as more business transactions are occurring through digital channels (eCommerce). Transaction data contains both explicit signals (price, revenue, ratings) and implicit signals (product purchases, user clicks). Sales prediction, based on explicit signals, and product recommendation, based on implicit signals, are commonly achieved with separate machine learning models. We propose a new multi-task learning model framework, which performs a joint optimization to do prediction and recommendation tasks simultaneously. This multi-task deep learning model captures and predicts seasonality in the data and has an effective sampling mechanism to improve implicit feedback for the recommendation task. Our experiments on real B2B transaction datasets have shown that the multi-task model can achieve comparable performance for both tasks compared to single-task models (around 40% lower mean absolute percentage error and 30% improvement in Diversity@K, which is the percentage of overall items that are captured in the top K recommendations). In addition, the multi-task model enables better solutions to problems such as cold start and collaborative filtering.
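
A minimal PyTorch sketch of the general pattern described, a shared encoder with one regression head for sales and one binary head for purchase propensity, jointly optimized; layer sizes, the loss weighting, and the synthetic data are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskModel(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU())
        self.sales_head = nn.Linear(hidden, 1)  # explicit signal: sales forecast
        self.rec_head = nn.Linear(hidden, 1)    # implicit signal: purchase propensity

    def forward(self, x):
        h = self.encoder(x)
        return self.sales_head(h).squeeze(-1), self.rec_head(h).squeeze(-1)

model = MultiTaskModel(n_features=20)
x = torch.randn(32, 20)                          # transaction features
sales_true = torch.randn(32)                     # observed sales
bought = torch.randint(0, 2, (32,)).float()      # observed purchases
sales_pred, rec_logit = model(x)
loss = F.mse_loss(sales_pred, sales_true) \
     + 0.5 * F.binary_cross_entropy_with_logits(rec_logit, bought)
loss.backward()  # joint optimization of both tasks
```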

Authors: Wenshen Song (PROS Inc.), Yan Xu (PROS Inc.), Faruk Sengul (PROS Inc.), and Justin Silver (PROS Inc.)

Speakers

Tuesday October 26, 2021 11:00am - 11:15am CDT

11:15am CDT

Using Visual Feature Space as a Pivot Across Languages
Technical Presentations Group 2: AI for Good + Business Impact/Industry

People can create image descriptions using thousands of languages, but these languages share only one visual space. The aim of this work is to leverage visual feature space to pass information across languages. We show that models trained to generate textual captions in more than one language, conditioned on an input image, can leverage their jointly trained feature space during inference to pivot across languages. In particular, we demonstrate improved quality of a caption generated from an input image by leveraging a caption in a second language. More importantly, we demonstrate that, even without conditioning on any visual input, the model has implicitly learned to perform machine translation, to some extent, from one language to another in the shared visual feature space, even though the multilingual captions used for training were created independently.

Authors: Ziyan Yang (Rice University), Leticia Pinto-Alva (University of Southern California), Franck Dernoncourt (Adobe Research), and Vicente Ordóñez (Rice University)

Speakers

Ziyan Yang

Rice University


Tuesday October 26, 2021 11:15am - 11:30am CDT

11:30am CDT

Math Word Problem Generation with Mathematical Consistency and Problem Context Constraints
Technical Presentations Group 2: AI for Good + Business Impact/Industry

We study the problem of generating arithmetic math word problems (MWPs) given a math equation that specifies the mathematical computation and a context that specifies the problem scenario. Existing approaches are prone to generating MWPs that are either mathematically invalid or have unsatisfactory language quality. They also either ignore the context or require manual specification of a problem template, which compromises the diversity of the generated MWPs. In this paper, we develop a novel MWP generation approach that leverages i) pre-trained language models and a context keyword selection model to improve the language quality of the generated MWPs and ii) an equation consistency constraint for math equations to improve the mathematical validity of the generated MWPs. Extensive quantitative and qualitative experiments on three real-world MWP datasets demonstrate the superior performance of our approach compared to various baselines.

Authors: Zichao Wang (Rice University), Andrew Lan (University of Massachusetts Amherst), and Richard Baraniuk (Rice University)

Speakers

Zichao (Jack) Wang

Rice University


Tuesday October 26, 2021 11:30am - 11:45am CDT

11:45am CDT

Multi-Task Learning for Demand Prediction Through a Hyper-Network
Technical Presentations Group 2: AI for Good + Business Impact/Industry

Demand for consumer goods depends on a variety of factors such as price, seasonality, competitor price, geographic location, demographic data, etc. A common practice is to use some features, like geographic and demographic data, to segment the market and build an individual model for each segment. However, with this approach we lose potentially valuable information which can be learned across segments. Hence, we propose a method for simultaneously learning multiple demand models to borrow knowledge and improve accuracy, especially for models with sparser data. For this, we propose using a neural network as a hyper-network to estimate the parameters of each demand model. Our approach leads to knowledge sharing across models, as opposed to independent model fitting for each task, while generating a model which is computationally tractable. Results of applying the proposed method on large-scale real data show improved prediction accuracy and price elasticity estimates compared with the common two-step approach of clustering and using independent models.
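
The sketch below illustrates the hyper-network idea in PyTorch: a small network maps segment features to the parameters of a per-segment demand curve, so all segments are fit jointly. The log-linear price-response form and all sizes are assumptions for illustration, not the authors' model.

```python
import torch
import torch.nn as nn

class DemandHyperNet(nn.Module):
    def __init__(self, n_segment_features, hidden=32):
        super().__init__()
        # Hyper-network: segment features -> (log base demand, price elasticity)
        self.net = nn.Sequential(nn.Linear(n_segment_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))

    def forward(self, segment_features, price):
        params = self.net(segment_features)
        log_base = params[:, 0]
        elasticity = nn.functional.softplus(params[:, 1])  # keep elasticity positive
        # Per-segment demand model: log(demand) = log_base - elasticity * log(price)
        return torch.exp(log_base - elasticity * torch.log(price))

model = DemandHyperNet(n_segment_features=8)
segments = torch.randn(16, 8)               # e.g., geographic/demographic features
prices = torch.rand(16) * 10 + 1
predicted_demand = model(segments, prices)  # trainable end-to-end across segments
```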

Authors: Manu Chaudhary (PROS), Yanyan Hu (University of Houston), and Shahin Boluki (PROS)

Speakers

Tuesday October 26, 2021 11:45am - 12:00pm CDT

12:00pm CDT

Lunch + Networking
Tuesday October 26, 2021 12:00pm - 1:00pm CDT

12:00pm CDT

ML for Energy Transition
Talk 1: End-to-End Approaches to Enhance the CO2 Capture - Cécile Pereira
Nanoporous materials can be used as solid adsorbents to capture CO2 from combustion flue gases or directly from the air using what is called a temperature swing adsorption (TSA) process. In this process, the CO2-containing gas is injected into a gas (CO2 source) / solid (sorbent material) contactor, where the pores of the material selectively adsorb the CO2. Once the adsorbent is saturated, a highly enriched CO2 gas stream is recovered by purging the contactor with a combination of heat and steam. If the right nanoporous material can be found, a cost-effective approach to CO2 capture may be achievable. ACO2RDS (Adsorptive CO2 removal from dilute sources) is a multi-year project to develop transformative solid-sorbent-based technologies for CO2 capture from dilute sources, specifically natural gas combined cycle (NGCC) power plant flue gas and atmospheric CO2 with direct air capture (DAC). In this presentation, we introduce the ACO2RDS project and review key state-of-the-art publications on the topic.


Talk 2: A Deep Learning-Accelerated Data Assimilation and Forecasting Workflow for Commercial-Scale Geologic Carbon Storage - Hewei Tang
Fast assimilation of monitoring data to forecast the transport of materials in heterogeneous media has many important applications. Such applications include the management of CO2 migration in geologic carbon storage reservoirs. It is often critical to assimilate emerging data and make forecasts in a timely manner. However, the high computational cost of data assimilation with a high-dimensional parameter space undermines our ability to achieve this goal.

In the context of geologic carbon storage, we propose to leverage physical understanding of porous medium flow behavior with deep learning techniques to develop a fast history matching and reservoir response forecasting workflow. Applying an Ensemble Smoother Multiple Data Assimilation (ES-MDA) framework, the workflow updates geologic properties and predicts reservoir performance, with quantified uncertainty, from observed pressure and CO2 plumes. As the most computationally expensive component in such a workflow is reservoir simulation, we developed surrogate models to predict dynamic pressure and CO2 plume extents under multi-well injection. The surrogate models employ deep convolutional neural networks, specifically a wide residual network and a residual U-Net. Intelligent treatments are applied to bridge between quantities in a true 3D reservoir and a single-layer model underlying the workflow. The workflow can complete history matching and reservoir forecasting with uncertainty quantification in less than one hour on a mainstream personal workstation.
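
For readers unfamiliar with ES-MDA, the numpy fragment below sketches a single ensemble update step of the kind such a workflow repeats with inflated observation noise; the covariance construction follows the standard ES-MDA formulation, while the ensembles and error levels here are placeholders (in the described workflow the simulated data would come from the deep-learning surrogates).

```python
import numpy as np

def es_mda_step(m_ens, d_ens, d_obs, obs_err_std, alpha):
    """One ES-MDA update.
    m_ens: (n_param, n_ens) ensemble of model parameters
    d_ens: (n_obs, n_ens) simulated observations for each member
    d_obs: (n_obs,) field observations; alpha: inflation factor for this step
    """
    n_obs, n_ens = d_ens.shape
    dm = m_ens - m_ens.mean(axis=1, keepdims=True)
    dd = d_ens - d_ens.mean(axis=1, keepdims=True)
    C_md = dm @ dd.T / (n_ens - 1)          # parameter-data cross-covariance
    C_dd = dd @ dd.T / (n_ens - 1)          # data auto-covariance
    C_e = np.diag(obs_err_std ** 2)         # observation-error covariance
    # Perturb observations with inflated noise, then apply the Kalman-like gain
    noise = np.sqrt(alpha) * obs_err_std[:, None] * np.random.randn(n_obs, n_ens)
    gain = C_md @ np.linalg.inv(C_dd + alpha * C_e)
    return m_ens + gain @ (d_obs[:, None] + noise - d_ens)
```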


Talk 3: Monitoring of Microseismic for CO2 Sequestration - Bob Clapp
Monitoring of microseismic events is going to play an important role in evaluating CO2 reservoirs during injection. Since DAS fibers are installed down wells and are thus close to the microseismic events, they hold vast potential for high-resolution analysis of their continuously-recorded data.


However, accurately detecting microseismic signals in continuous data is challenging and time-consuming. DAS acquisitions generate substantial data volumes, and microseismic events have a low signal-to-noise ratio in individual DAS channels.


Herein we design, train, and deploy a machine learning model to automatically detect microseismic events in DAS data acquired inside a proxy to an injection well, an unconventional reservoir. We create a curated dataset of 6,786 manually-picked microseismic events. The machine learning model achieves an accuracy of 98.6% on the benchmark dataset and even detects low-amplitude events missed during manual picking. Our methodology detects over 100,000 events, allowing us to reconstruct the spatio-temporal fracture development accurately.

Speakers

Cécile Pereira

Data Science & AI Research Scientist, TotalEnergies
Cécile Pereira is a research scientist in the digital domain, working for Total CSE, Data Science & AI team. Her current research focuses on the development of new products and materials. She is strongly involved in the computational chemistry project, and she is co-supervising the... Read More →

Hewei Tang

Postdoctoral Staff Member, Lawrence Livermore National Laboratory (LLNL)
Dr. Hewei Tang is currently a postdoctoral staff member in Lawrence Livermore National Laboratory’s Atmospheric, Earth, and Energy Division. She holds a Ph.D. degree in Petroleum Engineering from Texas A&M University. Dr. Tang serves as an Associate Editor of Journal of Petroleum... Read More →

Bob Clapp

Technical Director, Stanford Center for Computational Earth and Environmental Science
Dr. Robert “Bob” Clapp is Technical Director of the Stanford Center for Computational Earth and Environmental Science. He has been at Stanford University for two decades, during which time he has published dozens of articles and presented talks on a wide range of geophysical and... Read More →

Mauricio Araya

Senior R&D Manager HPC & ML, TotalEnergies
Mauricio Araya is a Senior Computer Scientist and lead researcher working at TotalEnergies EP R&T USA. He is also a lecturer with the Professional Science Master’s Program at the Weiss School of Natural Science of Rice University, where he teaches computational... Read More →


Tuesday October 26, 2021 12:00pm - 1:00pm CDT

1:00pm CDT

ShiftAddNet: A Hardware-Inspired Deep Network
Technical Presentations Group 3: Algorithms, Foundations, Visualizations, and Engineering Applications

Multiplication (e.g., convolution) is arguably a cornerstone of modern deep neural networks (DNNs). However, intensive multiplications cause expensive resource costs that challenge DNNs' deployment on resource-constrained edge devices, driving several attempts at multiplication-less deep networks. This paper presents ShiftAddNet, whose main inspiration is drawn from a common practice in energy-efficient hardware implementation: multiplication can instead be performed with additions and logical bit-shifts. We leverage this idea to explicitly parameterize deep networks in this way, yielding a new type of deep network that involves only bit-shift and additive weight layers. This hardware-inspired ShiftAddNet immediately leads to both energy-efficient inference and training, without compromising expressive capacity compared to standard DNNs. The two complementary operation types (bit-shift and add) additionally enable finer-grained control of the model's learning capacity, leading to a more flexible trade-off between accuracy and efficiency, as well as improved robustness to quantization and pruning. We conduct extensive experiments and ablation studies, all backed up by our FPGA-based ShiftAddNet implementation and energy measurements. Compared to existing DNNs or other multiplication-less models, ShiftAddNet aggressively reduces over 80% of the hardware-quantified energy cost of DNN training and inference, while offering comparable or better accuracies.
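
The toy numpy snippet below illustrates the bit-shift half of the idea: once a weight is constrained to a signed power of two, multiplying an integer activation by it reduces to a shift. It is a didactic sketch, not the ShiftAddNet implementation.

```python
import numpy as np

def to_power_of_two(w):
    """Quantize weights to sign * 2**k (k may be negative)."""
    sign = np.sign(w)
    k = np.round(np.log2(np.abs(w) + 1e-12)).astype(int)
    return sign, k

weights = np.array([0.9, -0.26, 3.7])
sign, k = to_power_of_two(weights)

x = np.array([12, 12, 12])                          # integer activations
via_shift = sign * np.where(k >= 0,
                            x << k.clip(min=0),     # positive exponent: shift left
                            x >> (-k).clip(min=0))  # negative exponent: shift right
via_multiply = x * sign * (2.0 ** k)
print(via_shift, via_multiply)  # same values; the shift path avoids multiplications
```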

Authors: Haoran You (Rice University), Xiaohan Chen (The University of Texas at Austin), Yongan Zhang (Rice University), Chaojian Li (Rice University), Sicheng Li (Alibaba DAMO Academy), Zihao Liu (Alibaba DAMO Academy), Zhangyang Wang (The University of Texas at Austin), and Yingyan Lin (Rice University)

Speakers

Haoran You

Rice University


Tuesday October 26, 2021 1:00pm - 1:15pm CDT

1:15pm CDT

Neural Architecture Search for Inversion
Technical Presentations Group 3: Algorithms, Foundations, Visualizations, and Engineering Applications

Over the years, deep learning has been used to tackle inversion problems, and the framework has been applied to build the relationship between the recorded wavefield and velocity (Yang et al., 2016). Here we extend that work in two directions. One is deriving a more appropriate loss function: as we know, a pixel-to-pixel comparison might not be the best choice to characterize image structure, and we will elaborate on how to construct a cost function that captures high-level features to enhance model performance. The other is searching for a more appropriate neural architecture, which can be viewed as a subproblem within hyperparameter optimization, itself a subset of an even bigger picture, automatic machine learning (AutoML). There are several famous networks, U-Net, ResNet (He et al., 2016), and DenseNet (Huang et al., 2017), that achieve phenomenal results for certain problems, yet it is hard to argue they are the best for inversion problems without thoroughly searching within a given space. Here we show our architecture search results for inversion.

Authors: Xin Zhao (CGG), Licheng Zhang (University of Houston) and Cheng Zhan (Microsoft)

Speakers

Tuesday October 26, 2021 1:15pm - 1:30pm CDT

1:30pm CDT

Generalized Zero-Shot Learning via Normalizing Flows
Technical Presentations Group 3: Algorithms, Foundations, Visualizations, and Engineering Applications

Generalized Zero-shot Learning (GZSL) in Computer Vision refers to the task of recognizing images for which classes are not available during training, but other data such as textual descriptions for all classes are available. The idea is to leverage the information from these language descriptions to recognize both seen and unseen classes by transferring knowledge from each modality. This setup poses a more realistic scenario in image classification problems, where it is not possible to manually collect and annotate all the images for a specific class, but it is more viable to use natural language descriptions. In this work, we explore Normalizing Flows to generate features from a shared latent space that aligns the image and textual representations. These new features synthetically generated by our model are then used to enlarge the training set, so that the aligned representations for all seen and unseen classes can be used to train a classifier in a supervised manner. For this purpose, we simultaneously train two Invertible Neural Networks, one for the image representation, and the other for the textual description. Our aim is that the features encoded in the forward pass would work as data embeddings which we align so that they share the same feature space. In the reverse pass, both networks are enforced to reconstruct their corresponding input as a supervised signal for each modality. In this way, our approach outperforms previous generative models that use Variational Autoencoders and Generative Adversarial Networks in the CUB dataset by significant margins.

Authors: Paola Cascante-Bonilla (University of Virginia), Yanjun Qi (University of Virginia) and Vicente Ordonez (Rice University)

Speakers

Paola Cascante-Bonilla

University of Virginia


Tuesday October 26, 2021 1:30pm - 1:45pm CDT

1:45pm CDT

Break
Tuesday October 26, 2021 1:45pm - 2:00pm CDT

2:00pm CDT

Explainable Deep Learning Approaches to Predict Development of Brain Metastases in Patients with Lung Cancer Using Electronic Health Records
Technical Presentations Group 4: Healthcare

Brain metastases (BM) from lung cancer account for the majority of BM cases. Brain metastases cause neurological morbidity and affect quality of life, as they can be associated with brain edema. Therefore, early detection of brain metastases and prompt treatment can achieve optimal control. In this study, we employed the RNN-based RETAIN model to predict the risk of developing BM among patients diagnosed with lung cancer based on electronic health record (EHR) data. Meanwhile, we also extended the feature attribution method, Kernel SHAP, to structured EHR data to interpret the decision process. The deep learning models utilize the longitudinal information between different patient encounters to obtain explainable predictions for BM. Through a series of well-defined cohort construction and case-control matching criteria, the best AUC in the test set was obtained by RETAIN, reaching 0.825, a 3.7% improvement compared with the baseline model. The high-contribution features identified by RETAIN and Kernel SHAP were highly related to BM development, and especially to higher lung cancer stages. Moreover, the sensitivity analysis also demonstrated that both RETAIN and Kernel SHAP can recognize unrelated features and assign more contribution to the important features.
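
As a rough illustration of the attribution step only, the snippet below applies the off-the-shelf Kernel SHAP explainer from the shap package to a simple classifier on synthetic tabular features; the paper's extension of Kernel SHAP to longitudinal, structured EHR data and its RETAIN model are not reproduced here.

```python
import numpy as np
import shap
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                   # stand-in for encoded EHR features
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)
background = shap.sample(X, 50)                 # background set for the explainer
explainer = shap.KernelExplainer(model.predict_proba, background)
shap_values = explainer.shap_values(X[:5])      # per-feature contributions per class
```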

Authors: Zhao Li (UTHealth), Ping Zhu (UTHealth), Rongbin Li (UTHealth), Yoshua Esquenazi (UTHealth) and W. Jim Zheng (UTHealth)

Speakers

Tuesday October 26, 2021 2:00pm - 2:15pm CDT

2:15pm CDT

Deep Learning-Based Blood Glucose Predictors In Type 1 Diabetes
Technical Presentations Group 4: Healthcare

Objectives: In this work, we present short-term predictions of blood glucose (BG) levels in people with type 1 diabetes (T1D) obtained with a deep-learning based architecture applied to a multivariate physiological dataset of actual T1D patients. Methods: Stacks of convolutional neural network (CNN) and long short-term memory (LSTM) units are proposed to predict BG levels for 30, 60 and 90 minute prediction horizons (PH), given historical glucose measurements, meal information and insulin intakes. Evaluation of predictive capabilities was performed on two actual patient datasets, Replace-BG and DIAdvisor, respectively. Findings: for the 90 minute PH our model obtained mean absolute error (MAE) of 17.30 ± 2.07 and 18.23 ± 2.97 [mg/dl], root mean squared error (RMSE) of 23.45 ± 3.18 and 25.12 ± 4.65 [mg/dl], coefficient of determination (R2) of 84.13 ± 4.22 and 82.34 ± 4.54 [%], and in terms of the continuous glucose-error grid analysis (CG-EGA) 94.71 ± 3.89 [%] and 91.71 ± 4.32 [%] accurate predictions (AP), 1.81 ± 1.06 [%] and 2.51 ± 0.86 [%] benign errors (BE), and 3.47 ± 1.12 [%] and 5.78 ± 1.72 [%] erroneous predictions (EP), for the Replace-BG and DIAdvisor datasets, respectively. Conclusion: Our investigation demonstrated that our method, compared to existing approaches in the literature, achieved superior glucose forecasting performance, showing the potential for application in decision support systems for diabetes management.
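
A minimal PyTorch sketch of a CNN + LSTM stack of the kind described is given below; the channel counts, kernel sizes, input signals (CGM, insulin, meals), 5-minute sampling, and 30-minute horizon are illustrative assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn

class GlucoseForecaster(nn.Module):
    def __init__(self, n_signals=3, horizon_steps=6):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_signals, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, horizon_steps)    # future BG values

    def forward(self, x):                           # x: (batch, signals, time)
        h = self.cnn(x).transpose(1, 2)             # -> (batch, time, channels)
        _, (hidden, _) = self.lstm(h)
        return self.head(hidden[-1])

model = GlucoseForecaster()
history = torch.randn(8, 3, 48)   # 4 hours of glucose/insulin/meal signals
print(model(history).shape)       # torch.Size([8, 6]) -> next 30 minutes of BG
```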

Authors: Mehrad Jaloli (University of Houston) and Marzia Cescon (University of Houston)

Speakers

Marzia Cescon

University of Houston


Tuesday October 26, 2021 2:15pm - 2:30pm CDT

2:30pm CDT

MLPrE – A Tool for Preprocessing Data and Conducting Exploratory Data Analysis Prior to Machine Learning Model Construction
Technical Presentations Group 4: Healthcare

Data preparation is one of the less glamorous aspects of doing Data Science. Combined with Exploratory Data Analysis (EDA), preparation consumes a significant percentage of a Data Scientist's time, yet it is critical to do correctly. Starting from data that can exist in multiple formats, the data are manipulated until they match the input requirements of a Machine Learning (ML) model. The modifications may be guided by EDA, the process of understanding the data, such as calculating the percentage of NULL values, determining basic statistics, and looking at potential correlations with other columns. Notebooks such as Jupyter and Zeppelin are great tools for these exercises, but their integration within larger processing pipelines such as Apache Airflow may not be ideal. Our tool, MLPrE, evolved out of the need for early-stage data preparation and analysis that was consistent and could be repeated by other Data Science team members. This was accomplished through a dataframe storage mechanism with stages used to describe stepwise changes to that dataframe; these stages are described using JavaScript Object Notation (JSON), which is parsed and used to direct the code to perform steps in a specific order. Currently, there are approximately fifty stages for input/output, filtering, basic statistics, feature engineering, and EDA. MLPrE is Apache Spark based, with Python as the development language.
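
MLPrE itself is not shown here, but the fragment below sketches the general JSON-driven stage pattern the abstract describes: an ordered list of stages, each naming an operation applied to a dataframe. The stage names and fields are invented for the example, and pandas stands in for Spark purely to keep the sketch small.

```python
import json
import pandas as pd

def stage_drop_nulls(df, spec):
    return df.dropna(subset=spec["columns"])

def stage_filter(df, spec):
    return df.query(spec["expr"])

def stage_describe(df, spec):
    print(df.describe(include="all"))   # simple EDA stage
    return df

STAGES = {"drop_nulls": stage_drop_nulls, "filter": stage_filter,
          "describe": stage_describe}

def run_pipeline(df, config_json):
    """Apply each JSON-described stage to the dataframe, in order."""
    for spec in json.loads(config_json)["stages"]:
        df = STAGES[spec["op"]](df, spec)
    return df

config = json.dumps({"stages": [
    {"op": "drop_nulls", "columns": ["age"]},
    {"op": "filter", "expr": "age >= 18"},
    {"op": "describe"},
]})
raw = pd.DataFrame({"age": [25, None, 40, 17], "bmi": [22.1, 30.5, 27.8, 19.0]})
clean = run_pipeline(raw, config)
```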

Authors: David Maxwell (University of Texas M D Anderson Cancer Center), Ya Zhang (University of Texas M D Anderson Cancer Center), James Lomax III (University of Texas M D Anderson Cancer Center), Robert Brown (University of Texas M D Anderson Cancer Center), Brian Dyrud (University of Texas M D Anderson Cancer Center), Melody Page (University of Texas M D Anderson Cancer Center), Mary McGuire (University of Texas M D Anderson Cancer Center), Daniel Wang (University of Texas M D Anderson Cancer Center), and Caroline Chung (University of Texas M D Anderson Cancer Center)

Speakers

David Maxwell

University of Texas M D Anderson Cancer Center


Tuesday October 26, 2021 2:30pm - 2:45pm CDT

2:45pm CDT

Diabetes Management in Underserved Communities: Data-Driven Insights from Continuous Glucose Monitoring
Technical Presentations Group 4: Healthcare

I. MOTIVATION

Continuous glucose monitoring (“CGM”) has proven itself to be beneficial for people with diabetes, providing real-time feedback and clear glucose targets for patients. Unfortunately, we have supporting evidence almost exclusively from White individuals living with type 1 diabetes who are well-educated and can afford health insurance. There is limited understanding of CGM utility for people with type 2 diabetes (“T2D”). This is a massive gap, given T2D accounts for 90-95% of all diabetes cases. This gap is further intensified by two factors. One, there are negligible studies on CGM use in underserved communities, including racial/ethnic minorities who bear a disproportionate burden of the disease. Two, current CGM guidelines are based on summary statistics that smooth out the effect of potentially prognostic glucose patterns observed at different times of the day. We propose a fine-grained analysis of CGM data to discover clinically meaningful physiological and behavioral insights on T2D. These insights can then help design more effective and affordable treatments, which can significantly benefit underserved communities.

II.  HYPOTHESIS

CGMs capture glucose readings every 15 minutes, providing high-resolution temporal information that may detect diabetes onset and progression. Based on prior clinical research on T2D progression, we hypothesize that increasing diabetes risk is associated with: (i) increased glucose abnormalities with distinct patterns during the day vs. overnight, and (ii) bigger glucose surges after meals, most clearly observable after breakfast.

III. METHODS AND RESULTS

We analyzed 2 weeks of CGM data from 119 participants from an underserved community in Santa Barbara, CA (predominantly Hispanic/Latino females, 54.4 ± 12.1 years old) stratified into three groups of increasing diabetes risk: (i) 35 normal but at risk of T2D (“at-risk”), (ii) 49 with prediabetes (“pre-T2D”), and (iii) 35 with T2D.

Overnight vs. rest of the day analysis: T2D participants spent significantly higher time in the elevated glucose range of 140-180 mg/dL throughout the day than at-risk and pre-T2D individuals (p<0.0001). Pre-T2D participants, interestingly, spent higher time between 140-180 mg/dL compared to at-risk individuals during the day (p<0.01) but not overnight.

Breakfast analysis: T2D participants had more prominent and more prolonged glucose peaks than the other two groups, with significantly greater height and duration of breakfast glucose peaks than at-risk and pre-T2D participants (p<0.0001 and p<0.01, respectively).

IV. CONCLUSION

We observed a distinct progression of glucose abnormality in a cohort of predominantly Hispanic/Latino individuals at risk of T2D, those with pre-T2D, and those with T2D. Our results suggest that: (i) disease progression is initially associated with greater glucose excursions during the day and then eventually overnight; and (ii) glucose peaks after breakfast become taller and take longer to attain with increasing diabetes severity. Both sets of results provide a CGM-based approach to monitoring diabetes progression at home. In the future, we need to validate our findings in longer-duration studies and other populations. Nevertheless, the proposed data-driven measures have the potential to detect diabetes onset early and offer opportunities for new pharmacological and non-pharmacological diabetes treatment regimens that can better benefit underserved communities disproportionately burdened with the disease.
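
The pandas fragment below sketches the kind of day vs. overnight time-in-range computation described above, on simulated readings; the 15-minute sampling matches the abstract, while the glucose values and the midnight-to-6am overnight window are assumptions for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
idx = pd.date_range("2021-10-01", periods=4 * 24 * 14, freq="15min")  # 2 weeks
cgm = pd.Series(rng.normal(135, 30, len(idx)).clip(60, 300), index=idx)

elevated = cgm.between(140, 180)               # readings in the elevated range
overnight = (cgm.index.hour >= 0) & (cgm.index.hour < 6)

pct = lambda mask: 100 * elevated[mask].mean()
print(f"140-180 mg/dL overnight:   {pct(overnight):.1f}% of readings")
print(f"140-180 mg/dL rest of day: {pct(~overnight):.1f}% of readings")
```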

Authors: Souptik Barua (Rice University), Namino Glantz (Sansum Diabetes Research Institute), David Kerr (Sansum Diabetes Research Institute) and Ashutosh Sabharwal (Rice University)

Speakers

Souptik Barua

Rice University


Tuesday October 26, 2021 2:45pm - 3:00pm CDT

3:00pm CDT

Closing Remarks
Speakers

Angela Wilkins

Executive Director, The Ken Kennedy Institute
Angela Wilkins is the Executive Director of the Ken Kennedy Institute. Angela is responsible for the development and implementation of Ken Kennedy Institute’s programs in the computational sciences. After earning a Ph.D. in theoretical physics from Lehigh University, she shifted...


Tuesday October 26, 2021 3:00pm - 3:05pm CDT

4:30pm CDT

Outdoor Networking Reception with Sponsors
Join us for an outdoor networking reception on Tuesday, October 26th with sponsors at Holman Draft Hall from 4:30-6:30.

Thank you to our sponsor, DDN, for making this reception possible!

Tuesday October 26, 2021 4:30pm - 6:30pm CDT
Holman Draft Hall 820 Holman St, Houston, TX 77002
 
Wednesday, October 27
 

9:00am CDT

Conference Registration, Breakfast, Networking
Wednesday October 27, 2021 9:00am - 9:45am CDT

10:00am CDT

Welcome
Wednesday October 27, 2021 10:00am - 10:05am CDT
Auditorium

10:05am CDT

Automatic Machine Learning with AutoGluon - Algorithms, Domains, Applications - Auditorium
AutoML is the ultimate challenge for machine learning algorithms. After all, design choices need to be automatic and tools need to work reliably all the time, within a given budget for computation and time. This poses exciting (and many unsolved) problems in terms of model selection, calibration, optimization, adaptive design of priors, and data detection. In this talk, I give an overview of the associated scientific problems and the current state of the art in terms of what goes into AutoGluon.
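
For readers who want a concrete starting point, the quick-start pattern from the AutoGluon documentation looks roughly like the sketch below; the dataset URLs and label column come from AutoGluon's public demo data, and the time budget is an arbitrary example, not something taken from the talk.

```python
from autogluon.tabular import TabularDataset, TabularPredictor

# Public demo data from the AutoGluon docs (predicting whether income exceeds $50K).
train = TabularDataset("https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv")
test = TabularDataset("https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv")

# Model selection, hyperparameter tuning, calibration, and ensembling all
# happen inside fit(), constrained by an explicit time budget (in seconds).
predictor = TabularPredictor(label="class").fit(train, time_limit=600)

leaderboard = predictor.leaderboard(test)            # compare the trained models
predictions = predictor.predict(test.drop(columns=["class"]))
```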

Speakers

Alex Smola

VP and Distinguished Scientist, Amazon Web Services
Alex Smola studied physics at the University of Technology, Munich and at AT&T Research in Holmdel. He received a Doctoral Degree in computer science at the University of Technology Berlin in 1998. He worked at the Fraunhofer Gesellschaft (1996-1999), NICTA (2004-2008...


Wednesday October 27, 2021 10:05am - 10:50am CDT
Auditorium

10:50am CDT

Scalable and Sustainable AI Acceleration for Everyone: Hashing Algorithms Train Billion-parameter AI Models on a Commodity CPU faster than Hardware Accelerators - Auditorium
Current Deep Learning (DL) architectures are growing larger to learn from complex datasets. Training and tuning these astronomically sized models is time- and energy-consuming and stalls progress in AI. Industries are increasingly investing in specialized hardware and deep learning accelerators like TPUs and GPUs to scale up the process. It is taken for granted that commodity CPU hardware cannot outperform powerful accelerators such as GPUs in a head-to-head comparison on training large DL models. However, GPUs come with additional concerns: expensive infrastructure changes that only a few can afford, difficulty with virtualization, main-memory limitations, and chip shortages. Furthermore, the energy consumption of current AI training is prohibitively expensive. An article from MIT Technology Review noted that training one deep learning model can generate a larger carbon footprint than five cars over their lifetimes.

In this talk, I will demonstrate the first algorithmic progress that exponentially reduces the computation cost associated with training neural networks by mimicking the brain's sparsity. We will show how data structures, particularly hash tables, can be used to design an efficient "associative memory" that reduces the number of multiplications required to train a neural network. Implementations of this algorithm challenge the common belief prevailing in the community that specialized processors like GPUs are significantly superior to CPUs for training large neural networks. The resulting algorithm is orders of magnitude cheaper and more energy-efficient. Our careful implementations can train billion-parameter recommendation models on refurbished, older-generation CPUs significantly faster than top-of-the-line TensorFlow alternatives on the most potent A100 GPU clusters. In the end, I will discuss the current and future state of this line of work along with a brief discussion of planned extensions.
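
The core trick can be sketched in a few lines (this toy example is ours, not the speaker's implementation): signed random projections (SimHash) bucket the layer's weight vectors, and each input only touches the neurons that collide with its bucket, so only a tiny fraction of multiplications is performed. Real systems use many tables, periodic rehashing, and CPU-friendly sparse kernels.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_neurons, n_bits = 64, 4096, 9        # input dim, layer width, hash bits

W = rng.normal(size=(n_neurons, d))        # layer weights (one row per neuron)
planes = rng.normal(size=(n_bits, d))      # SimHash projection planes

def simhash(v):
    """Bucket id from the sign pattern of random projections."""
    bits = (planes @ v) > 0
    return int(np.dot(bits, 1 << np.arange(n_bits)))

# Build the hash table once: bucket -> ids of neurons whose weights land there.
table = {}
for j in range(n_neurons):
    table.setdefault(simhash(W[j]), []).append(j)

def sparse_forward(x):
    """Activate only neurons colliding with x's bucket (a tiny fraction of the layer)."""
    active = table.get(simhash(x), [])
    return active, W[active] @ x           # outputs for active neurons only

x = rng.normal(size=d)
active, out = sparse_forward(x)
print(f"{len(active)} of {n_neurons} neurons touched")
```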

Speakers

Anshumali Shrivastava

Professor, Rice University; Founder, ThirdAI Corp
Anshumali Shrivastava's research focuses on Large Scale Machine Learning, Scalable and Sustainable Deep Learning, Randomized Algorithms for Big-Data and Graph Mining.


Wednesday October 27, 2021 10:50am - 11:35am CDT
Auditorium

11:35am CDT

Lunch + Networking
Wednesday October 27, 2021 11:35am - 12:30pm CDT

12:30pm CDT

Democratizing Deep Learning with Commodity Hardware: How to Train Large Deep Learning Models on CPU Efficiently with Sparsity - Auditorium
GPUs are expensive, require premium infrastructure, and are hard to virtualize. Furthermore, our models and data are growing faster than GPU memory. The communication cost of distributing the models over GPUs is prohibitively expensive for most workloads.

Wouldn't it be nice if we could train large models with commodity CPUs faster than with GPUs? CPUs are cheap, well-understood, and ubiquitous hardware. The main memory of a CPU machine can easily reach terabytes (TB) with minimal investment. For very large models, we can fit both the model and the data in CPU RAM.

This tutorial will focus on a newly emerging paradigm of deep learning training that uses sparsity and hash tables. We will introduce the idea of selectively identifying parameters and sparsity patterns during training, and demonstrate how these algorithms integrate into existing Python code. As a result, we obtain significantly superior deep learning capability on CPUs, making them competitive with (or even better than) state-of-the-art packages on some of the best GPUs. If time permits, we will briefly discuss multi-node implementations and some thoughts on how to train outrageously large models (tens of billions of parameters or more) on small commodity clusters.
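
As a rough illustration of what "selective" training looks like inside existing Python code (a sketch under our own assumptions, not the tutorial's library), the snippet below restricts a wide PyTorch layer so that each sample only activates its top-k neurons; the real systems replace the dense scoring step with hash-table lookups so those scores are never computed in full.

```python
import torch

def sparse_layer_forward(x, W, b, k=64):
    """Each sample 'activates' only its k highest-scoring neurons, so gradients
    touch only those rows of W. (Scores are computed densely here for clarity;
    hash tables avoid even that step in the actual systems.)"""
    scores = x @ W.t() + b                     # (batch, n_neurons)
    topk = scores.topk(k, dim=1)
    out = torch.zeros_like(scores)
    out.scatter_(1, topk.indices, torch.relu(topk.values))
    return out

W = torch.randn(16384, 256, requires_grad=True)   # a 16k-neuron hidden layer
b = torch.zeros(16384, requires_grad=True)
x = torch.randn(32, 256)
h = sparse_layer_forward(x, W, b)                  # <0.5% of neurons are non-zero per sample
```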

Speakers

Anshumali Shrivastava

Professor, Rice University; Founder, ThirdAI Corp
Anshumali Shrivastava's research focuses on Large Scale Machine Learning, Scalable and Sustainable Deep Learning, Randomized Algorithms for Big-Data and Graph Mining.

Nicholas Meisburger

Rice University

Shabnam Daghaghi

Rice University

Minghao Yan

Rice University


Wednesday October 27, 2021 12:30pm - 2:30pm CDT
Auditorium

12:30pm CDT

How to Deal with Volume and Velocity Associated with Hundreds of Terabytes (and Beyond) of Genomics Data? - Room 280
Whole-genome shotgun sequencing (WGS) has enabled numerous breakthroughs in large-scale comparative genomics research. However, the size of genomic datasets has grown exponentially over the last few years.

This tutorial will focus on two new emerging techniques to handle the challenges associated with Volume and Velocity.

1. Repeated and Merged Bloom Filters (RAMBO) for processing hundreds of terabytes of sequence data. We will see how we index 170 TB of bacterial and viral sequences in less than 14 hours on a shared cluster at Rice, enabling searches for similar or anomalous sequences in a few milliseconds (a toy membership-filter sketch follows after this list).

2. How to subsample high-velocity metagenomics data while keeping its diversity intact. We will discuss how to handle data that is generated at a very high rate, and show an efficient sampling scheme that is roughly as fast as random sampling (RS) but, unlike RS, preserves the diversity of the genomic pool. We will also discuss how these techniques can be pushed to the edge thanks to their tiny memory requirements.

Hands-on experience with both techniques will be provided.
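
To make the first technique concrete, here is a toy k-mer membership filter (ours, for illustration only); RAMBO arranges many such Bloom filters in a repeated, merged grid so that a query touches only a handful of filters even across hundreds of terabytes of sequence.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter for k-mer membership (illustrative sizes only)."""
    def __init__(self, n_bits=1 << 20, n_hashes=4):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, item):
        for i in range(self.n_hashes):
            h = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def kmers(seq, k=21):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

bf = BloomFilter()
for km in kmers("ACGTACGTACGTACGTACGTACGTACGT"):
    bf.add(km)
print("ACGTACGTACGTACGTACGTA" in bf)   # True: this 21-mer occurs in the sequence
```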

Speakers

Ben Coleman

Rice University

Gaurav Gupta

Rice University

Josh Engels

Rice University

Benito Geordie

Rice University

Alan Ji

Rice University

Junyan Zhang (Henry)

Rice University


Wednesday October 27, 2021 12:30pm - 2:30pm CDT
Room 280

2:30pm CDT

Afternoon Break and Networking
Wednesday October 27, 2021 2:30pm - 3:00pm CDT

3:00pm CDT

SeqScreen: Accurate and Sensitive Functional Screening of Pathogenic Sequences via Ensemble Learning - Auditorium
Modern benchtop DNA synthesis techniques and increased concern about emerging pathogens have elevated the importance of screening oligonucleotides for pathogens of concern. However, accurate and sensitive characterization of oligonucleotides is an open challenge for many of the current techniques and ontology-based tools. To address this gap, we have developed a novel software tool, SeqScreen, that can accurately and sensitively characterize short DNA sequences using a set of curated Functions of Sequences of Concern (FunSoCs), novel functional labels specific to microbial pathogenesis which describe the pathogenic potential of individual proteins. SeqScreen uses ensemble machine learning models encompassing multi-stage Neural Networks and Support Vector Classifiers which can label query sequences with FunSoCs via an imbalanced multi-class and multi-label classification task with high accuracy. In summary, SeqScreen represents a first step towards a novel paradigm of functionally informed pathogen characterization from genomic and metagenomic datasets. SeqScreen is open-source and freely available for download at: www.gitlab.com/treangenlab/seqscreen
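
The classification setup described above (imbalanced, multi-class, and multi-label) can be illustrated with a small scikit-learn sketch; the features, label count, and classifier choice below are placeholders of ours and not SeqScreen's actual ensemble.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Placeholder features for 1,000 sequences (standing in for protein-level
# features) and a sparse multi-label target over 5 hypothetical FunSoC labels.
X = rng.normal(size=(1000, 64))
Y = (rng.random((1000, 5)) < 0.05).astype(int)      # imbalanced: ~5% positives per label

# One binary classifier per label; class_weight='balanced' counters the imbalance.
clf = OneVsRestClassifier(LinearSVC(class_weight="balanced", C=1.0, max_iter=5000))
clf.fit(X, Y)

pred = clf.predict(rng.normal(size=(3, 64)))         # multi-label predictions
print(pred)
```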

Authors: Advait Balaji, Bryce Kille, Anthony Kappell, Gene Godbold, Madeline Diep, R. A Leo Elworth, Zhiqin Qian, Dreycey Albin, Daniel Nasko, Nidhi Shah, Mihai Pop, Santiago Segarra, Krista Ternus, and Todd Treangen

Speakers

Todd Treangen

Rice University


Wednesday October 27, 2021 3:00pm - 3:15pm CDT
Auditorium

3:15pm CDT

Parallel RRT Algorithm for Robotic Motion Planning
The advent of autonomous technology, ranging from self-driving cars to robotic surgery, has propelled motion planning algorithms to the forefront of research. The Rapidly-exploring Random Tree (RRT) algorithm is one such example, used by robots to find a suitable path between two points while avoiding obstacles. It does this by building a search tree rooted at the start point and growing the tree by randomly generating and connecting nodes in the search space, verifying each connection to ensure no collision has occurred. The algorithm terminates when a node reaches the goal region and returns a valid path through the tree.


Traditionally, RRT is designed to run sequentially on a single thread. Increasing the speed and efficiency of the algorithm would facilitate its use in highly complex, realistic scenarios. With the advent of powerful computing machines, it is an opportune time to enhance the performance of these algorithms. This paper presents a novel parallel RRT motion planning algorithm that performs the computationally intensive steps in batches, simultaneously on multiple threads. This increases the number of nodes created and collision-checked per second, and hence finds paths faster.
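
For reference, a minimal sequential RRT in a 2-D world with one circular obstacle is sketched below (ours, not the paper's implementation); the parallel variant described above would generate and collision-check batches of candidate nodes across threads instead of one at a time.

```python
import math, random

random.seed(0)
OBSTACLES = [((5.0, 5.0), 1.5)]            # (center, radius) circles
START, GOAL, STEP, GOAL_R = (1.0, 1.0), (9.0, 9.0), 0.5, 0.5

def collision_free(p):
    # For brevity only the new node is checked; a full RRT also checks the edge.
    return all(math.dist(p, c) > r for c, r in OBSTACLES)

def steer(a, b, step=STEP):
    d = math.dist(a, b)
    t = min(1.0, step / d) if d > 0 else 0.0
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

def rrt(max_iters=5000):
    nodes, parent = [START], {0: None}
    for _ in range(max_iters):
        sample = (random.uniform(0, 10), random.uniform(0, 10))
        near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        new = steer(nodes[near], sample)
        if not collision_free(new):
            continue                        # discard nodes that hit an obstacle
        nodes.append(new)
        parent[len(nodes) - 1] = near
        if math.dist(new, GOAL) < GOAL_R:   # goal region reached: backtrack the path
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None

path = rrt()
print(f"found path with {len(path)} waypoints" if path else "no path found")
```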


To test the novel algorithm, we recorded the time taken for a car in a two-dimensional space to navigate from a start point to a goal point while avoiding obstacles in unknown environments. Results showed that the algorithm successfully utilized the additional threads to compute paths more quickly and efficiently. In terms of speed, the algorithm showed a 2x speedup when using 2 threads and a 2.35x speedup when using 3 threads. In terms of efficiency, reflected by the number of connections added to the search tree per second, the algorithm showed a 2.25x increase using 2 threads and a 3x increase using 3 threads.


These preliminary results show promise for parallel implementations of motion planning algorithms. Novel parallel algorithms such as the one presented in this paper herald a new era of motion planning capabilities and should invigorate current development efforts in robotics and automation.

Authors: Mantej Singh, Rahul Shome, and Lydia Kavraki

Speakers

Mantej Singh

Rice University


Wednesday October 27, 2021 3:15pm - 3:30pm CDT
Auditorium

3:30pm CDT

MaGNET: Uniform Sampling from Deep Generative Network Manifolds without Retraining - Auditorium
Deep Generative Networks (DGNs) are extensively employed in Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and their variants to approximate the manifold structure and the distribution of a training dataset. However, the samples from the data manifold used to train a DGN are often obtained based on preferences, costs, or convenience such that they favor certain modes (cf. the large fraction of smiling faces in the CelebA dataset or the large fraction of dark-haired individuals in FFHQ). These inconsistencies will be reproduced in any data sampled from the trained DGN, which has far-reaching potential implications for fairness, data augmentation, anomaly detection, domain adaptation, and beyond. In response, we develop a differential-geometry-based technique that, given a trained DGN, adapts its generative process so that the distribution on the data generating manifold is uniform. We prove theoretically and validate experimentally that our technique can be used to produce a uniform distribution on the manifold regardless of the training set distribution.
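
One simple way to see the underlying idea (a conceptual sketch of ours, not MaGNET's per-layer procedure): weight each latent sample by the generator's local volume element sqrt(det(JᵀJ)) and resample in proportion, so the pushed-forward points are approximately uniform on the manifold. The toy generator below is an assumption made for illustration.

```python
import torch

def manifold_volume(generator, z):
    """sqrt(det(J^T J)) of the generator at latent z, i.e. the local change of
    area/volume from latent space to the data manifold."""
    J = torch.autograd.functional.jacobian(generator, z)   # (out_dim, latent_dim)
    JtJ = J.T @ J
    return torch.sqrt(torch.det(JtJ)).item()

# Toy 'generator': maps a 2-D latent onto a curved 3-D surface.
def generator(z):
    return torch.stack([z[0], z[1], torch.sin(z[0]) * torch.cos(z[1])])

torch.manual_seed(0)
latents = torch.randn(2000, 2)
weights = torch.tensor([manifold_volume(generator, z) for z in latents])

# Importance-resample latents in proportion to the volume element, so the
# generated points are (approximately) uniform on the surface.
idx = torch.multinomial(weights / weights.sum(), 500, replacement=True)
uniform_on_manifold = torch.stack([generator(z) for z in latents[idx]])
```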

Authors: Ahmed Imtiaz Humayun, Randall Balestriero, and Richard Baraniuk

Speakers

Ahmed Imtiaz Humayun

Rice University


Wednesday October 27, 2021 3:30pm - 3:45pm CDT
Auditorium

3:45pm CDT

Magnified Convolutional Enrichment Representation Model - Auditorium
Feature representation mathematically characterizes domain entities and is crucial in machine learning. We designed a dynamic deep model that represents human diseases by evaluating the over-representation of diseases and genes as a controlled vocabulary, leveraging contextual information from word embeddings together with global enrichment information. The model has been evaluated and demonstrates a good fit for predicting the associations of complex diseases.
Authors: Guocai Chen, Herbert Chen, Yuntao Yang, Abhisek Mukherjee, Shervin Assassi, Claudio Soto, and Wenjin Zheng

Speakers

Wednesday October 27, 2021 3:45pm - 4:00pm CDT
Auditorium

4:00pm CDT

PipeGCN: Efficient Full-Graph Training of Graph Convolutional Networks with Pipelined Feature Communication - Auditorium
Graph Convolutional Networks (GCNs) are the state-of-the-art method for learning from graph-structured data. Training large-scale GCNs requires distributed training across multiple accelerators such that each accelerator is able to hold a partitioned subgraph. However, distributed GCN training incurs the prohibitive overhead of communicating node features and gradients among partitions for every GCN layer in each training iteration, limiting the achievable training efficiency and model scalability. To this end, we propose PipeGCN, a simple-yet-effective scheme that hides the communication overhead by pipelining inter-partition communication with intra-partition computation. Such pipelining is non-trivial for GCN training, as the communicated node features/gradients become stale and can thus harm convergence, negating the pipeline benefit. Notably, little is known regarding the convergence rate of GCN training with stale features. This work not only provides a theoretical convergence guarantee but also finds the convergence rate of PipeGCN to be close to that of vanilla distributed GCN training without pipelining. Furthermore, we develop a smoothing method to further improve PipeGCN's convergence. Extensive experiments show that PipeGCN can largely boost training throughput (up to 2.2×) while achieving the same accuracy as its vanilla counterpart, and that PipeGCN also outperforms existing full-graph training methods.
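
A schematic, single-process sketch of the pipelining idea (ours, not PipeGCN's distributed implementation): while the current layer is computed with the stale boundary features received in the previous round, the next round's exchange is already in flight. Here the "communication" is faked with a worker thread and the graph aggregation is reduced to a dense matrix product for brevity.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)

def exchange_boundary(feats):
    """Stand-in for inter-partition communication of boundary-node features
    (in a real system: asynchronous exchange between accelerators)."""
    return feats + 0.0                      # pretend these arrived from peer partitions

local = rng.normal(size=(1000, 64))         # features of this partition's nodes
boundary = np.zeros((100, 64))              # stale copy of remote boundary features
W = rng.normal(size=(64, 64)) * 0.1

with ThreadPoolExecutor(max_workers=1) as pool:
    for epoch in range(5):
        # Kick off the exchange for the next iteration...
        future = pool.submit(exchange_boundary, local[:100])
        # ...while computing this iteration's layer with the *stale* boundary
        # features received previously (this staleness is exactly what the
        # paper's convergence analysis has to account for).
        h = np.maximum(np.concatenate([local, boundary]) @ W, 0.0)
        local = h[:1000]
        boundary = future.result()           # fresh features feed the next epoch
```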

Authors: Cheng Wan, Youjie Li, Cameron Wolfe, Anastasios Kyrillidis, Nam Kim, and Yingyan Lin

Speakers

Cheng Wan

Rice University


Wednesday October 27, 2021 4:00pm - 4:15pm CDT
Auditorium

4:15pm CDT

Quantification of Myxococcus Xanthus Aggregation and Rippling Behaviors: Deep-Learning Transformation of Phase-Contrast into Fluorescence Microscopy Images - Auditorium
Myxococcus xanthus bacteria are a model system for understanding pattern formation and collective cell behaviors. When starving, cells aggregate into fruiting bodies to form metabolically inert spores. During predation, cells self-organize into traveling cell-density waves termed ripples. Both phase-contrast and fluorescence microscopy are used to observe these patterns, but each has its limitations. Phase-contrast images have higher contrast, but the resulting image intensities lose their correlation with cell density. The intensities of fluorescence microscopy images, on the other hand, are well-correlated with cell density, enabling better segmentation of aggregates and better visualization of streaming patterns in between aggregates; however, fluorescence microscopy requires engineering cells to express fluorescent proteins and can be phototoxic to cells. To combine the advantages of both imaging methodologies, we develop a generative adversarial network that converts phase-contrast images into synthesized fluorescence images. By adding a histogram-equalized output to the state-of-the-art pix2pixHD algorithm, our model generates accurate images of aggregates and streams, enabling the estimation of aggregate positions and sizes, albeit with small shifts of their boundaries. Further training on ripple patterns enables accurate estimation of the rippling wavelength. Our methods are thus applicable to many other phenotypic behaviors and pattern formation studies.
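
How the histogram-equalized output is wired into pix2pixHD is specific to the authors' code, but the auxiliary target itself is just a standard equalization of the fluorescence image; a generic helper (ours, for illustration) looks like this:

```python
import numpy as np

def equalize(img: np.ndarray) -> np.ndarray:
    """Histogram-equalize an 8-bit grayscale image (illustrative only)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1)    # normalize to [0, 1]
    return (cdf[img] * 255).astype(np.uint8)

# Hypothetical usage: pair the fluorescence target with its equalized version
# as an extra output channel during training.
# target_eq = equalize(fluorescence_target)
```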

Authors: Jiangguo Zhang, Jessica Comstock, Christopher Cotter, Patrick Murphy, Weili Nie, Roy Welch, Ankit Patel, and Oleg Igoshin

Speakers

Jiangguo Zhang

Rice University


Wednesday October 27, 2021 4:15pm - 4:30pm CDT
Auditorium

4:30pm CDT

Posters and Networking - Exhibit Hall
COVID-19 Chest X-Ray Image Classification Using Deep Learning: Soumava Dey (American International Group Inc.), Gunther Correia Bacellar (Microsoft), Mallikarjuna Chandrappa (Bank Of America) and Rajlakshman Kulkarni (Bank Of America)

Localization for Autonomous Underwater Vehicles Inside GPS-Denied Environments: Issam Ben Moallem (Rice University), Ashesh Chattopadhyay (Rice University), Pedram Hassanzadeh (Rice University) and Fathi H. Ghorbel (Rice University)

An Open-Data Driven Risk Assessment Metric for Covid-19 in Texas by County: A Correlation Study Among Possible Risk Factors and an Elementary Unsupervised Machine Learning Analysis: Archita Singh (Cypress Falls High School), Swapnil Shaurya (University of Texas at Austin) and Antony Adair (MD Anderson Cancer Center/UT Health Graduate School of Biomedical Sciences)

Denoising the Fast Monte Carlo Voxel Level Dose Distributions in Proton Beam Radiation Therapy: A Study to Decrease the Computation Time Required: Sanat Dubey (Westwood High School), Antony Adair (MD Anderson Cancer Center/UT Health Graduate School of Biomedical Sciences) and Pablo Yepes (Rice University)

Wednesday October 27, 2021 4:30pm - 5:30pm CDT
 
Tuesday, November 30
 

10:00am CST

Data4Good & Responsible AI-Automated Data Collection: How Do You Ensure Data4good?
POST CONFERENCE WORKSHOP | FREE


Watch workshop recording HERE.

Web data gathering, especially smart, AI-driven solutions for data retrieval, cleansing, normalization, and aggregation, can significantly reduce the amount of time and resources that organizations have to invest in data collection and preparation.

Though web data collection has existed for a long time, the use of AI for web data gathering has become a game-changer.

As methods for the data domain multiply, so do calls to ensure that these methods are used “for good,” or at the very least, ethically. But how do we know if we are achieving “good”?

Microsoft Israel R&D Centre's Chief Scientist Dr. Tomer Simon alongside Bright Data's CEO Or Lenchner will explore the different questions raised when approaching data at a mammoth scale. They will also discuss interesting cases that use and leverage data for battling climate change, fighting social injustice, and even saving lives.

The focus of this workshop is to champion a "do no harm" approach when accessing and approaching data using AI, and to take a closer look at the ethical, compliance-driven processes and questions one must address when doing so, even when approaching what is considered to be "public domain data."

During this workshop, we will also introduce a set of questions we created that can assist any organization in the process and provide different real-life examples of data being used and tested for good.

Speakers

Dr. Tomer Simon

Chief Scientist, Microsoft

Or Lenchner

CEO, Bright Data


Tuesday November 30, 2021 10:00am - 11:30am CST