In recent years, machine/deep learning algorithms have achieved unprecedented accuracy in practical recognition and classification tasks, in some cases even surpassing human-level performance. While significant progress has been made on accelerating these models for real-time inference on edge and mobile devices, training largely remains offline on the server side. State-of-the-art learning algorithms for deep neural networks (DNNs) impose significant challenges on hardware implementations in terms of computation, memory, and communication. This is especially true for edge devices and portable hardware applications, such as smartphones, machine translation devices, and smart wearable devices, where severe constraints exist on performance, power, and area.
There is a timely need to map the latest complex learning algorithms to custom hardware in order to achieve orders-of-magnitude improvements in performance, energy efficiency, and compactness. Exemplary efforts from industry and academia include many application-specific hardware designs (e.g., xPU, FPGA, ASIC, etc.). Recent progress in computational neuroscience and nanoelectronic technology, such as emerging memory devices, will further help shed light on future hardware-software platforms for learning on-a-chip. At the same time, new learning algorithms need to be developed to fully exploit the potential of these hardware architectures.
The overarching goal of this workshop is to explore the potential of on-chip machine learning, to reveal emerging algorithms and design needs, and to promote novel applications for learning. It aims to establish a forum to discuss the current practices, as well as future research needs in the aforementioned fields.
Key Topics
Synaptic plasticity and neuron motifs of learning dynamics
Computation models of cortical activities
Sparse learning, feature extraction and personalization
Deep learning with high speed and high power efficiency
Hardware acceleration for machine learning
Hardware emulation of the brain
Nanoelectronic devices and architectures for neuro-computing
Applications of learning on a smart mobile platform
Speakers
Keynote
Mike Davies, Intel
Invited Speakers
Nathan McDonald, Air Force Research Lab
Priya Panda, Yale University
Travis Dewolf, Applied Brain Research
Hai (Helen) Li, Duke University
Deming Chen, University of Illinois Urbana-Champaign
Yiyu Shi, University of Notre Dame
Yanzhi Wang, Northeastern University
Eriko Nurvitadhi, Intel
Preliminary Program
8:20am — 8:30am
Introduction and opening remarks
8:30am — 9:15am
——— Keynote talk ———
Keynote
Mike Davies is Director of Intel’s Neuromorphic Computing Lab. Since joining Intel Labs in 2014, Mike has researched neuromorphic architectures, algorithms, software, and systems, and has fabricated several neuromorphic chip prototypes to date. His group is responsible for Intel’s Loihi research chip. Previously, as a founding employee of Fulcrum Microsystems and its Director of Silicon Engineering, Mike pioneered high-performance asynchronous design methodologies as applied to several generations of industry-leading Ethernet switch products. He joined Intel in 2011 through Intel’s acquisition of Fulcrum.
Abstract:
Intel’s Loihi neuromorphic chip has a growing body of results demonstrating that neuromorphic architectures can deliver order-of-magnitude gains in energy efficiency and computational latency over conventional CPUs and GPUs for a wide range of workloads. Meanwhile, new learning algorithms inspired by neuroscience show a path to orders-of-magnitude more efficient learning compared to deep learning approaches. Some of these are running on Loihi today, while others require ongoing neuromorphic innovations before they can be realized in hardware. This talk shares some of these results and perspectives.
Mike Davies (Intel)
——— Session 1: Architecture and Algorithm for On-Chip Learning ———
Session Chair: Qinru Qiu
9:15am — 9:40am
Invited Talk
Nathan McDonald is a researcher at the Air Force Research Laboratory Information Directorate, AFRL/RI. He earned a master's degree in nanoscale engineering from the College of Nanoscale Science and Engineering (CNSE), SUNY Albany, in 2012 and a bachelor's degree in physics in 2008. His primary area of research is machine learning algorithms and hardware for size, weight, and power (SWaP) limited systems, with publications across areas as diverse as memristors, optical reservoir computing, and hyperdimensional computing.
Abstract:
Machine learning (ML) research has been understandably dominated by real-valued artificial neural networks (ANNs). However, these algorithms are not efficiently mapped to traditional computing hardware. To push ML to resource-limited Internet of Things (IoT) devices, much research has focused on spiking neural networks (SNNs) running on specialized hardware. But ML for IoT need not exclusively consider the neural network framework. Hyperdimensional computing with very large binary vectors can both be efficiently mapped to traditional hardware and, through simple mathematical operations, afford online learning in the field. This talk will examine various applications to expeditionary robotic tasks, including transfer learning in the field, non-trivially cloning one trained robot into a swarm, navigation despite extraneous sensors, and updating the behavior of trained swarms in the field.
Nathan McDonald (Air Force Research Lab)
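To make the hyperdimensional computing idea from this abstract concrete, below is a minimal sketch (my own illustration, not the speaker's code): key-value pairs are bound with XOR, bundled into class prototypes by majority vote, and classified by Hamming distance. The dimensionality, role names, and noise level are all assumed for illustration.

import numpy as np

D = 10000  # dimensionality of the hypervectors (illustrative choice)
rng = np.random.default_rng(0)

def random_hv():
    """Random dense binary hypervector."""
    return rng.integers(0, 2, size=D, dtype=np.uint8)

def bind(a, b):
    """Bind two hypervectors with element-wise XOR."""
    return np.bitwise_xor(a, b)

def bundle(hvs):
    """Bundle hypervectors by element-wise majority vote."""
    return (np.sum(hvs, axis=0) > len(hvs) / 2).astype(np.uint8)

def hamming(a, b):
    """Normalized Hamming distance used for nearest-neighbor lookup."""
    return np.mean(a != b)

# Encode two "classes" as bundles of bound (role, filler) pairs.
roles = {name: random_hv() for name in ["sensor_a", "sensor_b"]}
class_protos = {
    "turn_left":  bundle([bind(roles["sensor_a"], random_hv()),
                          bind(roles["sensor_b"], random_hv())]),
    "turn_right": bundle([bind(roles["sensor_a"], random_hv()),
                          bind(roles["sensor_b"], random_hv())]),
}

# A noisy query is classified by finding the closest stored prototype.
query = class_protos["turn_left"].copy()
flip = rng.choice(D, size=D // 10, replace=False)  # flip 10% of the bits
query[flip] ^= 1
pred = min(class_protos, key=lambda k: hamming(query, class_protos[k]))
print(pred)  # expected: "turn_left"

The same XOR, majority, and Hamming-distance primitives map directly onto bit-level operations, which is why the abstract emphasizes efficient mapping to traditional hardware.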
9:40am — 10:05am
Invited Talk
Priya Panda is an assistant professor in the electrical engineering department at Yale University, USA. She received her PhD in 2019 from Purdue University. She is the recipient of the 2019 Amazon Research Award. Her research interests include neuromorphic computing and robust, energy-efficient machine intelligence.
Abstract:
Spiking Neural Networks (SNNs) offer an energy-efficient alternative for implementing deep learning applications. In recent years, there have been several proposals focused on supervised (conversion, spike-based gradient descent) training methods to improve the accuracy of SNNs on large-scale tasks. However, each of these methods suffers from scalability, latency, and accuracy limitations. I will talk about algorithmic techniques, such as modifying the SNN configuration with backward residual connections and hybrid artificial-and-spiking neuronal activations, that improve the learning ability of these training methodologies to yield competitive accuracy while providing large efficiency gains over their artificial counterparts. Further, I will discuss our recent work on a temporal Batch Normalization Through Time (BNTT) technique. Most prior SNN works have disregarded batch normalization, deeming it ineffective for training temporal SNNs. BNTT allows us to train deep SNN architectures from scratch, for the first time, on complex datasets with just 25-30 time-steps. Finally, I will delve into the usefulness of analog memristive crossbars that perform Matrix-Vector-Multiplications (MVMs) efficiently with low energy and area requirements for adversarially robust neural networks. Crossbars generally suffer from intrinsic non-idealities that cause errors in performing MVMs, leading to degradation in the accuracy of neural networks. I will discuss how the intrinsic hardware variations manifested through crossbar non-idealities yield adversarial robustness to the mapped models without any additional optimization.
Priya Panda (Yale University)
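As a rough illustration of the Batch Normalization Through Time idea mentioned in this abstract, the sketch below keeps a separate BatchNorm layer for each time step of a toy leaky-integrate-and-fire layer. It is a simplified stand-in written under my own assumptions (layer sizes, threshold, leak, and the hard, surrogate-free thresholding are all illustrative), not the speaker's BNTT implementation.

import torch
import torch.nn as nn

class LIFLayerWithBNTT(nn.Module):
    """Toy spiking layer: one BatchNorm1d per time step (the BNTT idea)."""
    def __init__(self, in_features, out_features, time_steps, threshold=1.0, leak=0.9):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features, bias=False)
        # Separate BN statistics and affine parameters for every time step.
        self.bntt = nn.ModuleList(nn.BatchNorm1d(out_features) for _ in range(time_steps))
        self.time_steps = time_steps
        self.threshold = threshold
        self.leak = leak

    def forward(self, x_seq):
        # x_seq: (time_steps, batch, in_features) of input spikes/currents
        batch = x_seq.shape[1]
        mem = torch.zeros(batch, self.fc.out_features)
        spikes = []
        for t in range(self.time_steps):
            current = self.bntt[t](self.fc(x_seq[t]))  # time-step-specific BN
            mem = self.leak * mem + current
            spike = (mem >= self.threshold).float()    # hard threshold (no surrogate here)
            mem = mem * (1.0 - spike)                  # reset membrane where a spike fired
            spikes.append(spike)
        return torch.stack(spikes)                     # (time_steps, batch, out_features)

# Example: 25 time steps, batch of 8, 100 -> 10 features.
layer = LIFLayerWithBNTT(100, 10, time_steps=25)
out = layer(torch.rand(25, 8, 100))
print(out.shape)  # torch.Size([25, 8, 10])

The intent of keeping per-time-step statistics is to track the temporal dynamics that a single shared BN layer would average away.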
10:05am — 10:30am
Invited Talk
Travis Dewolf is a co-founder of Applied Brain Research, where he leads the autonomous systems group and develops neurorobotic systems using spiking neural networks and neuromorphic hardware. He received his Ph.D. in systems design engineering, with a focus in computational neuroscience, from the University of Waterloo, where he was a member of the Computational Neuroscience Research Group. His focus was on modeling the motor control system, studying the planning, execution, and adaptation mechanisms that drive movement. Please visit his research blog to see more of his work.
Abstract:
In this talk, I will discuss some of the neurorobotics research we've done at Applied Brain Research using our neural development platform Nengo and Intel's neuromorphic Loihi chip. I will present two systems: 1) a simulated rover controller that combines a deep neural network, converted to a spiking neural network, for locating a target with a control network developed using the Neural Engineering Framework for driving the robot to the target. In this example I will show how non-experts can quickly develop end-to-end spiking neural networks using deep learning techniques and mechanistic neural circuits. 2) A non-linear adaptive force-based control system that we use to control a real-world Kinova Jaco2 6-DOF arm in a reaching task. In this example I will highlight how we are able to take advantage of Loihi's on-chip learning to achieve better performance with lower latency and power costs than the same implementation on standard hardware.
Travis Dewolf (Applied Brain Research)
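For readers unfamiliar with the Neural Engineering Framework mentioned in this abstract, below is a minimal Nengo example on the standard CPU backend (not Loihi) that specifies a computation, squaring an input signal, as a connection between spiking ensembles. It is a generic illustration, not the rover or arm controllers described in the talk; the ensemble sizes and filter constant are arbitrary.

import numpy as np
import nengo

with nengo.Network() as model:
    stim = nengo.Node(lambda t: np.sin(2 * np.pi * t))   # 1 Hz input signal
    a = nengo.Ensemble(n_neurons=100, dimensions=1)      # spiking LIF ensemble
    b = nengo.Ensemble(n_neurons=100, dimensions=1)
    nengo.Connection(stim, a)
    nengo.Connection(a, b, function=lambda x: x ** 2)    # NEF: decode x^2 from a's spikes
    probe = nengo.Probe(b, synapse=0.01)                 # filtered decoded output

with nengo.Simulator(model) as sim:
    sim.run(1.0)

print(sim.data[probe].shape)  # (1000, 1): decoded samples at 1 ms resolution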
10:30am — 10:55am
Invited Talk
Hai “Helen” Li is Clare Boothe Luce Professor and Associate Chair for Operations of the Department of Electrical and Computer Engineering at Duke University. She received her B.S. and M.S. from Tsinghua University and Ph.D. from Purdue University. At Duke, she co-directs the Duke University Center for Computational Evolutionary Intelligence and the NSF IUCRC for Alternative Sustainable and Intelligent Computing (ASIC). Her research interests include machine learning acceleration and security, neuromorphic circuits and systems for brain-inspired computing, conventional and emerging memory design and architecture, and software and hardware co-design. She has received the NSF CAREER Award, the DARPA Young Faculty Award, the TUM-IAS Hans Fischer Fellowship from Germany, the ELATE Fellowship, eight best paper awards, and another nine best paper nominations. Dr. Li is a fellow of IEEE and a distinguished member of ACM. For more information, please see her webpage.
Abstract:
Following technology advances in high-performance computation systems and the fast growth of data acquisition, machine learning, especially deep neural networks (DNNs), has achieved remarkable success in many research areas and application domains. Such success, to a great extent, is enabled by developing large-scale network models that learn from a huge volume of data. Though hardware acceleration for neural networks has been extensively studied, the progress of hardware development still falls far behind the upscaling of DNN models at the software level. Thus, the efficient deployment of DNN models emerges as a major challenge. For example, the massive number of parameters and high computation demand make it difficult to deploy state-of-the-art DNNs onto resource-constrained devices. Compared to inference, training a DNN is much more complicated and has significantly higher computation and communication intensity. We envision that software/hardware co-design for efficient deep learning is necessary. In this talk, I will start with the trends in machine learning research, followed by our latest explorations on DNN model compression, architecture search, distributed learning, and the corresponding optimizations at the hardware level.
Hai (Helen) Li (Duke University)
10:55am — 11:10am
Discussion
——— Session 2: Intelligent Mobile Applications: Learning and Inference ———
Session Chair: Yingyan Lin
11:10am — 11:35am
Invited Talk
Deming Chen obtained his BS in computer science from the University of Pittsburgh, Pennsylvania in 1995, and his MS and PhD in computer science from the University of California, Los Angeles in 2001 and 2005, respectively. He joined the ECE department of the University of Illinois at Urbana-Champaign in 2005. His current research interests include reconfigurable computing, machine learning and cognitive computing, system-level and high-level synthesis, and hardware security. He has given more than 110 invited talks sharing these research results worldwide. He is the Donald Willett Faculty Scholar and the Abel Bliss Professor of the Grainger College of Engineering, an IEEE Fellow, an ACM Distinguished Speaker, and the Editor-in-Chief of ACM Transactions on Reconfigurable Technology and Systems (TRETS).
Abstract:
Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs. However, the trade-off between the quantization bitwidth and final accuracy is complex and non-convex, which makes it difficult to optimize directly. Minimizing the direct quantization loss (DQL) of the coefficient data is an effective local optimization method, but previous works often neglect accurate control of the DQL, resulting in a higher loss of final DNN model accuracy. In this talk, we propose a novel metric called Vector Loss. Based on this new metric, we develop a new quantization solution called VecQ, which achieves minimal direct quantization loss and better model accuracy. In addition, in order to speed up the proposed quantization process during model training, we accelerate the quantization process with a parameterized probability estimation method and template-based derivation calculation. We evaluate our proposed algorithm on the MNIST, CIFAR, ImageNet, IMDB movie review, and THUCNews text datasets with numerous DNN models. The results demonstrate that our proposed quantization solution is more accurate and effective than state-of-the-art approaches, while offering more flexible bitwidth support and a 16x weight size reduction.
Deming Chen (University of Illinois Urbana-Champaign)
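To ground the notion of direct quantization loss discussed in this abstract, here is a small sketch that applies uniform symmetric weight quantization at several bitwidths and reports both the scalar L2 loss and a simple vector-angle view of the error. The angle term is only a simplified stand-in for the Vector Loss metric described in the talk, not the actual VecQ formulation; the weight distribution and sizes are invented for illustration.

import numpy as np

def quantize_uniform(w, bits):
    """Uniform symmetric quantization of a weight vector to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

def direct_quantization_loss(w, wq):
    """Scalar DQL: squared L2 distance between original and quantized weights."""
    return float(np.sum((w - wq) ** 2))

def angle_loss(w, wq):
    """Simplified vector view of the error: 1 - cos(angle between w and wq)."""
    cos = np.dot(w, wq) / (np.linalg.norm(w) * np.linalg.norm(wq))
    return float(1.0 - cos)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=4096)  # illustrative layer weights

for bits in (2, 4, 8):
    wq = quantize_uniform(w, bits)
    print(bits, direct_quantization_loss(w, wq), angle_loss(w, wq))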
11:35am — 12:00pm
Invited Talk
Yiyu Shi is currently an associate professor in the Department of Computer Science and Engineering at the University of Notre Dame, the site director of the NSF IUCRC for Alternative Sustainable and Intelligent Computing, and the director of the Sustainable Computing Lab (SCL). He is also a visiting scientist at Boston Children’s Hospital, the primary pediatric program of Harvard Medical School. He received his B.S. in Electronic Engineering from Tsinghua University, Beijing, China in 2005, and the M.S. and Ph.D. degrees in Electrical Engineering from the University of California, Los Angeles in 2007 and 2009, respectively. His current research interests focus on hardware intelligence and biomedical applications. In recognition of his research, more than a dozen of his papers have been nominated for or received best paper awards at top conferences. He is also the recipient of the IBM Invention Achievement Award, the Japan Society for the Promotion of Science (JSPS) Faculty Invitation Fellowship, the Humboldt Research Fellowship, the IEEE St. Louis Section Outstanding Educator Award, the Academy of Science (St. Louis) Innovation Award, the Missouri S&T Faculty Excellence Award, the NSF CAREER Award, the IEEE Region 5 Outstanding Individual Achievement Award, the Air Force Summer Faculty Fellowship, and the IEEE Computer Society Mid-Career Research Achievement Award. He has served on the technical program committees of many international conferences. He is on the executive committee of ACM SIGDA, deputy editor-in-chief of the IEEE VLSI CAS Newsletter, and an associate editor of various IEEE and ACM journals.
Abstract:
The prevalence of deep neural networks today is supported by a variety of powerful hardware platforms including GPUs, FPGAs, and ASICs. A fundamental question lies in almost every implementation of deep neural networks: given a specific task, what are the optimal neural architecture and tailor-made hardware in terms of accuracy and efficiency? Earlier approaches attempted to address this question through hardware-aware neural architecture search (NAS), where features of a fixed hardware design are taken into consideration when designing neural architectures. However, we believe that the best practice is the simultaneous design of the neural architecture and the hardware to identify the best pairs that maximize both test accuracy and hardware efficiency. In this talk, we will present novel co-exploration frameworks for neural architecture and various hardware platforms including FPGA, NoC, ASIC and Computing-in-Memory, all of which are the first in the literature. We will demonstrate that our co-exploration concept greatly opens up the design freedom and pushes forward the Pareto frontier between hardware efficiency and test accuracy for better design trade-offs.
Yiyu Shi (University of Notre Dame)
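The co-exploration idea in this abstract, jointly sampling a neural architecture and a hardware configuration and keeping only the Pareto-optimal pairs, can be illustrated with a toy random search. Everything below (the search spaces and the proxy accuracy and latency models) is made up for illustration and is not the speaker's framework.

import random

random.seed(0)

def sample_pair():
    """Randomly sample a (neural architecture, hardware config) pair."""
    arch = {"depth": random.choice([8, 14, 20]), "width": random.choice([32, 64, 128])}
    hw = {"pe_array": random.choice([8, 16, 32])}
    return arch, hw

def proxy_accuracy(arch):
    """Toy stand-in for measured accuracy: bigger models score higher."""
    return 0.70 + 0.001 * arch["depth"] + 0.0005 * arch["width"]

def proxy_latency_ms(arch, hw):
    """Toy stand-in for a hardware latency model."""
    return arch["depth"] * arch["width"] / (hw["pe_array"] * 4.0)

candidates = []
for _ in range(200):
    arch, hw = sample_pair()
    candidates.append((proxy_accuracy(arch), proxy_latency_ms(arch, hw), arch, hw))

# Keep Pareto-optimal pairs: no other candidate is both more accurate and faster.
pareto = [c for c in candidates
          if not any(o[0] > c[0] and o[1] < c[1] for o in candidates)]
print(len(pareto), "Pareto-optimal (architecture, hardware) pairs")

In a real co-exploration framework the proxies would be replaced by trained-model accuracy and a cycle-accurate or analytical hardware cost model, but the joint sampling and Pareto filtering structure is the same.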
12:00pm — 12:25pm
Invited Talk
Yanzhi Wang is currently an assistant professor in the Department of ECE at Northeastern University, Boston, MA. He received his B.S. degree from Tsinghua University in 2009 and his Ph.D. degree from the University of Southern California in 2014. His research interests focus on model compression and platform-specific acceleration of deep learning applications. His research has maintained the highest model compression rates on representative DNNs since 09/2018. His work on AQFP superconducting-based DNN acceleration achieves by far the highest energy efficiency among all hardware devices. His recent research achievement, CoCoPIE, can achieve real-time performance on almost all deep learning applications using off-the-shelf mobile devices, outperforming competing frameworks by up to 180x.
His work has been published broadly in top conference and journal venues (e.g., DAC, ICCAD, ASPLOS, ISCA, MICRO, HPCA, PLDI, ICS, PACT, ISSCC, AAAI, ICML, CVPR, ICLR, IJCAI, ECCV, ICDM, ACM MM, FPGA, LCTES, CCS, VLDB, ICDCS, Infocom, C-ACM, JSSC, TComputer, TCAS-I, TCAD, JSAC, TNNLS, etc.), and has been cited more than 7,500 times. He has received four Best Paper Awards, another ten Best Paper Nominations, and four Popular Paper Awards. He has also received the ARO Young Investigator Program (YIP) Award, the Massachusetts Acorn Innovation Award, and other research awards from Google, MathWorks, etc. Three of his former Ph.D. students and postdocs have become tenure-track faculty at the University of Connecticut, Clemson University, and Texas A&M University, Corpus Christi.
Abstract:
Mobile and embedded computing devices have become key carriers of deep learning to facilitate the widespread deployment of machine intelligence. However, there is a widely recognized challenge in achieving real-time DNN inference on edge devices, due to the limited computation and storage resources on such devices. Model compression of DNNs, including weight pruning and weight quantization, has been investigated to overcome this challenge. However, current work on DNN compression suffers from the limitation that accuracy and hardware performance are somewhat conflicting goals that are difficult to satisfy simultaneously.
In this talk, we present our recent work CoCoPIE, which stands for Compression-Compilation Co-design, to overcome this limitation and move toward the best possible DNN acceleration on edge devices. We propose novel fine-grained structured pruning schemes, including pattern-based pruning, block-based pruning, etc. With the help of the compiler, they can simultaneously achieve high hardware performance (similar to filter/channel pruning) while maintaining zero accuracy loss, which is beyond the capability of prior work. Similarly, we present a novel quantization scheme that achieves ultra-high hardware performance close to 2-bit weight quantization, with almost no accuracy loss. Through the CoCoPIE framework, we are able to achieve real-time on-device execution of a number of DNN tasks, including object detection, pose estimation, activity detection, and speech recognition, using just an off-the-shelf mobile device, with up to 180x speedup compared with prior work.
Yanzhi Wang (Northeastern University)
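As a rough illustration of the pattern-based pruning mentioned in this abstract, the sketch below assigns each 3x3 convolution kernel the 4-entry pattern, drawn from a small predefined set, that preserves the most weight magnitude and zeroes the remaining entries. The pattern library and layer sizes are invented for illustration; they are not the actual CoCoPIE patterns.

import numpy as np

# A toy library of 4-entry patterns over a 3x3 kernel (flattened indices 0..8).
# Real pattern-based pruning uses a carefully chosen, compiler-friendly set.
PATTERNS = [
    (4, 1, 3, 5),  # center plus up/left/right
    (4, 1, 5, 7),  # center plus up/right/down
    (4, 0, 2, 6),  # center plus three corners
    (4, 2, 6, 8),
]

def prune_kernel(kernel):
    """Keep only the pattern that preserves the most magnitude; zero the rest."""
    flat = kernel.reshape(-1)
    best = max(PATTERNS, key=lambda p: np.sum(np.abs(flat[list(p)])))
    mask = np.zeros(9)
    mask[list(best)] = 1.0
    return (flat * mask).reshape(3, 3)

rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 8, 3, 3))  # (out_ch, in_ch, 3, 3) conv layer

pruned = np.stack([[prune_kernel(k) for k in filt] for filt in weights])
kept = np.count_nonzero(pruned) / pruned.size
print(f"kept {kept:.0%} of weights")  # 4 of 9 entries per kernel, ~44%

Because every surviving kernel matches one of a few known shapes, a compiler can generate specialized code per pattern, which is how this style of pruning keeps the hardware efficiency of coarser filter/channel pruning.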
12:25pm — 12:50pm
Invited Talk
Eriko Nurvitadhi is a senior research scientist in the CTO office of the Programmable Solutions Group at Intel. He leads FPGA-related external/academic and internal research programs. Prior to that, he started and grew FPGA research at Intel Labs and managed the FPGA research lab. His research focuses on hardware accelerator architectures (e.g., FPGAs, ASICs) for AI and data analytics. He has over 50 academic publications and 15 patents issued in this area. His research has contributed to Intel’s FPGA and ASIC solutions for AI. At Intel, he has received awards for his contributions to co-founding and growing the Xeon+FPGA academic program (HARP), as well as to next-generation FPGA technology. He received his PhD in Electrical and Computer Engineering from Carnegie Mellon University.
Abstract:
The continued rapid growth of data, along with advances in Artificial Intelligence (AI), is reshaping the computing ecosystem landscape. The data-intensive nature of AI requires minimizing data movement. Furthermore, interactive intelligent services require scalable and real-time solutions to provide a compelling user experience. Finally, algorithmic innovations in AI demand a flexible and programmable computing platform to adapt to this rapidly changing field.
We believe that these trends present tremendous opportunities for FPGAs, which are a natural substrate to provide a programmable, near-data, real-time, and scalable platform for AI analytics. FPGAs are already embedded in several places to perform computation as data flows throughout the computing ecosystem (e.g., “smart” network/storage, near image/audio sensors). As AI becomes more pervasively tied into general-purpose computation, FPGAs are in an excellent position to offer flexible computing synergistically with AI. Intel FPGAs are System-in-Package (SiP) designs, scalable with a variety of chiplets to complement the FPGA chip. They are also scalable at datacenter scale as a reconfigurable cloud, enabling real-time AI services. Using soft processor overlays, FPGAs can be programmed through software without needing full EDA tool runs each time.
In this talk, we first discuss the current trends in AI and big data. We then present trends in FPGA and opportunities for FPGAs in the era of AI and big data. Finally, we discuss our recent research in this space.
Eriko Nurvitadhi (Intel)
12:50pm — 1:05pm
Discussion
Organizing Committee
Co-chairs
Qinru Qiu (Syracuse University)
Yingyan Lin (Rice University)
Chenchen Liu (University of Maryland, Baltimore County)
Steering Committee
Yu Cao, Arizona State University
Xin Li, Duke University
Jae-sun Seo, Arizona State University
Technical Program Committee
Rob Aitken, ARM
Shawn Blanton, Carnegie Mellon University
Sankar Basu, National Science Foundation
Yiran Chen, Duke University
Kailash Gopalakrishnan, IBM
Yiorgos Makris, University of Texas, Dallas
Kendel McCarley, Raytheon Company
Mustafa Ozdal, Bilkent University, Turkey
Yanzhi Wang, Northeastern University
Yuan Xie, University of California, Santa Barbara
Jishen Zhao, University of California at San Diego