Authors:
(1) Xiaofan Yu, University of California San Diego, La Jolla, California, USA (x1yu@ucsd.edu);
(2) Anthony Thomas, University of California San Diego, La Jolla, California, USA (ahthomas@ucsd.edu);
(3) Ivannia Gomez Moreno, CETYS University, Campus Tijuana, Tijuana, Mexico (ivannia.gomez@cetys.edu.mx);
(4) Louis Gutierrez, University of California San Diego, La Jolla, California, USA (l8gutierrez@ucsd.edu);
(5) Tajana Šimunić Rosing, University of California San Diego, La Jolla, California, USA (tajana@ucsd.edu).
2 RELATED WORK
Lifelong and On-Device Learning. Lifelong learning (or continual learning) is a large and active area of research in the broader machine learning community. Catastrophic forgetting, a major challenge in lifelong learning, refers to the commonly observed phenomenon in which updating a machine learning model with new data severely degrades its ability to perform previously learned tasks [36]. Previous works proposed techniques such as dynamic architectures [31, 49], regularization that penalizes changes to important weights [28, 66], knowledge distillation from past models [14], and experience replay using a memory buffer [35, 58]. The lifelong learning literature has examined a wide range of problem settings, ranging from the fully supervised case, in which task and class labels are provided, to the fully unsupervised case without any labels or prior knowledge [13, 57]. However, all of these works are based on deep NNs and require backpropagation, which is problematic for resource-constrained devices.
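To make the replay family cited above concrete, the following is a minimal sketch of experience replay with a reservoir-sampled memory buffer. The buffer capacity, sampling policy, and the PyTorch-style `model`/`loss_fn`/`optimizer` interfaces are illustrative assumptions, not the method of any particular cited work.

```python
import random

class ReplayBuffer:
    """Reservoir-sampled memory buffer for experience replay (illustrative)."""

    def __init__(self, capacity=200):
        self.capacity = capacity
        self.buffer = []   # stored (x, y) pairs from past data
        self.seen = 0      # total examples observed so far

    def add(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Reservoir sampling: each seen example survives with
            # probability capacity / seen, keeping a uniform sample.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))

def replay_step(model, loss_fn, optimizer, batch, buffer, replay_k=32):
    """One training step that mixes the current batch with replayed
    examples, so gradient updates also rehearse old data and thereby
    mitigate catastrophic forgetting."""
    combined = list(batch) + buffer.sample(replay_k)
    optimizer.zero_grad()
    loss = sum(loss_fn(model(x), y) for x, y in combined) / len(combined)
    loss.backward()
    optimizer.step()
    for example in batch:
        buffer.add(example)
```

Note that the rehearsal itself still relies on backpropagation, which is exactly the cost that motivates LifeHD's backpropagation-free design.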
Neurally-inspired lightweight algorithms have recently been proposed for lifelong learning applications. FlyModel [52] and SDMLP [8] use sparse coding and associative memory for lifelong learning; however, both approaches assume full supervision. STAM [54] is an expandable memory architecture for unsupervised lifelong learning that uses layered receptive fields and a two-tier memory hierarchy, learning via an online centroid-based clustering pipeline with novelty detection and memory updates. Nevertheless, the memory in STAM is dedicated solely to image storage, whereas LifeHD additionally emphasizes merging past patterns into coarse groups and achieves more effective learning.
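The following is a minimal sketch of the online centroid-based clustering with novelty detection described above, loosely in the spirit of STAM's pipeline. The Euclidean metric, the novelty threshold, and the centroid cap are illustrative assumptions.

```python
import numpy as np

def online_cluster(x, centroids, counts, novelty_thresh=0.5, max_centroids=100):
    """Process one sample: match it to the nearest centroid, spawn a new
    centroid when the sample looks novel, otherwise update the match
    with a running mean. Returns the index of the assigned centroid."""
    x = np.asarray(x, dtype=float)
    if len(centroids) == 0:
        centroids.append(x.copy())
        counts.append(1)
        return 0
    dists = [np.linalg.norm(x - c) for c in centroids]
    i = int(np.argmin(dists))
    if dists[i] > novelty_thresh and len(centroids) < max_centroids:
        # Novelty detected: allocate a new centroid for the unseen pattern.
        centroids.append(x.copy())
        counts.append(1)
        return len(centroids) - 1
    # Familiar pattern: fold the sample into the matched centroid.
    counts[i] += 1
    centroids[i] += (x - centroids[i]) / counts[i]
    return i
```

LifeHD adds a further step on top of this kind of pipeline: merging accumulated centroids into coarser groups rather than only storing them.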
Recent works optimize the resource usage of on-device training via pruning and quantization [34, 45], tuning only a subset of the weights [9, 47], memory profiling and optimization [15, 61, 64], as well as growing the NN on the fly [67]. All of these works optimize training under resource constraints but do not address lifelong learning. They are orthogonal to the contribution of LifeHD, which focuses on adaptive and continual training, and LifeHD can be further optimized by combining it with such techniques.
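As one example of this class of optimizations, here is a minimal PyTorch sketch of partial-weight tuning in the spirit of [9, 47]: only the final layer receives gradient updates, which shrinks backpropagation memory and compute on-device. The two-layer model and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative model: a frozen "backbone" followed by a trainable "head".
model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),   # backbone (frozen)
    nn.Linear(128, 10),              # head (trained)
)
for p in model[:-1].parameters():
    p.requires_grad = False          # no gradients stored for the backbone

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-2)

x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()                      # gradients accumulate only in the head
optimizer.step()
```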
Hyperdimensional Computing. HDC has garnered substantial interest from the computer hardware community as an energy-efficient and low-latency approach to learning, and has been successfully applied to problems such as human activity recognition [27], voice recognition [23], and image recognition [11, 65], to name a few. The large majority of the HDC literature has focused on supervised classification tasks. Among the limited literature on weakly supervised learning with HDC, HDCluster [21] enables unsupervised clustering in HDC with an algorithm similar to K-Means, SemiHD [22] is a semi-supervised HDC learning framework based on iterative self-labeling, and Hyperseed [41], C-FSCIL [18] and FSL-HD [65] adopt HDC or related vector symbolic architectures (VSA) for unsupervised or few-shot learning. None of these works considers the lifelong setting; all train offline on a static dataset. To the best of the authors' knowledge, LifeHD is the first work that designs and deploys lifelong learning in edge IoT applications with zero or a minimal amount of labels.
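For readers unfamiliar with HDC, the following is a minimal sketch of K-Means-style clustering in hyperdimensional space, loosely in the spirit of HDCluster [21]: inputs are encoded as bipolar hypervectors via a random projection, assignments use cosine similarity, and clusters are updated by bundling (element-wise summation) of their members. The dimensionality, encoding, and update rule are illustrative assumptions rather than the exact algorithm of the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality, a typical HDC scale

def encode(x, proj):
    """Encode a feature vector as a bipolar hypervector (+1/-1)
    via a fixed random projection."""
    return np.sign(proj @ x)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def hd_cluster(X, k=3, iters=10):
    """K-Means-like clustering in hyperspace: assign each encoding to the
    most similar cluster hypervector, then re-bundle the members."""
    proj = rng.standard_normal((D, X.shape[1]))
    H = np.array([encode(x, proj) for x in X])
    centers = H[rng.choice(len(H), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.array(
            [np.argmax([cosine(h, c) for c in centers]) for h in H])
        for j in range(k):
            if (labels == j).any():
                centers[j] = H[labels == j].sum(axis=0)  # bundling
    return labels

# Usage: cluster 60 random 16-dimensional samples into 3 groups.
labels = hd_cluster(rng.standard_normal((60, 16)))
```

Because the update is a simple element-wise summation rather than gradient descent, this style of learning avoids backpropagation entirely, which is what makes HDC attractive for the resource-constrained, lifelong setting that LifeHD targets.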
This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.