SEEC: Semantic Vector Federation across Edge Computing Environments

Shalisha Witherspoon∗, Dean Steuer∗, Graham Bent† and Nirmit Desai∗
∗IBM Research, Yorktown Heights, NY, USA
Email: [email protected], [email protected], [email protected]
†IBM Research, Hursley, Winchester, UK
Email: [email protected]

Abstract—Semantic vector embedding techniques have proven useful in learning semantic representations of data across multiple domains. A key application enabled by such techniques is the ability to measure semantic similarity between given data samples and to find the data most similar to a given sample. State-of-the-art embedding approaches assume that all data is available at a single site. However, in many business settings, data is distributed across multiple edge locations and cannot be aggregated due to a variety of constraints. Hence, the applicability of state-of-the-art embedding approaches is limited to freely shared datasets, leaving out applications with sensitive or mission-critical data. This paper addresses this gap by proposing novel unsupervised algorithms called SEEC for learning and applying semantic vector embedding in a variety of distributed settings.

However, previous work on federated learning has not investigated machine learning tasks beyond classification and prediction. Specifically, representation learning and semantic vector embedding techniques have proven effective across a variety of machine learning tasks in multiple domains. For text data, sentence and paragraph embedding techniques such as doc2vec [15], GloVe [19], and BERT [4] have led to highly accurate language models for a variety of NLP tasks. Similar results have been achieved in graph learning tasks [7], [28] and in image recognition tasks [18], [6]. Key reasons behind the effectiveness of semantic embedding techniques include their
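The similarity-search application described above can be sketched in a few lines. This is an illustrative example only, not the paper's method: it assumes embeddings have already been produced by some model (e.g., doc2vec or BERT) and uses cosine similarity, a common but not the only choice of similarity measure; the function names and the toy 3-dimensional vectors are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(query, corpus):
    """Index of the corpus embedding most similar to the query."""
    return int(np.argmax([cosine_similarity(query, v) for v in corpus]))

# Toy 3-d embeddings for illustration; real embedding models produce
# vectors with hundreds of dimensions.
corpus = [np.array([1.0, 0.0, 0.0]),
          np.array([0.7, 0.7, 0.0]),
          np.array([0.0, 0.0, 1.0])]
query = np.array([0.9, 0.1, 0.0])
print(most_similar(query, corpus))  # prints 0: the first vector is nearest
```

In a centralized setting the corpus of embeddings sits on one site; the distributed settings targeted by SEEC are precisely those where such a corpus cannot be aggregated in one place.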