arXiv Paper Daily: Thu, 23 Jan 2020
source link: https://www.52ml.net/22441.html
Neural and Evolutionary Computing
Learning Directed Locomotion in Modular Robots with Evolvable Morphologies
Comments: 30 pages, 14 figures
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Artificial Intelligence (cs.AI)
We generalize the well-studied problem of gait learning in modular robots in
two dimensions. Firstly, we address locomotion in a given target direction that
goes beyond learning a typical undirected gait. Secondly, rather than studying
one fixed robot morphology we consider a test suite of different modular
robots. This study is based on our interest in evolutionary robot systems where
both morphologies and controllers evolve. In such a system, newborn robots have to learn to control their own bodies, which are random combinations of their parents' bodies. We apply and compare two learning algorithms, Bayesian
optimization and HyperNEAT. The results of the experiments in simulation show
that both methods successfully learn good controllers, but Bayesian
optimization is more effective and efficient. We validate the best learned
controllers by constructing three robots from the test suite in the real world
and observe their fitness and actual trajectories. The obtained results
indicate a reality gap that depends on the controllers and the shape of the
robots, but overall the trajectories are adequate and follow the target
directions successfully.
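As a rough sketch of how Bayesian optimization can drive this kind of controller learning, the snippet below tunes a small parameter vector with scikit-optimize; the `evaluate_gait` function, parameter count, and bounds are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch: Bayesian optimization of gait-controller parameters with
# scikit-optimize. The simulator call is a stand-in (assumption); the paper's
# own fitness is displacement in the target direction.
import numpy as np
from skopt import gp_minimize

def evaluate_gait(params):
    # Hypothetical simulator hook: run the modular robot with these controller
    # parameters and return (negated) progress in the target direction.
    # Here we fake it with a smooth test function.
    x = np.asarray(params)
    return -float(np.exp(-np.sum((x - 0.3) ** 2)))  # negative: skopt minimizes

n_params = 6                                  # e.g., CPG weights/phases per joint
bounds = [(0.0, 1.0)] * n_params              # assumed normalized parameter range

result = gp_minimize(evaluate_gait, bounds, n_calls=50, random_state=0)
print("best fitness:", -result.fun)
print("best parameters:", np.round(result.x, 3))
```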
Automatic phantom test pattern classification through transfer learning with deep neural networks
Rafael B. Fricks , Justin Solomon , Ehsan Samei Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Medical Physics (physics.med-ph)
Imaging phantoms are test patterns used to measure image quality in computed tomography (CT) systems. A new phantom platform (Mercury Phantom, Gammex) provides test patterns for estimating the task transfer function (TTF) or noise power spectrum (NPS) and simulates different patient sizes. Determining which
image slices are suitable for analysis currently requires manual annotation of
these patterns by an expert, as subtle defects may make an image unsuitable for
measurement. We propose a method of automatically classifying these test
patterns in a series of phantom images using deep learning techniques. By
adapting a convolutional neural network based on the VGG19 architecture with
weights trained on ImageNet, we use transfer learning to produce a classifier
for this domain. The classifier is trained and evaluated with over 3,500
phantom images acquired at a university medical center. Input channels for
color images are successfully adapted to convey contextual information for
phantom images. A series of ablation studies are employed to verify design
aspects of the classifier and evaluate its performance under varying training
conditions. Our solution makes extensive use of image augmentation to produce a
classifier that accurately classifies typical phantom images with 98% accuracy,
while maintaining as much as 86% accuracy when the phantom is improperly
imaged.
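A minimal transfer-learning sketch in the spirit of this abstract, using the stock Keras VGG19 with ImageNet weights; the input size, head layers, and two-class output are placeholders rather than the authors' configuration.

```python
# Sketch: VGG19 feature extractor with ImageNet weights, new classifier head.
import tensorflow as tf

base = tf.keras.applications.VGG19(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained convolutional features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g., usable vs. unusable slice
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # data pipeline omitted
```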
Accelerating supply chains with Ant Colony Optimization across range of hardware solutions
Ivars Dzalbs , Tatiana Kalganova Subjects : Artificial Intelligence (cs.AI) ; Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)
The Ant Colony algorithm has been applied to various optimization problems; however, most previous work on scaling and parallelism focuses on Travelling Salesman Problems (TSPs). Although useful for benchmarking and comparing new ideas, TSP dynamics do not always transfer to complex real-life problems, where additional meta-data is required during solution construction. This paper looks at a real-life outbound supply chain problem using Ant Colony Optimization (ACO) and its scaling dynamics with two parallel ACO architectures: Independent Ant Colonies (IAC) and Parallel Ants (PA). Results
showed that PA was able to reach a higher solution quality in fewer iterations
as the number of parallel instances increased. Furthermore, speed performance was measured across three hardware solutions: a 16-core CPU, a 68-core Xeon Phi, and up to 4 GeForce GPUs. State-of-the-art ACO vectorization techniques such as SS-Roulette were implemented using C++ and CUDA. Although excellent for TSP, GPUs were found unsuitable for the given supply chain problem due to the required meta-data access footprint. Furthermore, compared to their sequential counterpart, the vectorized CPU AVX2 implementation achieved a 25.4x speedup, while the Xeon Phi with its AVX512 instruction set reached 148x on Parallel Ants with Vectorized (PAwV). PAwV is therefore able to scale to at least 1024 parallel instances on the supply chain network problem solved.
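For intuition, roulette-wheel selection, the step that vectorization schemes such as SS-Roulette accelerate, can be batched across all ants with NumPy. The sketch below is a generic illustration under assumed pheromone and heuristic arrays, not the paper's implementation.

```python
# Sketch: vectorized roulette-wheel city selection for many ants at once.
# Generic illustration only; not the SS-Roulette variant from the paper.
import numpy as np

rng = np.random.default_rng(0)
n_ants, n_cities = 1024, 64
pheromone = rng.random(n_cities) + 0.1
heuristic = rng.random(n_cities) + 0.1
alpha, beta = 1.0, 2.0

# Per-ant feasibility mask (True = city still unvisited).
feasible = rng.random((n_ants, n_cities)) > 0.2

weights = (pheromone ** alpha) * (heuristic ** beta)   # shared attractiveness
scores = np.where(feasible, weights, 0.0)              # mask visited cities
cum = np.cumsum(scores, axis=1)                        # per-ant cumulative sums
draw = rng.random(n_ants) * cum[:, -1]                 # one roulette spin per ant
next_city = (cum < draw[:, None]).sum(axis=1)          # first index where cum >= draw
print(next_city[:10])
```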
Comments: 7 pages, 6 figures
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Existing graph neural networks may suffer from the “suspended animation
problem” when the model architecture goes deep. Meanwhile, for some graph
learning scenarios, e.g., nodes with text/image attributes or graphs with
long-distance node correlations, deep graph neural networks will be necessary
for effective graph representation learning. In this paper, we propose a new
graph neural network, namely DIFNET (Graph Diffusive Neural Network), for graph
representation learning and node classification. DIFNET utilizes both neural
gates and graph residual learning for node hidden state modeling, and includes
an attention mechanism for node neighborhood information diffusion. Extensive experiments are conducted in this paper to compare DIFNET against several state-of-the-art graph neural network models. The experimental results illustrate both the learning performance advantages and the effectiveness of DIFNET, especially in addressing the “suspended animation problem”.
An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
Comments: arXiv admin note: text overlap with arXiv:1909.05073
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Weight pruning has been widely acknowledged as a straightforward and
effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby
achieving acceleration on various platforms. However, most of the pruning
techniques are essentially trade-offs between model accuracy and regularity
which lead to impaired inference accuracy and limited on-device acceleration
performance. To solve this problem, we introduce a new sparsity dimension, namely pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware friendly. With carefully
designed patterns, the proposed pruning unprecedentedly and consistently
achieves accuracy enhancement and better feature extraction ability on
different DNN structures and datasets, and our pattern-aware pruning framework
also achieves pattern library extraction, pattern selection, pattern and
connectivity pruning and weight training simultaneously. Our approach on the
new pattern-based sparsity naturally fits into compiler optimization for highly
efficient DNN execution on mobile platforms. To the best of our knowledge, this is the first time mobile devices have achieved real-time inference for large-scale DNN models, thanks to the unique spatial property of pattern-based sparsity and the code generation capability of compilers.
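To make the idea of pattern sparsity concrete, here is a hedged sketch that projects each 3x3 convolution kernel onto the best-fitting mask from a tiny pattern library; the four patterns and the L1 selection rule are illustrative guesses, not the paper's extracted library.

```python
# Sketch: project 3x3 conv kernels onto a small library of sparsity patterns.
# The 4-entry patterns below are illustrative; the paper extracts its own library.
import torch

patterns = torch.tensor([
    [[0, 1, 0], [1, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [1, 1, 0], [0, 1, 0]],
], dtype=torch.float32)                       # each pattern keeps 4 of 9 weights

def apply_pattern_sparsity(weight):
    # weight: (out_ch, in_ch, 3, 3). For every kernel, keep the pattern that
    # preserves the most L1 magnitude and zero out the remaining weights.
    mag = weight.abs().unsqueeze(2)                   # (O, I, 1, 3, 3)
    kept = (mag * patterns).sum(dim=(-1, -2))         # (O, I, n_patterns)
    best = kept.argmax(dim=-1)                        # best pattern per kernel
    mask = patterns[best]                             # (O, I, 3, 3)
    return weight * mask

w = torch.randn(16, 8, 3, 3)
w_sparse = apply_pattern_sparsity(w)
print((w_sparse == 0).float().mean())  # about 5/9 of the weights pruned
```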
Computer Vision and Pattern Recognition
RDAnet: A Deep Learning Based Approach for Synthetic Aperture Radar Image Formation
Comments: 8 pages, 5 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Synthetic Aperture Radar (SAR) imaging systems operate by emitting radar
signals from a moving object, such as a satellite, towards the target of
interest. Reflected radar echoes are received and later used by image formation
algorithms to form a SAR image. There is great interest in using SAR images in
computer vision tasks such as automatic target recognition. Today, however, SAR
applications consist of multiple operations: image formation followed by image
processing. In this work, we show that deep learning can be used to train a
neural network able to form SAR images from echo data. Results show that our
neural network, RDAnet, can form SAR images comparable to images formed using a
traditional algorithm. This approach opens the possibility to end-to-end SAR
applications where image formation and image processing are integrated into a
single task. We believe that this work is the first demonstration of deep
learning based SAR image formation using real data.
Automatic phantom test pattern classification through transfer learning with deep neural networks
Rafael B. Fricks , Justin Solomon , Ehsan Samei Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Medical Physics (physics.med-ph)
Imaging phantoms are test patterns used to measure image quality in computed tomography (CT) systems. A new phantom platform (Mercury Phantom, Gammex) provides test patterns for estimating the task transfer function (TTF) or noise power spectrum (NPS) and simulates different patient sizes. Determining which
image slices are suitable for analysis currently requires manual annotation of
these patterns by an expert, as subtle defects may make an image unsuitable for
measurement. We propose a method of automatically classifying these test
patterns in a series of phantom images using deep learning techniques. By
adapting a convolutional neural network based on the VGG19 architecture with
weights trained on ImageNet, we use transfer learning to produce a classifier
for this domain. The classifier is trained and evaluated with over 3,500
phantom images acquired at a university medical center. Input channels for
color images are successfully adapted to convey contextual information for
phantom images. A series of ablation studies are employed to verify design
aspects of the classifier and evaluate its performance under varying training
conditions. Our solution makes extensive use of image augmentation to produce a
classifier that accurately classifies typical phantom images with 98% accuracy,
while maintaining as much as 86% accuracy when the phantom is improperly
imaged.
Discovering Salient Anatomical Landmarks by Predicting Human Gaze
Comments: Accepted at IEEE International Symposium on Biomedical Imaging 2020 (ISBI 2020)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Anatomical landmarks are a crucial prerequisite for many medical imaging
tasks. Usually, the set of landmarks for a given task is predefined by experts.
The landmark locations for a given image are then annotated manually or via
machine learning methods trained on manual annotations. In this paper, in
contrast, we present a method to automatically discover and localize anatomical
landmarks in medical images. Specifically, we consider landmarks that attract
the visual attention of humans, which we term visually salient landmarks. We
illustrate the method for fetal neurosonographic images. First, full-length
clinical fetal ultrasound scans are recorded with live sonographer
gaze-tracking. Next, a convolutional neural network (CNN) is trained to predict
the gaze point distribution (saliency map) of the sonographers on scan video
frames. The CNN is then used to predict saliency maps of unseen fetal
neurosonographic images, and the landmarks are extracted as the local maxima of
these saliency maps. Finally, the landmarks are matched across images by
clustering the landmark CNN features. We show that the discovered landmarks can
be used within affine image registration, with average landmark alignment
errors between 4.1% and 10.9% of the fetal head long axis length.
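The landmark-extraction step, keeping the local maxima of a predicted saliency map, can be sketched with standard tooling; the saliency map below is synthetic, and the distance and threshold values are arbitrary assumptions.

```python
# Sketch: extract landmark candidates as local maxima of a saliency map.
# `saliency` would come from the gaze-prediction CNN; here it is synthetic.
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.feature import peak_local_max

rng = np.random.default_rng(0)
saliency = gaussian_filter(rng.random((256, 256)), sigma=8)  # stand-in for CNN output

coords = peak_local_max(saliency,
                        min_distance=20,     # arbitrary spacing between landmarks
                        threshold_rel=0.6)   # arbitrary relative cutoff
print("landmark candidates (row, col):")
print(coords)
```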
Causality based Feature Fusion for Brain NeuroDevelopmental Analysis
Comments: 10 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI)
Human brain development is a complex and dynamic process that is affected by
several factors such as genetics, sex hormones, and environmental changes. A
number of recent studies on brain development have examined functional
connectivity (FC) defined by the temporal correlation between time series of
different brain regions. We propose to add the directional flow of information
during brain maturation. To do so, we extract effective connectivity (EC)
through Granger causality (GC) for two different groups of subjects, i.e.,
children and young adults. The motivation is that the inclusion of causal
interaction may further discriminate brain connections between two age groups
and help to discover new connections between brain regions. The contributions
of this study are threefold. First, there has been a lack of attention to
EC-based feature extraction in the context of brain development. To this end,
we propose a new kernel-based GC (KGC) method to learn the nonlinearity of the complex brain network, where a reduced sine hyperbolic polynomial (RSP) neural network was used as our proposed learner. Second, we used causality values as the
weight for the directional connectivity between brain regions. Our findings
indicated that the strength of connections was significantly higher in young
adults relative to children. In addition, our new EC-based feature outperformed FC-based analysis on the Philadelphia Neurodevelopmental Cohort (PNC) study, with better discrimination of the different age groups. Moreover, the fusion of these two
sets of features (FC + EC) improved brain age prediction accuracy by more than
4%, indicating that they should be used together for brain development studies.
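For reference, the classical linear Granger-causality test that KGC generalizes can be run with statsmodels; the two toy time series below stand in for regional brain signals.

```python
# Sketch: vanilla linear Granger causality between two regional time series.
# The paper's kernel-based GC (KGC) is nonlinear; this shows only the
# classical linear test it builds on.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
T = 500
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(2, T):                 # y depends on lagged x, so x "G-causes" y
    y[t] = 0.6 * y[t-1] + 0.5 * x[t-2] + 0.1 * rng.standard_normal()

# Column order: test whether the 2nd column Granger-causes the 1st.
data = np.column_stack([y, x])
results = grangercausalitytests(data, maxlag=3, verbose=False)
for lag, res in results.items():
    fstat, pval = res[0]["ssr_ftest"][:2]
    print(f"lag {lag}: F = {fstat:.1f}, p = {pval:.2g}")
```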
Are Accelerometers for Activity Recognition a Dead-end?
Catherine Tong , Shyam A. Tailor , Nicholas D. Lane Subjects : Computer Vision and Pattern Recognition (cs.CV)
Accelerometer-based research (and, by extension, research based on other inertial sensors) for Human Activity Recognition (HAR) is a dead-end. This sensor does not offer enough information for us to progress in the core domain of HAR: recognizing everyday activities from sensor data. Despite continued and prolonged efforts
in improving feature engineering and machine learning models, the activities
that we can recognize reliably have only expanded slightly and many of the same
flaws of early models are still present today. Instead of relying on acceleration data, we should consider modalities with much richer information; a logical choice is images. With the rapid advance in image
sensing hardware and modelling techniques, we believe that a widespread
adoption of image sensors will open many opportunities for accurate and robust
inference across a wide spectrum of human activities.
In this paper, we make the case for imagers in place of accelerometers as the
default sensor for human activity recognition. Our review of past works has led
to the observation that progress in HAR has stalled, caused by our reliance on
accelerometers. We further argue for the suitability of images for activity
recognition by illustrating their richness of information and the marked
progress in computer vision. Through a feasibility analysis, we find that
deploying imagers and CNNs on device poses no substantial burden on modern
mobile hardware. Overall, our work highlights the need to move away from
accelerometers and calls for further exploration of using imagers for activity
recognition.
Learning to Correct 3D Reconstructions from Multiple Views
Ştefan Săftescu , Paul Newman Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Robotics (cs.RO)
This paper is about reducing the cost of building good large-scale 3D
reconstructions post-hoc. We render 2D views of an existing reconstruction and
train a convolutional neural network (CNN) that refines inverse-depth to match
a higher-quality reconstruction. Since the views that we correct are rendered
from the same reconstruction, they share the same geometry, so overlapping
views complement each other. We take advantage of that in two ways. Firstly, we
impose a loss during training which guides predictions on neighbouring views to
have the same geometry and has been shown to improve performance. Secondly, in
contrast to previous work, which corrects each view independently, we also make
predictions on sets of neighbouring views jointly. This is achieved by warping
feature maps between views and thus bypassing memory-intensive 3D computation.
We make the observation that features in the feature maps are
viewpoint-dependent, and propose a method for transforming features with
dynamic filters generated by a multi-layer perceptron from the relative poses
between views. In our experiments we show that this last step is necessary for
successfully fusing feature maps between views.
UniPose: Unified Human Pose Estimation in Single Images and Videos
Bruno Artacho , Andreas Savakis Subjects : Computer Vision and Pattern Recognition (cs.CV)
We propose UniPose, a unified framework for human pose estimation, based on our “Waterfall” Atrous Spatial Pooling architecture, that achieves state-of-the-art results on several pose estimation metrics. Current pose
estimation methods utilizing standard CNN architectures heavily rely on
statistical postprocessing or predefined anchor poses for joint localization.
UniPose incorporates contextual segmentation and joint localization to estimate
the human pose in a single stage, with high accuracy, without relying on
statistical postprocessing methods. The Waterfall module in UniPose leverages
the efficiency of progressive filtering in the cascade architecture, while
maintaining multi-scale fields-of-view comparable to spatial pyramid
configurations. Additionally, our method is extended to UniPose-LSTM for multi-frame processing and achieves state-of-the-art results for temporal pose estimation in video. Our results on multiple datasets demonstrate that UniPose,
with a ResNet backbone and Waterfall module, is a robust and efficient
architecture for pose estimation obtaining state-of-the-art results in single
person pose detection for both single images and videos.
Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread
Comments: Accepted as a regular paper in the IEEE Transactions on Cybernetics
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Recently deep convolutional neural networks have achieved significant success
in salient object detection. However, existing state-of-the-art methods require
high-end GPUs to achieve real-time performance, which makes them hard to adapt
to low-cost or portable devices. Although generic network architectures have
been proposed to speed up inference on mobile devices, they are tailored to the
task of image classification or semantic segmentation, and struggle to capture
intra-channel and inter-channel correlations that are essential for contrast
modeling in salient object detection. Motivated by the above observations, we
design a new deep learning algorithm for fast salient object detection. The
proposed algorithm for the first time achieves competitive accuracy and high
inference efficiency simultaneously with a single CPU thread. Specifically, we
propose a novel depthwise non-local module (DNL), which implicitly models
contrast via harvesting intra-channel and inter-channel correlations in a
self-attention manner. In addition, we introduce a depthwise non-local network
architecture that incorporates both depthwise non-local modules and inverted
residual blocks. Experimental results show that our proposed network attains
very competitive accuracy on a wide range of salient object detection datasets
while achieving state-of-the-art efficiency among all existing deep learning
based algorithms.
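A shape-level, hedged reading of a depthwise non-local operation, self-attention computed independently within each channel over spatial positions, is sketched below; the paper's actual DNL module is more elaborate.

```python
# Sketch: per-channel (depthwise) spatial self-attention. Shape-level
# illustration of the idea only; the paper's DNL module differs in detail.
import torch
import torch.nn.functional as F

def depthwise_nonlocal(x):
    # x: (B, C, H, W) -> attention over the HW positions within each channel.
    B, C, H, W = x.shape
    flat = x.reshape(B * C, H * W, 1)                # treat each channel separately
    attn = torch.bmm(flat, flat.transpose(1, 2))     # (B*C, HW, HW) affinities
    attn = F.softmax(attn / (H * W) ** 0.5, dim=-1)
    out = torch.bmm(attn, flat)                      # re-weight spatial positions
    return x + out.reshape(B, C, H, W)               # residual connection

x = torch.randn(2, 8, 16, 16)
print(depthwise_nonlocal(x).shape)  # torch.Size([2, 8, 16, 16])
```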
Attention! A Lightweight 2D Hand Pose Estimation Approach
Comments: submitted to IEEE Signal Processing Letters
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Vision-based human pose estimation is a non-invasive technology for Human-Computer Interaction (HCI). Direct use of the hand as an input device provides an attractive interaction method, requiring no specialized sensing equipment such as exoskeletons or gloves: only a camera. Traditionally, HCI
is employed in various applications spreading in areas including manufacturing,
surgery, entertainment industry and architecture, to mention a few. Deployment
of vision based human pose estimation algorithms can give a breath of
innovation to these applications. In this letter, we present a novel Convolutional Neural Network architecture, reinforced with a Self-Attention module, that can be deployed on an embedded system due to its lightweight nature, with just 1.9 million parameters. The source code and qualitative
results are publicly available.
ResDepth: Learned Residual Stereo Reconstruction
Corinne Stucker , Konrad Schindler Subjects : Computer Vision and Pattern Recognition (cs.CV)
We propose an embarrassingly simple, but very effective scheme for
high-quality dense stereo reconstruction: (i) generate an approximate
reconstruction with your favourite stereo matcher; (ii) rewarp the input images
with that approximate model; and (iii) with the initial reconstruction and the
warped images as input, train a deep network to enhance the reconstruction by
regressing a residual correction. The strategy to only learn the residual
greatly simplifies the learning problem. A standard Unet without bells and
whistles is enough to reconstruct even small surface details, like dormers and
roof substructures in satellite images. We also investigate residual
reconstruction with less information and find that even a single image is
enough to greatly improve an approximate reconstruction. Our full model reduces
the mean absolute error of state-of-the-art stereo reconstruction systems by
>50%, both in our target domain of satellite stereo and on stereo pairs from
the ETH3D benchmark.
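At its core, the residual recipe reduces to predicting a correction that is added back to the initial reconstruction. A minimal sketch, with a toy convolutional net standing in for the authors' U-Net:

```python
# Sketch: learn a residual correction to an approximate reconstruction.
# A toy conv net stands in for the paper's standard U-Net.
import torch
import torch.nn as nn

refiner = nn.Sequential(                     # toy stand-in for a U-Net
    nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)

def refine(initial_depth, warped_image):
    # initial_depth: (B,1,H,W); warped_image: (B,3,H,W) rewarped with the
    # approximate model. The network regresses only the residual correction.
    inp = torch.cat([initial_depth, warped_image], dim=1)
    return initial_depth + refiner(inp)

d0 = torch.rand(1, 1, 64, 64)
img = torch.rand(1, 3, 64, 64)
print(refine(d0, img).shape)                 # torch.Size([1, 1, 64, 64])
```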
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi , Lin Su , Jia Song , Edward Cui , Taroon Bharti , Arun Sachet Subjects : Computer Vision and Pattern Recognition (cs.CV)
In this paper, we introduce a new vision-language pre-trained model —
ImageBERT — for image-text joint embedding. Our model is a Transformer-based
model, which takes different modalities as input and models the relationship
between them. The model is pre-trained on four tasks simultaneously: Masked
Language Modeling (MLM), Masked Object Classification (MOC), Masked Region
Feature Regression (MRFR), and Image Text Matching (ITM). To further enhance
the pre-training quality, we have collected a Large-scale weAk-supervised Image-Text (LAIT) dataset from the Web. We first pre-train the model on this dataset, then conduct a second-stage pre-training on Conceptual Captions and SBU Captions. Our experiments show that the multi-stage pre-training strategy
outperforms single-stage pre-training. We also fine-tune and evaluate our
pre-trained ImageBERT model on image retrieval and text retrieval tasks, and
achieve new state-of-the-art results on both MSCOCO and Flickr30k datasets.
A Fixation-based 360° Benchmark Dataset for Salient Object Detection
Comments: 5 pages, 5 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Fixation prediction (FP) in panoramic contents has been widely investigated
along with the booming trend of virtual reality (VR) applications. However,
another issue within the field of visual saliency, salient object detection
(SOD), has been seldom explored in 360° (or omnidirectional) images due to
the lack of datasets representative of real scenes with pixel-level
annotations. Toward this end, we collect 107 equirectangular panoramas with
challenging scenes and multiple object classes. Based on the consistency
between FP and explicit saliency judgements, we further manually annotate 1,165
salient objects over the collected images with precise masks under the guidance
of real human eye fixation maps. Six state-of-the-art SOD models are then
benchmarked on the proposed fixation-based 360° image dataset (F-360iSOD),
by applying a multiple cubic projection-based fine-tuning method. Experimental
results show a limitation of the current methods when used for SOD in panoramic
images, which indicates the proposed dataset is challenging. Key issues for 360° SOD are also discussed. The proposed dataset is available at
Optimized Generic Feature Learning for Few-shot Classification across Domains
Tonmoy Saikia , Thomas Brox , Cordelia Schmid Subjects : Computer Vision and Pattern Recognition (cs.CV)
Learning models or features that generalize across tasks and domains is one of the grand goals of machine learning. In this paper, we propose to use cross-domain, cross-task data as the validation objective for hyper-parameter optimization (HPO) to improve on this goal. Given a rich enough search space, hyper-parameter optimization learns features that maximize validation performance and, due to the objective, generalize across tasks and domains. We
demonstrate the effectiveness of this strategy on few-shot image classification
within and across domains. The learned features outperform all previous
few-shot and meta-learning approaches.
Comments: 15 pages, 14 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Statistical shape models (SSMs) are state-of-the-art medical image analysis
tools for extracting and explaining features across a set of biological
structures. However, a principled and robust way to combine shape and pose
features has been illusive due to three main issues: 1) Non-homogeneity of the
data (data with linear and non-linear natural variation across features), 2)
non-optimal representation of the (3D) motion (rigid transformation
representations that are not proportional to the kinetic energy that move an
object from one position to the other), and 3) artificial discretization of the
models. In this paper, we propose a new dynamic multi-object statistical modelling framework for the analysis of human joints in a continuous domain. Specifically, we propose to normalise shape and dynamic
spatial features in the same linearized statistical space permitting the use of
linear statistics; we adopt an optimal 3D motion representation for more
accurate rigid transformation comparisons; and we provide a 3D shape and pose
prediction protocol using a Markov chain Monte Carlo sampling-based fitting.
The framework affords an efficient generative dynamic multi-object modelling
platform for biological joints. We validate the framework using controlled synthetic data. Finally, the framework is applied to an analysis of the human
shoulder joint to compare its performance with standard SSM approaches in
prediction of shape while adding the advantage of determining relative pose
between bones in a complex. Excellent validity is observed and the shoulder
joint shape-pose prediction results suggest that the novel framework may have
utility for a range of medical image analysis applications. Furthermore, the framework is generic and can be extended to n > 2 objects, making it suitable for clinical and diagnostic methods for the management of joint disorders.
Partially-Shared Variational Auto-encoders for Unsupervised Domain Adaptation with Target Shift
Ryuhei Takahashi , Masaaki Iiyama , Atsushi Hashimoto , Motoharu Sonogashira Subjects : Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a novel approach for unsupervised domain adaptation (UDA)
with target shift. Target shift is a problem of mismatch in label distribution
between the source and target domains. Typically, it appears as class imbalance in the target domain. In practice, this is an important problem in UDA: since we do not know the labels in target domain datasets, we do not know whether their distribution is identical to that of the source domain dataset. Many traditional approaches achieve UDA with distribution matching by minimizing the maximum mean discrepancy or by adversarial training; however, these approaches
implicitly assume a coincidence in the distributions and do not work under
situations with target shift. Some recent UDA approaches focus on class
boundary and some of them are robust to target shift, but they are only
applicable to classification and not to regression.
To overcome the target shift problem in UDA, the proposed method, partially
shared variational autoencoders (PS-VAEs), uses pair-wise feature alignment
instead of feature distribution matching. PS-VAEs inter-convert the domain of each sample via a CycleGAN-based architecture while preserving its label-related
content. To evaluate the performance of PS-VAEs, we carried out two
experiments: UDA with class-unbalanced digits datasets (classification), and
UDA from synthesized data to real observation in human-pose-estimation
(regression). The proposed method demonstrated robustness against the class imbalance in the classification task, and outperformed the other methods in the regression task by a large margin.
Curvature Regularized Surface Reconstruction from Point Cloud
Comments: 22 pages, 15 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We propose a variational functional and fast algorithms to reconstruct
implicit surface from point cloud data with a curvature constraint. The
minimizing functional balances the distance function from the point cloud and
the mean curvature term. Only the point location is used, without any local
normal or curvature estimation at each point. With the added curvature
constraint, the computation becomes particularly challenging. To enhance the
computational efficiency, we solve the problem by a novel operator splitting
scheme. It replaces the original high-order PDEs by a decoupled PDE system,
which is solved by a semi-implicit method. We also discuss an approach using an augmented Lagrangian method. The proposed method shows robustness against noise, and recovers concave features and sharp corners better than models without the curvature constraint. Numerical experiments on two- and three-dimensional data sets, including noisy and sparse data, are presented to validate the model.
Sara Shahsavarani , Morteza Analoui , Reza Shoja Ghiass Subjects : Computer Vision and Pattern Recognition (cs.CV)
Despite significant advances in Deep Face Recognition (DFR) systems,
introducing new DFRs under specific constraints such as varying pose still
remains a big challenge. Most particularly, due to the 3D nature of a human
head, facial appearance of the same subject introduces a high intra-class
variability when projected to the camera image plane. In this paper, we propose
a new multi-view Deep Face Recognition (MVDFR) system to address the mentioned
challenge. In this context, multiple 2D images of each subject under different
views are fed into the proposed deep neural network with a unique design to
re-express the facial features in a single, more compact face descriptor, which, in turn, produces a more informative and abstract representation for face identification using convolutional neural networks. To extend the functionality of our proposed system to multi-view facial images, the gold-standard DeepID model is modified in our proposed model. The experimental results indicate that
our proposed method yields a 99.8% accuracy, while the state-of-the-art method
achieves a 97% accuracy. We also gathered the Iran University of Science and
Technology (IUST) face database with 6552 images of 504 subjects to accomplish
our experiments.
LRF-Net: Learning Local Reference Frames for 3D Local Shape Description and Matching
Comments: 7 pages, 9 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG)
The local reference frame (LRF) plays a critical role in 3D local shape description and matching. However, most existing LRFs are hand-crafted and suffer from limited repeatability and robustness. This paper presents the first
attempt to learn an LRF via a Siamese network that needs weak supervision only.
In particular, we argue that each neighboring point in the local surface gives
a unique contribution to LRF construction and measure such contributions via
learned weights. Extensive analysis and comparative experiments on three public
datasets addressing different application scenarios have demonstrated that
LRF-Net is more repeatable and robust than several state-of-the-art LRF methods
(LRF-Net is only trained on one dataset). In addition, LRF-Net can
significantly boost the local shape description and 6-DoF pose estimation
performance when matching 3D point clouds.
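For context, a classical hand-crafted LRF is built from the eigenvectors of a weighted covariance of the keypoint's neighborhood; LRF-Net's contribution is learning the per-point weights, which are set by a simple distance heuristic in the sketch below.

```python
# Sketch: build a local reference frame from the weighted covariance of a
# point's neighborhood. LRF-Net learns the weights; here they are a simple
# distance-based heuristic for illustration only.
import numpy as np

def local_reference_frame(neighbors, center, weights=None):
    # neighbors: (N, 3) points around a keypoint `center` (3,).
    d = neighbors - center
    if weights is None:
        r = np.linalg.norm(d, axis=1)
        weights = r.max() - r                # closer points contribute more
    weights = weights / weights.sum()
    cov = (weights[:, None] * d).T @ d       # weighted covariance (3, 3)
    _, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    axes = eigvecs[:, ::-1].copy()           # x-axis = largest-variance direction
    for k in range(2):                       # sign disambiguation for repeatability
        if (d @ axes[:, k]).sum() < 0:
            axes[:, k] = -axes[:, k]
    axes[:, 2] = np.cross(axes[:, 0], axes[:, 1])  # enforce a right-handed frame
    return axes                              # columns are the LRF axes

pts = np.random.default_rng(0).normal(size=(100, 3)) * [3.0, 1.0, 0.2]
print(local_reference_frame(pts, pts.mean(axis=0)))
```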
Depth-Based Selective Blurring in Stereo Images Using Accelerated Framework
Comments: arXiv admin note: text overlap with arXiv:2001.06967
Journal-ref: 3D Research (Springer) 5, Article number: 14 (2014)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Image and Video Processing (eess.IV)
We propose a hybrid method for stereo disparity estimation by combining block
and region-based stereo matching approaches. It generates dense depth maps from disparity measurements of only 18% of image pixels (left or right). The methodology involves segmenting pixel lightness values using a fast K-Means implementation, refining segment boundaries using morphological filtering and connected components analysis, and then determining boundary disparities using a sum of absolute differences (SAD) cost function. Complete disparity maps are reconstructed from boundary disparities. We consider an application of our
method for depth-based selective blurring of non-interest regions of stereo
images, using Gaussian blur to de-focus users’ non-interest regions.
Experiments on Middlebury dataset demonstrate that our method outperforms
traditional disparity estimation approaches using SAD and normalized cross
correlation by up to 33.6% and some recent methods by up to 6.1%. Further, our method is highly parallelizable using a CPU and GPU framework based on Java Thread Pool and APARAPI, with a speed-up of 5.8 for 250 stereo video frames (4,096 x 2,304).
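The SAD matching primitive used at segment boundaries is standard block matching; a generic single-pixel version (not the paper's full hybrid pipeline) might look like this:

```python
# Sketch: sum-of-absolute-differences (SAD) disparity for one pixel.
# Generic block matching only; the paper applies this at segment boundaries.
import numpy as np

def sad_disparity(left, right, row, col, block=5, max_disp=32):
    h = block // 2
    patch = left[row-h:row+h+1, col-h:col+h+1].astype(np.float32)
    best_d, best_cost = 0, np.inf
    for d in range(0, min(max_disp, col - h) + 1):   # search along the scanline
        cand = right[row-h:row+h+1, col-d-h:col-d+h+1].astype(np.float32)
        cost = np.abs(patch - cand).sum()            # SAD cost
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

rng = np.random.default_rng(0)
right_img = rng.integers(0, 255, (100, 120)).astype(np.uint8)
left_img = np.roll(right_img, 7, axis=1)             # synthetic 7-pixel shift
print(sad_disparity(left_img, right_img, row=50, col=60))  # expect 7
```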
Scientific Image Tampering Detection Based On Noise Inconsistencies: A Method And Datasets
Ziyue Xiang , Daniel Acuna Subjects : Computer Vision and Pattern Recognition (cs.CV)
Scientific image tampering is a problem that affects not only authors but
also the general perception of the research community. Although previous
researchers have developed methods to identify tampering in natural images,
these methods may not thrive in the scientific setting, as scientific images have different statistics, format, quality, and intentions. Therefore, we
propose a scientific-image specific tampering detection method based on noise
inconsistencies, which is capable of learning and generalizing to different
fields of science. We train and test our method on a new dataset of manipulated
western blot and microscopy imagery, which aims at emulating problematic images
in science. The test results show that our method can detect various types of
image manipulation in different scenarios robustly, and it outperforms existing
general-purpose image tampering detection schemes. We discuss applications
beyond these two types of images and suggest next steps for making detection of
problematic images a systematic step in peer review and science in general.
Weakly Supervised Temporal Action Localization Using Deep Metric Learning
Comments: accepted to WACV 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG)
Temporal action localization is an important step towards video
understanding. Most current action localization methods depend on untrimmed
videos with full temporal annotations of action instances. However, it is
expensive and time-consuming to annotate both action labels and temporal
boundaries of videos. To this end, we propose a weakly supervised temporal
action localization method that only requires video-level action instances as
supervision during training. We propose a classification module to generate
action labels for each segment in the video, and a deep metric learning module
to learn the similarity between different action instances. We jointly optimize
a balanced binary cross-entropy loss and a metric loss using a standard
backpropagation algorithm. Extensive experiments demonstrate the effectiveness
of both of these components in temporal localization. We evaluate our algorithm
on two challenging untrimmed video datasets: THUMOS14 and ActivityNet1.2. Our
approach improves the current state-of-the-art result for THUMOS14 by 6.5% mAP
at IoU threshold 0.5, and achieves competitive performance for ActivityNet1.2.
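The joint objective, balanced binary cross-entropy plus a metric-learning term, can be sketched as below; the triplet loss is a stand-in for the paper's exact similarity loss, and the tensor shapes are assumptions.

```python
# Sketch: joint objective = balanced BCE on segment labels + a metric term.
# The triplet loss is a stand-in for the paper's similarity loss.
import torch
import torch.nn.functional as F

def balanced_bce(logits, labels):
    # Weight positives/negatives by inverse frequency to balance the loss.
    pos = labels.sum().clamp(min=1)
    neg = (labels.numel() - pos).clamp(min=1)
    weight = torch.where(labels > 0, labels.numel() / (2 * pos),
                         labels.numel() / (2 * neg))
    return F.binary_cross_entropy_with_logits(logits, labels, weight=weight)

def joint_loss(logits, labels, anchor, positive, negative, lam=0.5):
    metric = F.triplet_margin_loss(anchor, positive, negative, margin=1.0)
    return balanced_bce(logits, labels) + lam * metric

logits = torch.randn(8, 20)                  # per-segment action scores
labels = (torch.rand(8, 20) > 0.8).float()   # sparse video-level supervision
emb = torch.randn(3, 8, 128)                 # anchor/positive/negative embeddings
print(joint_loss(logits, labels, *emb))
```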
Deep Depth Prior for Multi-View Stereo
Pallabi Ghosh , Vibhav Vineet , Larry S. Davis , Abhinav Shrivastava , Sudipta Sinha , Neel Joshi Subjects : Computer Vision and Pattern Recognition (cs.CV)
It was recently shown that the structure of convolutional neural networks induces a strong prior favoring natural color images, a phenomenon referred to as the deep image prior (DIP), which can be an effective regularizer in inverse problems such as image denoising and inpainting. In this paper, we investigate
a similar idea for depth images, which we call a deep depth prior.
Specifically, given a color image and a noisy and incomplete target depth map
from the same viewpoint, we optimize a randomly initialized CNN model to
reconstruct an RGB-D image where the depth channel gets restored by virtue of
using the network structure as a prior. We propose using deep depth priors for
refining and inpainting noisy depth maps within a multi-view stereo pipeline.
We optimize the network parameters to minimize two losses: 1) an RGB-D reconstruction loss based on the noisy depth map, and 2) a multi-view photoconsistency-based loss, which is computed using images from a geometrically calibrated camera at nearby viewpoints. Our quantitative and qualitative evaluation shows that our refined depth maps are more accurate and complete and, after fusion, produce dense 3D models of higher quality.
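The core deep-prior optimization is compact: a randomly initialized CNN is fit so that its output matches the noisy depth wherever depth is observed. A minimal sketch showing only the masked reconstruction loss (the paper adds the multi-view photoconsistency term):

```python
# Sketch: deep-depth-prior optimization against a noisy, incomplete depth map.
# Only the masked reconstruction loss is shown; the paper adds a multi-view
# photoconsistency loss on top.
import torch
import torch.nn as nn

net = nn.Sequential(                          # tiny stand-in CNN (random init)
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)

rgb = torch.rand(1, 3, 64, 64)                # color image from the same viewpoint
noisy_depth = torch.rand(1, 1, 64, 64)
valid = (torch.rand(1, 1, 64, 64) > 0.3)      # mask of observed depth pixels

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(200):
    opt.zero_grad()
    pred = net(rgb)                           # the network structure acts as the prior
    loss = ((pred - noisy_depth).abs() * valid).sum() / valid.sum()
    loss.backward()
    opt.step()
print(f"final masked L1: {loss.item():.4f}")
```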
Lesion Harvester: Iteratively Mining Unlabeled Lesions and Hard-Negative Examples at Scale
Comments: This work has been submitted to the IEEE for possible publication
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Acquiring large-scale medical image data, necessary for training machine
learning algorithms, is frequently intractable, due to prohibitive
expert-driven annotation costs. Recent datasets extracted from hospital
archives, e.g., DeepLesion, have begun to address this problem. However, these
are often incompletely or noisily labeled, e.g., DeepLesion leaves over 50% of
its lesions unlabeled. Thus, effective methods to harvest missing annotations
are critical for continued progress in medical image analysis. This is the goal
of our work, where we develop a powerful system to harvest missing lesions from
the DeepLesion dataset at high precision. Accepting the need for some degree of
expert labor to achieve high fidelity, we exploit a small fully-labeled subset
of medical image volumes and use it to intelligently mine annotations from the
remainder. To do this, we chain together a highly sensitive lesion proposal
generator and a very selective lesion proposal classifier. While our framework
is generic, we optimize our performance by proposing a 3D contextual lesion
proposal generator and by using a multi-view multi-scale lesion proposal
classifier. These produce harvested and hard-negative proposals, which we then
re-use to finetune our proposal generator by using a novel hard negative
suppression loss, continuing this process until no extra lesions are found.
Extensive experimental analysis demonstrates that our method can harvest an
additional 9,805 lesions while keeping precision above 90%. To demonstrate the
benefits of our approach, we show that lesion detectors trained on our harvested lesions can significantly outperform the same variants trained only on the original annotations, with a boost in average precision of 7% to 10%. We open-source our code and annotations at
Adaptive Loss Function for Super Resolution Neural Networks Using Convex Optimization Techniques
Seyed Mehdi Ayyoubzadeh , Xiaolin Wu Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
The Single Image Super-Resolution (SISR) task refers to learning a mapping from low-resolution images to the corresponding high-resolution ones. This task is known to be extremely difficult since it is an ill-posed problem. Recently, Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance on SISR. However, the images produced by CNNs do not contain fine details. Generative Adversarial Networks (GANs) aim to solve this issue and recover sharp details. Nevertheless, GANs are notoriously difficult to train. Besides that, they generate artifacts in the high-resolution images. In this paper, we propose a method in which CNNs try to align images in spaces other than only the pixel space. Such a space is designed using convex
optimization techniques. CNNs are encouraged to learn high-frequency components
of the images as well as low-frequency components. We show that the proposed method can recover fine details of the images and is stable during training.
Block-wise Scrambled Image Recognition Using Adaptation Network
Comments: 6 pages Artificial Intelligence of Things(AAAI-2020 WS)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI)
In this study, a perceptually hidden object-recognition method is
investigated to generate secure images recognizable by humans but not machines.
Hence, both the perceptual information hiding and the corresponding object
recognition methods should be developed. Block-wise image scrambling is
introduced to hide perceptual information from a third party. In addition, an
adaptation network is proposed to recognize those scrambled images.
Experimental comparisons conducted using CIFAR datasets demonstrated that the
proposed adaptation network performed well in incorporating simple perceptual
information hiding into DNN-based image classification.
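Block-wise scrambling itself is simple to express: split the image into fixed-size blocks and permute them with a secret key. A minimal sketch; practical schemes, possibly including the paper's, may also transform pixels within each block.

```python
# Sketch: key-based block-wise image scrambling. Minimal version that only
# permutes block positions; real schemes may also transform block pixels.
import numpy as np

def scramble(image, block=8, key=42):
    h, w = image.shape[:2]
    gh, gw = h // block, w // block
    blocks = (image[:gh*block, :gw*block]
              .reshape(gh, block, gw, block, -1)
              .swapaxes(1, 2)
              .reshape(gh * gw, block, block, -1))
    perm = np.random.default_rng(key).permutation(gh * gw)   # secret permutation
    blocks = blocks[perm]
    return (blocks.reshape(gh, gw, block, block, -1)
                  .swapaxes(1, 2)
                  .reshape(gh * block, gw * block, -1))

img = np.random.default_rng(0).integers(0, 255, (32, 32, 3), dtype=np.uint8)
print(scramble(img).shape)   # (32, 32, 3)
```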
EMOPAIN Challenge 2020: Multimodal Pain Evaluation from Facial and Bodily Expressions
Comments: 8 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The EmoPain 2020 Challenge is the first international competition aimed at
creating a uniform platform for the comparison of machine learning and
multimedia processing methods of automatic chronic pain assessment from human
expressive behaviour, and also the identification of pain-related behaviours.
The objective of the challenge is to promote research in the development of
assistive technologies that help improve the quality of life for people with
chronic pain via real-time monitoring and feedback to help manage their
condition and remain physically active. The challenge also aims to encourage
the use of the relatively underutilised, albeit vital bodily expression signals
for automatic pain and pain-related emotion recognition. This paper presents a
description of the challenge, competition guidelines, benchmarking dataset,
and the baseline systems’ architecture and performance on the three sub-tasks:
pain estimation from facial expressions, pain recognition from multimodal
movement, and protective movement behaviour detection.
An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
Comments: arXiv admin note: text overlap with arXiv:1909.05073
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Weight pruning has been widely acknowledged as a straightforward and
effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby
achieving acceleration on various platforms. However, most of the pruning
techniques are essentially trade-offs between model accuracy and regularity
which lead to impaired inference accuracy and limited on-device acceleration
performance. To solve this problem, we introduce a new sparsity dimension, namely pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware friendly. With carefully
designed patterns, the proposed pruning unprecedentedly and consistently
achieves accuracy enhancement and better feature extraction ability on
different DNN structures and datasets, and our pattern-aware pruning framework
also achieves pattern library extraction, pattern selection, pattern and
connectivity pruning and weight training simultaneously. Our approach on the
new pattern-based sparsity naturally fits into compiler optimization for highly
efficient DNN execution on mobile platforms. To the best of our knowledge, this is the first time mobile devices have achieved real-time inference for large-scale DNN models, thanks to the unique spatial property of pattern-based sparsity and the code generation capability of compilers.
Pruning CNN's with linear filter ensembles
Comments: accepted to ECAI2020
Subjects:
Machine Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Despite the promising results of convolutional neural networks (CNNs),
applying them on resource limited devices is still a challenge, mainly due to
the huge memory and computation requirements. To tackle these problems, pruning
can be applied to reduce the network size and number of floating point
operations (FLOPs). Contrary to the “filter norm” method used in network pruning, which assumes that the smaller this norm, the less important the associated component, we develop a novel filter importance norm that incorporates the loss caused by eliminating a component from the CNN.
To estimate the importance of a set of architectural components, we measure
the CNN performance as different components are removed. The result is a
collection of filter ensembles — filter masks — and associated performance
values. We rank the filters based on a linear and additive model and remove the
least important ones such that the drop in network accuracy is minimal. We
evaluate our method on a fully connected network, as well as on the ResNet architecture trained on the CIFAR-10 dataset. Using our pruning method, we removed 60% of the parameters and 64% of the FLOPs from the ResNet with an accuracy drop of less than 0.6%.
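The ranking step described above, fitting a linear, additive model from filter masks to measured performance, can be sketched with ordinary least squares; the evaluation function below is a placeholder for running the masked CNN.

```python
# Sketch: estimate per-filter importance from (mask, accuracy) pairs with a
# linear additive model. `evaluate_masked` is a placeholder for running the
# CNN with the given filters switched off.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_filters, n_trials = 32, 200
true_importance = rng.random(n_filters)            # hidden ground truth (toy)

def evaluate_masked(mask):
    # Placeholder: accuracy drops by the summed importance of removed filters.
    return 0.95 - ((1 - mask) * true_importance).sum() * 0.01

masks = (rng.random((n_trials, n_filters)) > 0.3).astype(float)  # filter ensembles
accs = np.array([evaluate_masked(m) for m in masks])

model = LinearRegression().fit(masks, accs)
importance = model.coef_                           # contribution of keeping a filter
prune_order = np.argsort(importance)               # least important pruned first
print("first 5 filters to prune:", prune_order[:5])
```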
Comments: 11 pages, 5 figures
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Natural images can be regarded as residing in a manifold that is embedded in
a higher dimensional Euclidean space. Generative Adversarial Networks (GANs)
try to learn the distribution of the real images in the manifold to generate
samples that look real. But the results of existing methods still exhibit many
unpleasant artifacts and distortions even for the cases where the desired
ground truth target images are available for supervised learning such as in
single image super resolution (SISR). We probe for ways to alleviate these
problems for supervised GANs in this paper. We explicitly apply the Lipschitz
Continuity Condition (LCC) to regularize the GAN. An encoding network that maps
the image space to a new optimal latent space is derived from the LCC, and it
is used to augment the GAN as a coupling component. The LCC is also converted
to new regularization terms in the generator loss function to enforce local
invariance. The GAN is optimized together with the encoding network in an
attempt to make the generator converge to a more ideal and disentangled mapping
that can generate samples more faithful to the target images. When the proposed
models are applied to the single image super resolution problem, the results
outperform the state of the art.
DeepFL-IQA: Weak Supervision for Deep IQA Feature Learning
Comments: dataset url: this http URL
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV)
Multi-level deep-features have been driving state-of-the-art methods for
aesthetics and image quality assessment (IQA). However, most IQA benchmarks are
comprised of artificially distorted images, for which features derived from
ImageNet under-perform. We propose a new IQA dataset and a weakly supervised
feature learning approach to train features more suitable for IQA of
artificially distorted images. The dataset, KADIS-700k, is far more extensive than similar works, consisting of 140,000 pristine images and 25 distortion types, totaling 700k distorted versions. Our weakly supervised feature learning
is designed as a multi-task learning type training, using eleven existing
full-reference IQA metrics as proxies for differential mean opinion scores. We
also introduce a benchmark database, KADID-10k, of artificially degraded
images, each subjectively annotated by 30 crowd workers. We make use of our
derived image feature vectors for (no-reference) image quality assessment by
training and testing a shallow regression network on this database and five
other benchmark IQA databases. Our method, termed DeepFL-IQA, performs better
than other feature-based no-reference IQA methods and also better than all
tested full-reference IQA methods on KADID-10k. For the other five benchmark
IQA databases, DeepFL-IQA matches the performance of the best existing
end-to-end deep learning-based methods on average.
ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
Comments: AAAI 2020 (10 pages)
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
We present a new multimodal question answering challenge, ManyModalQA, in
which an agent must answer a question by considering three distinct modalities:
text, images, and tables. We collect our data by scraping Wikipedia and then
utilize crowdsourcing to collect question-answer pairs. Our questions are
ambiguous, in that the modality that contains the answer is not easily
determined based solely upon the question. To demonstrate this ambiguity, we
construct a modality selector (or disambiguator) network, and this model gets
substantially lower accuracy on our challenge set, compared to existing
datasets, indicating that our questions are more ambiguous. By analyzing this
model, we investigate which words in the question are indicative of the
modality. Next, we construct a simple baseline ManyModalQA model, which, based
on the prediction from the modality selector, fires a corresponding pre-trained
state-of-the-art unimodal QA model. We focus on providing the community with a
new manymodal evaluation set and only provide a fine-tuning set, with the
expectation that existing datasets and approaches will be transferred for most
of the training, to encourage low-resource generalization without large,
monolithic training sets for each new task. There is a significant gap between
our baseline models and human performance; therefore, we hope that this
challenge encourages research in end-to-end modality disambiguation and
multimodal QA models, as well as transfer learning. Code and data available at:
this https URL
Oliver Willers , Sebastian Sudholt , Shervin Raafatnia , Stephanie Abrecht Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Deep learning methods are widely regarded as indispensable when it comes to
designing perception pipelines for autonomous agents such as robots, drones or
automated vehicles. The main reasons, however, for deep learning not being used
for autonomous agents at large scale already are safety concerns. Deep learning
approaches typically exhibit a black-box behavior which makes it hard for them
to be evaluated with respect to safety-critical aspects. While there have been
some work on safety in deep learning, most papers typically focus on high-level
safety concerns. In this work, we seek to dive into the safety concerns of deep
learning methods and present a concise enumeration on a deeply technical level.
Additionally, we present extensive discussions on possible mitigation methods
and give an outlook regarding what mitigation methods are still missing in
order to facilitate an argumentation for the safety of a deep learning method.
Anomaly detection in chest radiographs with a weakly supervised flow-based deep learning method
H. Shibata (1), S. Hanaoka (2), Y. Nomura (1), T. Nakao (3), I. Sato (2 and 4 and 5), N. Hayashi (1), O. Abe (2 and 3) ((1) Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, (2) Department of Radiology, The University of Tokyo Hospital, (3) Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, (4) Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, (5) Center for Advanced Intelligence Project, RIKEN) Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Preventing the oversight of anomalies in chest X-ray radiographs (CXRs)
during diagnosis is a crucial issue. Deep learning (DL)-based anomaly detection
methods are rapidly growing in popularity, and provide effective solutions to
the problem, but the workload in labeling CXRs during the training procedure
remains heavy. To reduce the workload, a novel anomaly detection method for
CXRs based on weakly supervised DL is presented in this study. The DL is based
on a flow-based deep neural network (DNN) framework with which two normality
metrics (logarithm likelihood and logarithm likelihood ratio) can be
calculated. With this method, only one set of normal CXRs requires labeling to
train the DNN; then, the normality of any unknown CXR can be evaluated. The area under the receiver operating characteristic curve acquired with the logarithm likelihood ratio metric (approximately 0.783) was greater than that obtained with the logarithm likelihood metric, and was comparable to values in previous studies where other weakly supervised DNNs were implemented.
GhostImage: Perception Domain Attacks against Vision-based Object Classification Systems
Yanmao Man , Ming Li , Ryan Gerdes Subjects : Cryptography and Security (cs.CR) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
In vision-based object classification systems, imaging sensors perceive the
environment and then objects are detected and classified for decision-making
purposes. Vulnerabilities in the perception domain enable an attacker to inject
false data into the sensor which could lead to unsafe consequences. In this
work, we focus on camera-based systems and propose GhostImage attacks, with the
goal of either creating a fake perceived object or obfuscating the object’s
image that leads to wrong classification results. This is achieved by remotely
projecting adversarial patterns into camera-perceived images, exploiting two
common effects in optical imaging systems, namely lens flare/ghost effects, and
auto-exposure control. To improve the robustness of the attack to channel
perturbations, we generate optimal input patterns by integrating adversarial
machine learning techniques with a trained end-to-end channel model. We realized GhostImage attacks with a projector and conducted comprehensive experiments using three different image datasets, in indoor and outdoor environments, and
three different cameras. We demonstrate that GhostImage attacks are applicable
to both autonomous driving and security surveillance scenarios. Experiment
results show that, depending on the projector-camera distance, attack success
rates can reach as high as 100%.
TEASER: Fast and Certifiable Point Cloud Registration
Comments: 20 pages main text, 22 pages appendix
Subjects:
Robotics (cs.RO)
; Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
We propose the first fast and certifiable algorithm for the registration of
two sets of 3D points in the presence of large amounts of outlier
correspondences. Towards this goal, we first reformulate the registration
problem using a Truncated Least Squares (TLS) cost that makes the estimation
insensitive to spurious correspondences. Then, we provide a general
graph-theoretic framework to decouple scale, rotation, and translation
estimation, which allows solving in cascade for the three transformations.
Despite the fact that each subproblem is still non-convex and combinatorial in
nature, we show that (i) TLS scale and (component-wise) translation estimation
can be solved in polynomial time via an adaptive voting scheme, (ii) TLS
rotation estimation can be relaxed to a semidefinite program (SDP) and the
relaxation is tight, even in the presence of extreme outlier rates. We name the
resulting algorithm TEASER (Truncated least squares Estimation And SEmidefinite
Relaxation). While solving large SDP relaxations is typically slow, we develop
a second certifiable algorithm, named TEASER++, that circumvents the need to
solve an SDP and runs in milliseconds. For both algorithms, we provide
theoretical bounds on the estimation errors, which are the first of their kind
for robust registration problems. Moreover, we test their performance on
standard benchmarks, object detection datasets, and the 3DMatch scan matching
dataset, and show that (i) both algorithms dominate the state of the art (e.g.,
RANSAC, branch-&-bound, heuristics) and are robust to more than 99% outliers,
(ii) TEASER++ can run in milliseconds and it is currently the fastest robust
registration algorithm, (iii) TEASER++ is so robust it can also solve problems
without correspondences (e.g., hypothesizing all-to-all correspondences) where
it largely outperforms ICP. We release a fast open-source C++ implementation of
TEASER++.
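The decoupled translation step admits a compact illustration. The following
Python sketch implements TLS estimation of one translation component by
adaptive voting, under the simplifying assumptions stated in the comments; it
is not the authors' released implementation:

    import numpy as np

    def tls_translation_component(a, b, beta):
        """Component-wise TLS translation by adaptive voting (a sketch).
        a, b: 1-D arrays of matched coordinates with b ~ a + t + noise;
        beta: assumed per-measurement noise bound. Returns the estimate
        supported by the largest consensus set, plus the inlier mask."""
        t = b - a  # one translation candidate per correspondence
        # Sweep interval endpoints: +1 opens [t_i - beta, t_i + beta], -1 closes.
        events = sorted([(ti - beta, 1) for ti in t] +
                        [(ti + beta, -1) for ti in t])
        best_count, count, best_x = 0, 0, 0.0
        for x, delta in events:
            count += delta
            if count > best_count:
                best_count, best_x = count, x
        inliers = np.abs(t - best_x) <= beta  # candidates consistent with best_x
        return t[inliers].mean(), inliers

Running this once per coordinate axis yields the full translation, mirroring
the component-wise decoupling described above.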
Artificial Intelligence
StarAI: Reducing incompleteness in the game of Bridge using PLP
J Li , S Thepaut , V Ventos Subjects : Artificial Intelligence (cs.AI)
Bridge is a trick-taking card game requiring the ability to evaluate
probabilities, since it is a game of incomplete information where each player
sees only their own cards. In order to choose a strategy, a player needs to
gather information about the hidden cards in the other players’ hands. We
present a methodology allowing us to model a part of card playing in Bridge
using Probabilistic Logic Programming.
DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction
Comments: accepted by The World Wide Web Conference 2020
Subjects:
Artificial Intelligence (cs.AI)
Clinical trials are essential for drug development but often suffer from
expensive, inaccurate and insufficient patient recruitment. The core problem of
patient-trial matching is to find qualified patients for a trial, where patient
information is stored in electronic health records (EHR) while trial
eligibility criteria (EC) are described in text documents available on the web.
How can longitudinal patient EHR be represented? How can complex logical rules
be extracted from EC? Most existing works rely on manual rule-based extraction,
which is time-consuming and inflexible for complex inference. To address these
challenges, we propose DeepEnroll, a cross-modal inference learning model that
jointly encodes enrollment criteria (text) and patient records (tabular data)
into a shared latent space for matching inference. DeepEnroll applies a
pre-trained Bidirectional Encoder Representations from Transformers (BERT)
model to encode clinical trial information into sentence embeddings, and uses
a hierarchical embedding model to represent patients’ longitudinal EHR. In
addition, DeepEnroll is augmented by a numerical information embedding and
entailment module to reason over numerical information in both EC and EHR.
These encoders are trained jointly to optimize the patient-trial matching
score. We evaluated DeepEnroll on the patient-trial matching task on real-world
datasets, where DeepEnroll outperformed the best baseline by up to 12.4% in
average F1.
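As a loose illustration of the text side of such a matching model (not
DeepEnroll's actual architecture), one can embed an eligibility criterion with
a pretrained BERT and score it against a patient embedding; the mean pooling,
checkpoint name, and cosine score are assumptions made for the sketch:

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    bert = BertModel.from_pretrained("bert-base-uncased")

    def criterion_embedding(text):
        # Mean-pool the last hidden states into one sentence embedding.
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = bert(**inputs).last_hidden_state  # (1, seq_len, 768)
        return hidden.mean(dim=1).squeeze(0)           # (768,)

    def matching_score(criterion_vec, patient_vec):
        # Cosine similarity in a shared latent space; the real model learns
        # this space jointly and adds entailment reasoning over numbers.
        return torch.nn.functional.cosine_similarity(
            criterion_vec, patient_vec, dim=0).item()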
Accelerating supply chains with Ant Colony Optimization across range of hardware solutions
Ivars Dzalbs , Tatiana Kalganova Subjects : Artificial Intelligence (cs.AI) ; Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)
The Ant Colony algorithm has been applied to various optimization problems;
however, most of the previous work on scaling and parallelism focuses on
Travelling Salesman Problems (TSPs). Although useful for benchmarking and
comparing new ideas, the algorithmic dynamics do not always transfer to complex
real-life problems, where additional meta-data is required during solution
construction. This paper looks at a real-life outbound supply chain problem
using Ant Colony Optimization (ACO) and its scaling dynamics with two parallel
ACO architectures: Independent Ant Colonies (IAC) and Parallel Ants (PA).
Results showed that PA was able to reach a higher solution quality in fewer
iterations as the number of parallel instances increased. Furthermore, speed
performance was measured across three different hardware solutions: a 16-core
CPU, a 68-core Xeon Phi, and up to 4 GeForce GPUs. State-of-the-art ACO
vectorization techniques such as SS-Roulette were implemented using C++ and
CUDA. Although excellent for TSP, it was concluded that GPUs are not suitable
for the given supply chain problem due to the required meta-data access
footprint. Furthermore, compared to its sequential counterpart, the vectorized
CPU AVX2 implementation achieved a 25.4x speedup, while the Xeon Phi with its
AVX512 instruction set reached 148x on Parallel Ants with Vectorization (PAwV).
PAwV is therefore able to scale at least up to 1024 parallel instances on the
supply chain network problem solved.
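For readers unfamiliar with the Parallel Ants pattern, a minimal Python sketch
of one PA iteration follows; the construct and evaluate callables stand in for
the problem-specific (meta-data dependent) solution builder and are
hypothetical:

    import numpy as np

    def parallel_ants_step(pheromone, heuristic, construct, evaluate,
                           n_ants=64, rho=0.1, rng=None):
        """One Parallel Ants (PA) iteration (illustrative, not the paper's code).
        All ants share one pheromone matrix and build solutions independently
        (sequentially here; in PA they would run on parallel cores). The best
        ant of the iteration deposits pheromone."""
        rng = rng or np.random.default_rng()
        weights = pheromone * heuristic            # attractiveness per choice
        solutions = [construct(weights, rng) for _ in range(n_ants)]
        costs = [evaluate(s) for s in solutions]
        best = int(np.argmin(costs))
        pheromone *= (1.0 - rho)                   # evaporation
        for i, j in solutions[best]:               # reinforce best solution's arcs
            pheromone[i, j] += 1.0 / costs[best]
        return solutions[best], costs[best]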
Algorithms for Tensor Network Contraction Ordering
Comments: 10 pages, 10 figures
Subjects:
Artificial Intelligence (cs.AI)
; Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Quantum Physics (quant-ph)
Contracting tensor networks is often computationally demanding. Well-designed
contraction sequences can dramatically reduce the contraction cost. We explore
the performance of simulated annealing and genetic algorithms, two common
discrete optimization techniques, on this ordering problem. We benchmark their
performance as well as that of the commonly-used greedy search on physically
relevant tensor networks. Where computationally feasible, we also compare them
with the optimal contraction sequence obtained by an exhaustive search. We find
that the algorithms we consider consistently outperform a greedy search given
equal computational resources, with an advantage that scales with tensor
network size. We compare the obtained contraction sequences and identify signs
of highly non-local optimization, with the more sophisticated algorithms
sacrificing run-time early in the contraction for better overall performance.
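A minimal version of the simulated-annealing search over contraction orders
might look as follows in Python; the network representation and the
neighborhood move are deliberate simplifications of what the paper benchmarks:

    import math, random

    def contraction_cost(tensors, order):
        """Sum of intermediate contraction sizes for a pairwise order.
        tensors: list of dicts mapping index name -> dimension;
        order: list of position pairs to contract (a simplification)."""
        pool = [dict(t) for t in tensors]
        cost = 0
        for i, j in order:
            a, b = pool[i], pool[j]
            merged = {**a, **b}
            size = 1
            for d in merged.values():
                size *= d
            cost += size
            # Indices shared by a and b are summed over and disappear.
            pool[i] = {k: d for k, d in merged.items() if (k in a) != (k in b)}
            pool[j] = {}
        return cost

    def anneal(tensors, order, steps=10000, t0=1.0):
        best = cur = contraction_cost(tensors, order)
        for step in range(steps):
            temp = t0 * (1 - step / steps) + 1e-9
            cand = order[:]                    # neighbor: swap two steps
            i, j = random.sample(range(len(cand)), 2)
            cand[i], cand[j] = cand[j], cand[i]
            c = contraction_cost(tensors, cand)
            if c < cur or random.random() < math.exp((cur - c) / temp):
                order, cur = cand, c
                best = min(best, cur)
        return order, best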
A Neural Architecture for Person Ontology population
Comments: 6 pages, 10 figures. arXiv admin note: substantial text overlap with arXiv:1811.09368
Subjects:
Artificial Intelligence (cs.AI)
A person ontology comprising the concepts, attributes and relationships of
people has a number of applications in data protection, de-identification, and
the population of knowledge graphs for business intelligence and fraud
prevention. While
artificial neural networks have led to improvements in Entity Recognition,
Entity Classification, and Relation Extraction, creating an ontology largely
remains a manual process, because it requires a fixed set of semantic relations
between concepts. In this work, we present a system for automatically
populating a person ontology graph from unstructured data using neural models
for Entity Classification and Relation Extraction. We introduce a new dataset
for these tasks and discuss our results.
Benchmarking Symbolic Execution Using Constraint Problems — Initial Results
Journal-ref: ICTAI 2019
Subjects:
Artificial Intelligence (cs.AI)
; Software Engineering (cs.SE)
Symbolic execution is a powerful technique for bug finding and program
testing. It is successful in finding bugs in real-world code. The core
reasoning techniques use constraint solving, path exploration, and search,
which are also the same techniques used in solving combinatorial problems,
e.g., finite-domain constraint satisfaction problems (CSPs). We propose CSP
instances as more challenging benchmarks to evaluate the effectiveness of the
core techniques in symbolic execution. We transform CSP benchmarks into C
programs suitable for testing the reasoning capabilities of symbolic execution
tools. From a single CSP instance P, we derive different C programs depending
on the transformation choice. Preliminary testing with the KLEE, Tracer-X, and
LLBMC tools shows substantial runtime differences across transformation and
solver choices. Our C benchmarks are effective in showing the limitations of
existing symbolic execution tools. The motivation for this work is our belief
that benchmarks of this form can spur the development and engineering of
improved core reasoning in symbolic execution engines.
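One plausible shape of such a transformation (a sketch, not the paper's exact
encoding): each CSP variable becomes a symbolic integer, each constraint
becomes an assumption, and a failing assertion is reachable precisely when the
CSP is satisfiable. The Python generator below emits C for KLEE;
klee_make_symbolic and klee_assume are KLEE's standard intrinsics, while the
encoding choices are ours:

    def csp_to_c(domains, constraints):
        """domains: dict var -> (lo, hi); constraints: list of C expressions.
        Emits a C program whose assert(0) is reachable iff the CSP is
        satisfiable, so a symbolic executor must effectively solve the CSP."""
        lines = ['#include "klee/klee.h"', "#include <assert.h>", "",
                 "int main() {"]
        for v, (lo, hi) in domains.items():
            lines += [f"  int {v};",
                      f'  klee_make_symbolic(&{v}, sizeof({v}), "{v}");',
                      f"  klee_assume({lo} <= {v} & {v} <= {hi});"]
        for c in constraints:
            lines.append(f"  klee_assume({c});")
        lines += ["  assert(0);  /* reachable only if the CSP is satisfiable */",
                  "  return 0;", "}"]
        return "\n".join(lines)

    print(csp_to_c({"x": (0, 9), "y": (0, 9)}, ["x + y == 10", "x != y"]))

Using the bitwise & rather than && inside klee_assume avoids forking on
short-circuit evaluation, as KLEE's documentation recommends.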
An Approach for Time-aware Domain-based Social Influence Prediction
Bilal Abu-Salih , Kit Yan Chan , Omar Al-Kadi , Marwan Al-Tawil , Pornpit Wongthongtham , Tomayess Issa , Heba Saadeh , Malak Al-Hassan , Bushra Bremie , Abdulaziz Albahlal Subjects : Artificial Intelligence (cs.AI)
Online Social Networks (OSNs) have established virtual platforms enabling
people to express their opinions, interests and thoughts in a variety of
contexts and domains, allowing legitimate users, as well as spammers and other
untrustworthy users, to publish and spread their content. Hence, the concept of
social trust has attracted the attention of information processors/data
scientists and information consumers/business firms. One of the main reasons
for acquiring the value of Social Big Data (SBD) is to provide frameworks and
methodologies with which the credibility of OSN users can be evaluated. These
approaches should be scalable to accommodate large-scale social data. Hence, a
sound understanding of social trust is needed to improve and expand the
analysis process and the inference of SBD credibility. Given the exposed
settings of OSNs and their few restrictions, the medium allows legitimate and
genuine users, as well as spammers and other low-trustworthiness users, to
publish and spread their content. Hence, this paper presents an approach that
incorporates semantic analysis and machine learning modules to measure and
predict users’ trustworthiness in numerous domains over different time periods.
The evaluation of the conducted experiment validates the applicability of the
incorporated machine learning techniques in predicting highly trustworthy
domain-based users.
A Journey into Ontology Approximation: From Non-Horn to Horn
Comments: 20 pages, 4 figures, submitted to ijcai2020
Subjects:
Artificial Intelligence (cs.AI)
We study complete approximations of an ontology formulated in a non-Horn
description logic (DL) such as (\mathcal{ALC}) in a Horn DL such as
(\mathcal{EL}). We provide concrete approximation schemes that are necessarily
infinite and observe that in the (\mathcal{ELU})-to-(\mathcal{EL}) case finite
approximations tend to exist in practice and are guaranteed to exist when the
original ontology is acyclic. In contrast, neither of these holds for
(\mathcal{ELU}_\bot)-to-(\mathcal{EL}_\bot) and for
(\mathcal{ALC})-to-(\mathcal{EL}_\bot) approximations. We also define a notion
of approximation tailored towards ontology-mediated querying, connect it to
subsumption-based approximations, and identify a case where finite
approximations are guaranteed to exist.
Emergence of Pragmatics from Referential Game between Theory of Mind Agents
Luyao Yuan , Zipeng Fu , Jingyue Shen , Lu Xu , Junhong Shen , Song-Chun Zhu Subjects : Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Pragmatics studies how context can contribute to language meanings [1]. In
human communication, language is never interpreted out of context, and
sentences can usually convey more information than their literal meanings [2].
However, this mechanism is missing in most multi-agent systems [3, 4, 5, 6],
restricting the communication efficiency and the capability of human-agent
interaction. In this paper, we propose an algorithm with which agents can
spontaneously learn the ability to “read between the lines” without any
explicit hand-designed rules. We integrate theory of mind (ToM) [7, 8] in a
cooperative multi-agent pedagogical situation and propose an adaptive
reinforcement learning (RL) algorithm to develop a communication protocol. ToM
is a profound cognitive science concept, claiming that people regularly reason
about others’ mental states, including beliefs, goals, and intentions, to
obtain a performance advantage in competition, cooperation or coalition. With
this ability, agents consider language not only as messages but also as
rational acts reflecting others’ hidden states. Our experiments demonstrate the
advantage of pragmatic protocols over non-pragmatic protocols. We also show
that the teaching complexity under the pragmatic protocol empirically
approximates the recursive teaching dimension (RTD).
Adaptive Large Neighborhood Search for Circle Bin Packing Problem
Comments: 13 pages, 6 figures, 6 tables
Subjects:
Artificial Intelligence (cs.AI)
; Distributed, Parallel, and Cluster Computing (cs.DC)
We address a new variant of the packing problem called the circle bin packing
problem (CBPP), which is to find a dense packing of circle items into multiple
square bins so as to minimize the number of bins used. To this end, we propose
an adaptive large neighborhood search (ALNS) algorithm, which uses our Greedy
Algorithm with Corner Occupying Action (GACOA) to construct an initial layout.
The greedy solution is usually trapped in a local optimum, and ALNS enables a
multiple-neighborhood search driven by a stochastic annealing schedule to avoid
getting stuck in local minima. Specifically, ALNS perturbs the current layout
to jump out of a local optimum by iteratively reassigning some circles and
accepting the new layout with some probability during the search. The
acceptance probability is adjusted adaptively using simulated annealing, which
fine-tunes the search direction in order to reach the global optimum. We
benchmark computational results against GACOA on heterogeneous instances. ALNS
always outperforms GACOA in improving the objective function, and in several
cases there is a significant reduction in the number of bins used in the
packing.
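The annealing-style acceptance rule at the heart of the search is simple
enough to state directly; the cooling factor and the placeholder names below
are illustrative, not the paper's parameters:

    import math, random

    def accept(new_cost, cur_cost, temperature):
        """Simulated-annealing acceptance: always take improvements, sometimes
        accept worse layouts to escape local optima (an illustrative sketch)."""
        if new_cost <= cur_cost:
            return True
        return random.random() < math.exp((cur_cost - new_cost) / temperature)

    # Typical use inside an ALNS loop (names are placeholders):
    # layout_new = perturb(layout)        # reassign some circles
    # if accept(cost(layout_new), cost(layout), temp):
    #     layout = layout_new
    # temp *= 0.995                       # cooling schedule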
Michael C. Nwogugu Subjects : Theoretical Economics (econ.TH) ; Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Dynamical Systems (math.DS)
The Sharing Economy (which includes Airbnb, Apple, Alibaba, Uber, WeWork,
Ebay, Didi Chuxing, Amazon) blossomed across the world, triggered structural
changes in industries and significantly affected international capital flows
primarily by disobeying a wide variety of statutes and laws in many countries.
They have also illegally reduced, and changed the nature of, competition in
many industries, often to the detriment of social welfare. This article
develops new dynamic pricing models for the SEOs and derives some stability
properties of mixed games and dynamic algorithms which eliminate antitrust
liability and also reduce deadweight losses, greed, regret and GPS
manipulation. The new dynamic pricing models contravene the
Myerson-Satterthwaite Impossibility Theorem.
Automatic phantom test pattern classification through transfer learning with deep neural networks
Rafael B. Fricks , Justin Solomon , Ehsan Samei Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Medical Physics (physics.med-ph)
Imaging phantoms are test patterns used to measure image quality in computer
tomography (CT) systems. A new phantom platform (Mercury Phantom, Gammex)
provides test patterns for estimating the task transfer function (TTF) or noise
power spectrum (NPF) and simulates different patient sizes. Determining which
image slices are suitable for analysis currently requires manual annotation of
these patterns by an expert, as subtle defects may make an image unsuitable for
measurement. We propose a method of automatically classifying these test
patterns in a series of phantom images using deep learning techniques. By
adapting a convolutional neural network based on the VGG19 architecture with
weights trained on ImageNet, we use transfer learning to produce a classifier
for this domain. The classifier is trained and evaluated with over 3,500
phantom images acquired at a university medical center. Input channels for
color images are successfully adapted to convey contextual information for
phantom images. A series of ablation studies are employed to verify design
aspects of the classifier and evaluate its performance under varying training
conditions. Our solution makes extensive use of image augmentation to produce a
classifier that accurately classifies typical phantom images with 98% accuracy,
while maintaining as much as 86% accuracy when the phantom is improperly
imaged.
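A minimal Keras sketch of this kind of transfer learning is shown below; the
classification head, input size, and two-class output are assumptions for
illustration rather than the paper's exact configuration:

    import tensorflow as tf

    # ImageNet-pretrained VGG19 backbone without its classification head.
    base = tf.keras.applications.VGG19(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
    base.trainable = False              # transfer learning: freeze the features

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax"),  # suitable / unsuitable
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])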
A utility-based analysis of equilibria in multi-objective normal form games
Comments: Under review since 16 January 2020
Subjects:
Computer Science and Game Theory (cs.GT)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
In multi-objective multi-agent systems (MOMAS), agents explicitly consider
the possible tradeoffs between conflicting objective functions. We argue that
compromises between competing objectives in MOMAS should be analysed on the
basis of the utility that these compromises have for the users of a system,
where an agent’s utility function maps their payoff vectors to scalar utility
values. This utility-based approach naturally leads to two different
optimisation criteria for agents in a MOMAS: expected scalarised returns (ESR)
and scalarised expected returns (SER). In this article, we explore the
differences between these two criteria using the framework of multi-objective
normal form games (MONFGs). We demonstrate that the choice of optimisation
criterion (ESR or SER) can radically alter the set of equilibria in a MONFG
when non-linear utility functions are used.
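The ESR/SER distinction is easy to see numerically: with a non-linear utility,
the expected utility of the payoff and the utility of the expected payoff
differ. A tiny numpy example (the payoffs and utility are invented purely for
illustration):

    import numpy as np

    payoffs = np.array([[1.0, 3.0],    # possible payoff vectors (rows)
                        [3.0, 1.0]])
    probs = np.array([0.5, 0.5])

    u = lambda v: v[0] * v[1]          # a non-linear utility over objectives

    esr = np.dot(probs, [u(v) for v in payoffs])  # E[u(R)] -> 3.0
    ser = u(probs @ payoffs)                      # u(E[R]) -> 4.0
    print(esr, ser)                               # the two criteria disagree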
Causality based Feature Fusion for Brain NeuroDevelopmental Analysis
Comments: 10 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI)
Human brain development is a complex and dynamic process that is affected by
several factors such as genetics, sex hormones, and environmental changes. A
number of recent studies on brain development have examined functional
connectivity (FC) defined by the temporal correlation between time series of
different brain regions. We propose to add the directional flow of information
during brain maturation. To do so, we extract effective connectivity (EC)
through Granger causality (GC) for two different groups of subjects, i.e.,
children and young adults. The motivation is that the inclusion of causal
interaction may further discriminate brain connections between two age groups
and help to discover new connections between brain regions. The contributions
of this study are threefold. First, there has been a lack of attention to
EC-based feature extraction in the context of brain development. To this end,
we propose a new kernel-based GC (KGC) method to learn the nonlinearity of
complex brain networks, where a reduced sine hyperbolic polynomial (RSP) neural
network was used as our proposed learner. Second, we used causality values as
the weights for the directional connectivity between brain regions. Our
findings indicated that the strength of connections was significantly higher in
young adults relative to children. In addition, our new EC-based feature
outperformed FC-based analysis on the Philadelphia Neurodevelopmental Cohort
(PNC) study, with better discrimination of the different age groups. Moreover,
the fusion of these two
sets of features (FC + EC) improved brain age prediction accuracy by more than
4%, indicating that they should be used together for brain development studies.
Q-Learning in enormous action spaces via amortized approximate maximization
Comments: A previous version of this work appeared at the Deep Reinforcement Learning Workshop, NeurIPS 2018
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Applying Q-learning to high-dimensional or continuous action spaces can be
difficult due to the required maximization over the set of possible actions.
Motivated by techniques from amortized inference, we replace the expensive
maximization over all actions with a maximization over a small subset of
possible actions sampled from a learned proposal distribution. The resulting
approach, which we dub Amortized Q-learning (AQL), is able to handle discrete,
continuous, or hybrid action spaces while maintaining the benefits of
Q-learning. Our experiments on continuous control tasks with up to 21
dimensional actions show that AQL outperforms D3PG (Barth-Maron et al., 2018)
and QT-Opt (Kalashnikov et al., 2018). Experiments on structured discrete action
spaces demonstrate that AQL can efficiently learn good policies in spaces with
thousands of discrete actions.
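The amortized maximization itself fits in a few lines; in the sketch below the
proposal distribution and the Q-function are random stubs, whereas in AQL both
are learned:

    import numpy as np

    def amortized_argmax(q_fn, proposal_sample, state, n_samples=32):
        """Approximate argmax_a Q(s, a) by searching only over actions drawn
        from a learned proposal distribution (the AQL idea, sketched).
        q_fn(state, actions) -> array of Q-values; proposal_sample(state, n)
        -> (n, action_dim) array. Both are assumed learned elsewhere."""
        actions = proposal_sample(state, n_samples)
        q_values = q_fn(state, actions)
        return actions[int(np.argmax(q_values))]

    # Stub example: 21-dimensional continuous actions, random proposal.
    rng = np.random.default_rng(0)
    proposal = lambda s, n: rng.normal(size=(n, 21))
    q = lambda s, a: -np.square(a).sum(axis=1)  # toy Q preferring small actions
    best_action = amortized_argmax(q, proposal, state=None)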
Secure and Robust Machine Learning for Healthcare: A Survey
Adnan Qayyum , Junaid Qadir , Muhammad Bilal , Ala Al-Fuqaha Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Recent years have witnessed widespread adoption of machine learning (ML)/deep
learning (DL) techniques due to their superior performance for a variety of
healthcare applications ranging from the prediction of cardiac arrest from
one-dimensional heart signals to computer-aided diagnosis (CADx) using
multi-dimensional medical images. Notwithstanding the impressive performance of
ML/DL, there are still lingering doubts regarding the robustness of ML/DL in
healthcare settings (which is traditionally considered quite challenging due to
the myriad security and privacy issues involved), especially in light of recent
results that have shown that ML/DL are vulnerable to adversarial attacks. In
this paper, we present an overview of various application areas in healthcare
that leverage such techniques from a security and privacy point of view and
present associated challenges. In addition, we present potential methods to
ensure secure and privacy-preserving ML for healthcare applications. Finally,
we provide insight into the current research challenges and promising
directions for future research.
ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
Comments: AAAI 2020 (10 pages)
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
We present a new multimodal question answering challenge, ManyModalQA, in
which an agent must answer a question by considering three distinct modalities:
text, images, and tables. We collect our data by scraping Wikipedia and then
utilize crowdsourcing to collect question-answer pairs. Our questions are
ambiguous, in that the modality that contains the answer is not easily
determined based solely upon the question. To demonstrate this ambiguity, we
construct a modality selector (or disambiguator) network, and this model gets
substantially lower accuracy on our challenge set, compared to existing
datasets, indicating that our questions are more ambiguous. By analyzing this
model, we investigate which words in the question are indicative of the
modality. Next, we construct a simple baseline ManyModalQA model, which, based
on the prediction from the modality selector, fires a corresponding pre-trained
state-of-the-art unimodal QA model. We focus on providing the community with a
new manymodal evaluation set and only provide a fine-tuning set, with the
expectation that existing datasets and approaches will be transferred for most
of the training, to encourage low-resource generalization without large,
monolithic training sets for each new task. There is a significant gap between
our baseline models and human performance; therefore, we hope that this
challenge encourages research in end-to-end modality disambiguation and
multimodal QA models, as well as transfer learning. Code and data available at:
this https URL
Subjective Knowledge and Reasoning about Agents in Multi-Agent Systems
Shikha Singh , Deepak Khemani Subjects : Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI)
Though much work in multi-agent systems is focused on reasoning about the
knowledge and beliefs of artificial agents, explicit representation of, and
reasoning about, the presence or absence of agents, especially in scenarios
where agents may be unaware of other agents joining in or going offline,
leading to partial or asymmetric knowledge among the agents, is mostly
overlooked by the MAS community. Such scenarios lay the foundations of cases
where an agent can influence other agents’ mental states by (mis)informing them
about the presence or absence of collaborators or adversaries. In this paper,
we investigate how Kripke structure-based epistemic models can be extended to
express the above notion based on an agent’s subjective knowledge, and we
discuss the challenges that come along.
ARAACOM: ARAbic Algerian Corpus for Opinion Mining
Journal-ref: ICCES ’17: Proceedings of the International Conference on
Computing for Engineering and Sciences, Jul 2017, Istanbul, Turkey. pp. 35-39
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Nowadays, it is no longer necessary to make an enormous effort to distribute
forms to thousands of people, collect them, and then convert them into
electronic format in order to track people’s opinions on some subject. Many web
sites can today reach a large audience with less effort. The majority of web
sites invite their visitors to leave feedback about the site or about events.
This yields a lot of data, which requires powerful means to exploit. Opinion
mining on the web is becoming an increasingly attractive task, due to the
growing need of individuals and societies to track the mood of people toward
several subjects of daily life (sports, politics, television, …). A lot of work
on opinion mining has been done for Western languages, especially English; such
work for the Arabic language is still very scarce. In this paper, we propose
our approach to opinion mining in Arabic Algerian newspapers. CCS CONCEPTS:
• Information systems~Sentiment analysis • Computing methodologies~Natural
language processing
On Solving Cooperative MARL Problems with a Few Good Experiences
Rajiv Ranjan Kumar , Pradeep Varakantham Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cooperative Multi-agent Reinforcement Learning (MARL) is crucial for
cooperative decentralized decision learning in many domains such as search and
rescue, drone surveillance, package delivery and fire fighting problems. In
these domains, a key challenge is learning with a few good experiences, i.e.,
positive reinforcements are obtained only in a few situations (e.g., on
extinguishing a fire or tracking a crime or delivering a package) and in most
other situations there is zero or negative reinforcement. Learning decisions
with a few good experiences is extremely challenging in cooperative MARL
problems due to three reasons. First, compared to the single agent case,
exploration is harder as multiple agents have to be coordinated to receive a
good experience. Second, the environment is not stationary, as all the agents
are learning at the same time (and hence changing policies). Third, the scale
of the problem increases significantly with every additional agent.
Relevant existing work is extensive and has focused on dealing with a few
good experiences in single-agent RL problems, or on scalable approaches for
handling non-stationarity in MARL problems. Unfortunately, neither of these
approaches (or their extensions) is able to address the problem of sparse good
experiences effectively. Therefore, we provide a novel fictitious self
imitation approach that is able to simultaneously handle non-stationarity and
sparse good experiences in a scalable manner. Finally, we provide a thorough
comparison (experimental or descriptive) against relevant cooperative MARL
algorithms to demonstrate the utility of our approach.
Comments: 7 pages, 6 figures
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Existing graph neural networks may suffer from the “suspended animation
problem” when the model architecture goes deep. Meanwhile, for some graph
learning scenarios, e.g., nodes with text/image attributes or graphs with
long-distance node correlations, deep graph neural networks will be necessary
for effective graph representation learning. In this paper, we propose a new
graph neural network, namely DIFNET (Graph Diffusive Neural Network), for graph
representation learning and node classification. DIFNET utilizes both neural
gates and graph residual learning for node hidden state modeling, and includes
an attention mechanism for node neighborhood information diffusion. Extensive
experiments are conducted in this paper to compare DIFNET against several
state-of-the-art graph neural network models. The experimental results
illustrate both the learning performance advantages and the effectiveness of
DIFNET, especially in addressing the “suspended animation problem”.
Convergence Time Optimization for Federated Learning over Wireless Networks
Mingzhe Chen , H. Vincent Poor , Walid Saad , Shuguang Cui Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI); Machine Learning (stat.ML)
In this paper, the convergence time of federated learning (FL), when deployed
over a realistic wireless network, is studied. In particular, a wireless
network is considered in which wireless users transmit their local FL models
(trained using their locally collected data) to a base station (BS). The BS,
acting as a central controller, generates a global FL model using the received
local FL models and broadcasts it back to all users. Due to the limited number
of resource blocks (RBs) in a wireless network, only a subset of users can be
selected to transmit their local FL model parameters to the BS at each learning
step. Moreover, since each user has unique training data samples, the BS
prefers to include all local user FL models to generate a converged global FL
model. Hence, the FL performance and convergence time will be significantly
affected by the user selection scheme. Therefore, it is necessary to design an
appropriate user selection scheme that enables users of higher importance to be
selected more frequently. This joint learning, wireless resource allocation,
and user selection problem is formulated as an optimization problem whose goal
is to minimize the FL convergence time while optimizing the FL performance. To
solve this problem, a probabilistic user selection scheme is proposed such that
the BS is connected to the users whose local FL models have significant effects
on its global FL model with high probabilities. Given the user selection
policy, the uplink RB allocation can be determined. To further reduce the FL
convergence time, artificial neural networks (ANNs) are used to estimate the
local FL models of the users that are not allocated any RBs for local FL model
transmission at each given learning step, which enables the BS to enhance its
global FL model and improve the FL convergence speed and performance.
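A minimal sketch of probabilistic user selection of this flavor is given
below; using the norm of each local update as the proxy for a user's effect on
the global model is our simplifying assumption, not necessarily the paper's
exact importance measure:

    import numpy as np

    def select_users(update_norms, n_rbs, rng=None):
        """Pick which users transmit this round, given limited resource blocks.
        update_norms: per-user norms of the latest local model updates, used
        here as a proxy for each user's effect on the global model (assumed);
        n_rbs: number of resource blocks, i.e., users that can transmit."""
        rng = rng or np.random.default_rng()
        probs = np.asarray(update_norms, dtype=float)
        probs = probs / probs.sum()
        return rng.choice(len(probs), size=n_rbs, replace=False, p=probs)

    # Example: 10 users, 3 resource blocks per learning step.
    print(select_users([0.1, 2.0, 0.5, 1.2, 0.3, 0.9, 1.5, 0.2, 0.7, 0.4], 3))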
Coarse-Grain Cluster Analysis of Tensors With Application to Climate Biome Identification
Derek DeSantis , Phillip J. Wolfram , Katrina Bennett , Boian Alexandrov Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)
A tensor provides a concise way to codify the interdependence of complex
data. Treating a tensor as a d-way array, each entry records the interaction
between the different indices. Clustering provides a way to parse the
complexity of the data into more readily understandable information. Clustering
methods are heavily dependent on the algorithm of choice, as well as the chosen
hyperparameters of the algorithm. However, their sensitivity to data scales is
largely unknown.
In this work, we apply the discrete wavelet transform to analyze the effects
of coarse-graining on clustering tensor data. We are particularly interested in
understanding how scale affects clustering of the Earth’s climate system. The
discrete wavelet transform allows classification of the Earth’s climate across
a multitude of spatial-temporal scales. The discrete wavelet transform is used
to produce an ensemble of classification estimates, as opposed to a single
classification. Using information theory, we discover a sub-collection of the
ensemble that spans the majority of the variance observed, allowing for
efficient consensus clustering techniques that can be used to identify climate
biomes.
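An illustrative version of this pipeline, using the PyWavelets and
scikit-learn libraries: each spatial cell's time series is coarse-grained by
keeping its wavelet approximation coefficients at several levels, and each
level is clustered separately to form the ensemble. The wavelet choice and
cluster count are assumptions:

    import numpy as np
    import pywt
    from sklearn.cluster import KMeans

    def multiscale_labels(series, n_clusters=4, max_level=3):
        """Cluster grid cells at several temporal scales (a sketch).
        series: (n_cells, n_timesteps) array, one time series per cell.
        Returns one cluster labeling per coarse-graining level, i.e., an
        ensemble of classification estimates."""
        labelings = []
        for level in range(1, max_level + 1):
            # Keep only the level-`level` approximation coefficients:
            # a low-pass, coarse-grained view of each time series.
            approx = pywt.wavedec(series, "haar", level=level, axis=1)[0]
            labels = KMeans(n_clusters=n_clusters,
                            n_init=10).fit_predict(approx)
            labelings.append(labels)
        return labelings

    rng = np.random.default_rng(0)
    ensemble = multiscale_labels(rng.normal(size=(100, 64)))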
Elephant in the Room: An Evaluation Framework for Assessing Adversarial Examples in NLP
Ying Xu , Xu Zhong , Antonio Jose Jimeno Yepes , Jey Han Lau Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
An adversarial example is an input transformed by small perturbations that
machine learning models consistently misclassify. While there are a number of
methods proposed to generate adversarial examples for text data, it is not
trivial to assess the quality of these adversarial examples, as minor
perturbations (such as changing a word in a sentence) can lead to a significant
shift in their meaning, readability and classification label. In this paper, we
propose an evaluation framework to assess the quality of adversarial examples
based on the aforementioned properties. We experiment with five benchmark
attacking methods and an alternative approach based on an auto-encoder, and
find that these methods generate adversarial examples with poor readability and
content preservation. We also find that multiple factors can influence
attacking performance, such as the length of the text examples and the input
domain.
When does the Tukey median work?
Banghua Zhu , Jiantao Jiao , Jacob Steinhardt Subjects : Statistics Theory (math.ST) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
We analyze the performance of the Tukey median estimator under total
variation (TV) distance corruptions. Previous results show that under Huber’s
additive corruption model, the breakdown point is 1/3 for high-dimensional
halfspace-symmetric distributions. We show that under TV corruptions, the
breakdown point reduces to 1/4 for the same set of distributions. We also show
that a certain projection algorithm can attain the optimal breakdown point of
1/2. Both the Tukey median estimator and the projection algorithm achieve
sample complexity linear in dimension.
Learning Directed Locomotion in Modular Robots with Evolvable Morphologies
Comments: 30 pages, 14 figures
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Artificial Intelligence (cs.AI)
We generalize the well-studied problem of gait learning in modular robots in
two dimensions. Firstly, we address locomotion in a given target direction that
goes beyond learning a typical undirected gait. Secondly, rather than studying
one fixed robot morphology we consider a test suite of different modular
robots. This study is based on our interest in evolutionary robot systems where
both morphologies and controllers evolve. In such a system, newborn robots have
to learn to control their own body that is a random combination of the bodies
of the parents. We apply and compare two learning algorithms, Bayesian
optimization and HyperNEAT. The results of the experiments in simulation show
that both methods successfully learn good controllers, but Bayesian
optimization is more effective and efficient. We validate the best learned
controllers by constructing three robots from the test suite in the real world
and observe their fitness and actual trajectories. The obtained results
indicate a reality gap that depends on the controllers and the shape of the
robots, but overall the trajectories are adequate and follow the target
directions successfully.
Adaptive Loss Function for Super Resolution Neural Networks Using Convex Optimization Techniques
Seyed Mehdi Ayyoubzadeh , Xiaolin Wu Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
The Single Image Super-Resolution (SISR) task refers to learning a mapping from
low-resolution images to the corresponding high-resolution ones. This task is
known to be extremely difficult since it is an ill-posed problem. Recently,
Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance
on SISR. However, the images produced by CNNs do not contain the fine details
of the original images. Generative Adversarial Networks (GANs) aim to solve
this issue and recover sharp details. Nevertheless, GANs are notoriously
difficult to train, and they generate artifacts in the high-resolution images.
In this paper, we propose a method in which CNNs try to align images in spaces
other than the pixel space alone. Such a space is designed using convex
optimization techniques. CNNs are encouraged to learn the high-frequency
components of the images as well as the low-frequency components. We show that
the proposed method can recover fine details of the images and that it is
stable during training.
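As a rough illustration of aligning images in a transformed space as well as
the pixel space, the sketch below adds a loss on high-pass filtered images; the
fixed Laplacian filter merely stands in for the convex-optimization-designed
space of the paper:

    import torch
    import torch.nn.functional as F

    # 3x3 Laplacian kernel as a stand-in high-pass transform (illustrative only).
    lap = torch.tensor([[0., 1., 0.],
                        [1., -4., 1.],
                        [0., 1., 0.]]).view(1, 1, 3, 3)

    def sr_loss(pred, target, alpha=0.1):
        """Pixel-space L1 plus L1 between high-frequency components.
        pred, target: (batch, 1, H, W) single-channel images."""
        pixel = F.l1_loss(pred, target)
        hf_pred = F.conv2d(pred, lap, padding=1)
        hf_target = F.conv2d(target, lap, padding=1)
        return pixel + alpha * F.l1_loss(hf_pred, hf_target)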
Block-wise Scrambled Image Recognition Using Adaptation Network
Comments: 6 pages; Artificial Intelligence of Things (AAAI-2020 WS)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI)
In this study, a perceptually hidden object-recognition method is
investigated to generate secure images recognizable by humans but not machines.
Hence, both the perceptual information hiding and the corresponding object
recognition methods should be developed. Block-wise image scrambling is
introduced to hide perceptual information from a third party. In addition, an
adaptation network is proposed to recognize those scrambled images.
Experimental comparisons conducted using CIFAR datasets demonstrated that the
proposed adaptation network performed well in incorporating simple perceptual
information hiding into DNN-based image classification.
EMOPAIN Challenge 2020: Multimodal Pain Evaluation from Facial and Bodily Expressions
Comments: 8 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The EmoPain 2020 Challenge is the first international competition aimed at
creating a uniform platform for the comparison of machine learning and
multimedia processing methods of automatic chronic pain assessment from human
expressive behaviour, and also the identification of pain-related behaviours.
The objective of the challenge is to promote research in the development of
assistive technologies that help improve the quality of life for people with
chronic pain via real-time monitoring and feedback to help manage their
condition and remain physically active. The challenge also aims to encourage
the use of the relatively underutilised, albeit vital, bodily expression signals
for automatic pain and pain-related emotion recognition. This paper presents a
description of the challenge, competition guidelines, bench-marking dataset,
and the baseline systems’ architecture and performance on the three sub-tasks:
pain estimation from facial expressions, pain recognition from multimodal
movement, and protective movement behaviour detection.
An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
Comments: arXiv admin note: text overlap with arXiv:1909.05073
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Weight pruning has been widely acknowledged as a straightforward and
effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby
achieving acceleration on various platforms. However, most of the pruning
techniques are essentially trade-offs between model accuracy and regularity
which lead to impaired inference accuracy and limited on-device acceleration
performance. To solve the problem, we introduce a new sparsity dimension,
namely pattern-based sparsity, which comprises pattern and connectivity
sparsity and is both highly accurate and hardware friendly. With carefully
designed patterns, the proposed pruning unprecedentedly and consistently
achieves accuracy enhancement and better feature extraction ability on
different DNN structures and datasets, and our pattern-aware pruning framework
also achieves pattern library extraction, pattern selection, pattern and
connectivity pruning, and weight training simultaneously. Our approach to the
new pattern-based sparsity naturally fits into compiler optimization for highly
efficient DNN execution on mobile platforms. To the best of our knowledge, this
is the first time that mobile devices achieve real-time inference for
large-scale DNN models, thanks to the unique spatial property of pattern-based
sparsity and the code generation capability of compilers.
Information Retrieval
Comments: arXiv admin note: substantial text overlap with arXiv:1209.0126
Subjects:
Information Retrieval (cs.IR)
In this paper, we present experimental work on Query Expansion (QE) for
retrieval tasks over Gujarati text documents. In information retrieval, it is
very difficult to estimate the exact user need; query expansion adds terms to
the original query, which provides more information about the user need. There
are various approaches to query expansion. In our work, manual thesaurus-based
query expansion was performed to evaluate the performance of widely used
information retrieval models on Gujarati text documents. Results show that
query expansion improves the recall of text documents.
Emotion and Sentiment Lexicon Impact on Sentiment Analysis Applied to Book Reviews
Comments: in French
Journal-ref: Conférence en Recherche d’Informations et Applications – CORIA
2019, 16th French Information Retrieval Conference, Mar 2019, Lyon, France
Subjects:
Information Retrieval (cs.IR)
; Social and Information Networks (cs.SI)
Consumers are used to consulting posted reviews on the Internet before buying
a product. But it is difficult to grasp the overall opinion given the large
number of such reviews. Sentiment analysis enables detecting the polarity
(positive, negative, neutral) of an expressed opinion and therefore classifying
those reviews. Our purpose is to determine the influence of emotions on the
polarity of book reviews. We define “bag-of-words” representation models of
reviews which use a lexicon containing emotional (anticipation, sadness, fear,
anger, joy, surprise, trust, disgust) and sentimental (positive, negative)
words. This lexicon enables measuring the types of emotions felt by readers.
The supervised learning method used is a Random Forest. The application
concerns reviews from the Amazon platform. Keywords: sentiment analysis,
emotion analysis (text), sentiment polarity classification.
ARAACOM: ARAbic Algerian Corpus for Opinion Mining
Journal-ref: ICCES ’17: Proceedings of the International Conference on
Computing for Engineering and Sciences, Jul 2017, Istanbul, Turkey. pp. 35-39
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Nowadays, it is no longer necessary to make an enormous effort to distribute
forms to thousands of people, collect them, and then convert them into
electronic format in order to track people’s opinions on some subject. Many web
sites can today reach a large audience with less effort. The majority of web
sites invite their visitors to leave feedback about the site or about events.
This yields a lot of data, which requires powerful means to exploit. Opinion
mining on the web is becoming an increasingly attractive task, due to the
growing need of individuals and societies to track the mood of people toward
several subjects of daily life (sports, politics, television, …). A lot of work
on opinion mining has been done for Western languages, especially English; such
work for the Arabic language is still very scarce. In this paper, we propose
our approach to opinion mining in Arabic Algerian newspapers. CCS CONCEPTS:
• Information systems~Sentiment analysis • Computing methodologies~Natural
language processing
Graph Generators: State of the Art and Open Challenges
Comments: ACM Computing Surveys, 32 pages
Subjects:
Databases (cs.DB)
; Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
The abundance of interconnected data has fueled the design and implementation
of graph generators reproducing real-world linking properties, or gauging the
effectiveness of graph algorithms, techniques and applications manipulating
these data. We consider graph generation across multiple subfields, such as
Semantic Web, graph databases, social networks, and community detection, along
with general graphs. Despite the disparate requirements of modern graph
generators throughout these communities, we analyze them under a common
umbrella, setting out their functionalities, practical usage, and supported
operations. We argue that this classification serves the need of providing
scientists, researchers and practitioners with the right data generator at hand
for their work. This survey provides a comprehensive overview
of the state-of-the-art graph generators by focusing on those that are
pertinent and suitable for several data-intensive tasks. Finally, we discuss
open challenges and missing requirements of current graph generators along with
their future extensions to new emerging fields.
VoiceCoach: Interactive Evidence-based Training for Voice Modulation Skills in Public Speaking
Comments: Accepted by CHI ’20
Subjects:
Human-Computer Interaction (cs.HC)
; Computation and Language (cs.CL); Information Retrieval (cs.IR)
The modulation of voice properties, such as pitch, volume, and speed, is
crucial for delivering a successful public speech. However, it is challenging
to master different voice modulation skills. Though many guidelines are
available, they are often not practical enough to be applied in different
public speaking situations, especially for novice speakers. We present
VoiceCoach, an interactive evidence-based approach to facilitate the effective
training of voice modulation skills. Specifically, we have analyzed the voice
modulation skills from 2623 high-quality speeches (i.e., TED Talks) and use
them as the benchmark dataset. Given a voice input, VoiceCoach automatically
recommends good voice modulation examples from the dataset based on the
similarity of both sentence structures and voice modulation skills. Immediate
and quantitative visual feedback is provided to guide further improvement. The
expert interviews and the user study provide support for the effectiveness and
usability of VoiceCoach.
Keyword-based Topic Modeling and Keyword Selection
Xingyu Wang , Lida Zhang , Diego Klabjan Subjects : Machine Learning (stat.ML) ; Information Retrieval (cs.IR); Machine Learning (cs.LG)
Certain types of documents, such as tweets, are collected by specifying a set
of keywords. As topics of interest change with time, it is beneficial to adjust
keywords dynamically. The challenge is that these need to be specified ahead of
knowing the forthcoming documents and the underlying topics. The future topics
should mimic past topics of interest yet there should be some novelty in them.
We develop a keyword-based topic model that dynamically selects a subset of
keywords to be used to collect future documents. The generative process first
selects keywords and then the underlying documents based on the specified
keywords. The model is trained by using a variational lower bound and
stochastic gradient optimization. The inference consists of finding a subset of
keywords; given such a subset, the model predicts the underlying topic-word
matrix for the unknown forthcoming documents. We compare the keyword topic
model against a benchmark model using viral predictions of tweets combined with
a topic model. The keyword-based topic model outperforms this sophisticated
baseline model by 67%.
Optimal estimation of sparse topic models
Xin Bing , Florentina Bunea , Marten Wegkamp Subjects : Machine Learning (stat.ML) ; Information Retrieval (cs.IR); Machine Learning (cs.LG)
Topic models have become popular tools for dimension reduction and
exploratory analysis of text data consisting of observed frequencies of a
vocabulary of (p) words in (n) documents, stored in a (p \times n) matrix. The
main premise is that the mean of this data matrix can be factorized into a
product of two non-negative matrices: a (p \times K) word-topic matrix (A) and
a (K \times n) topic-document matrix (W). This paper studies the estimation of
(A) that is possibly element-wise sparse, when the number of topics (K) is
unknown. In this under-explored context, we derive a new minimax lower bound
for the estimation of such (A) and propose a new computationally efficient
algorithm for its recovery. We derive a finite sample upper bound for our
estimator, and show that it matches the minimax lower bound in many scenarios.
Our estimate adapts to the unknown sparsity of (A) and our analysis is valid
for any finite (n), (p), (K) and document lengths. Empirical results on both
synthetic and semi-synthetic data show that our proposed estimator is a strong
competitor of the existing state-of-the-art algorithms for both non-sparse (A)
and sparse (A), and has superior performance in many scenarios of interest.
Incentivising Exploration and Recommendations for Contextual Bandits with Payments
Comments: 11 pages, 4 figures
Subjects:
Machine Learning (cs.LG)
; Information Retrieval (cs.IR); Machine Learning (stat.ML)
We propose a contextual bandit based model to capture the learning and social
welfare goals of a web platform in the presence of myopic users. By using
payments to incentivize these agents to explore different
items/recommendations, we show how the platform can learn the inherent
attributes of items and achieve a sublinear regret while maximizing cumulative
social welfare. We also calculate theoretical bounds on the cumulative costs of
incentivization to the platform. Unlike previous works in this domain, we
consider contexts to be completely adversarial, and the behavior of the
adversary is unknown to the platform. Our approach can improve various
engagement metrics of users on e-commerce stores, recommendation engines and
matching platforms.
A Price-Per-Attention Auction Scheme Using Mouse Cursor Information
Journal-ref: ACM Trans. Inf. Syst. 38, 2 (2020)
Subjects:
Computer Science and Game Theory (cs.GT)
; Information Retrieval (cs.IR)
Payments in online ad auctions are typically derived from click-through
rates, so that advertisers do not pay for ineffective ads. But advertisers
often care about more than just clicks; this is the case, for example, if they
aim to raise brand awareness or visibility. There is thus an opportunity to
devise a more effective ad pricing paradigm, in which ads are paid for only if
they are actually noticed. This article contributes a novel auction format
based on a pay-per-attention (PPA) scheme. We show that the PPA auction
inherits the same desirable properties (strategy-proofness and efficiency) as
its pay-per-impression and pay-per-click counterparts, and that it also compares
favourably in terms of revenues. To make the PPA format feasible, we also
contribute a scalable diagnostic technology to predict user attention to ads in
sponsored search using raw mouse cursor coordinates only, regardless of the
page content and structure. We use the user attention predictions in numerical
simulations to evaluate the PPA auction scheme. Our results show that, in
relevant economic settings, the PPA revenues would be strictly higher than the
existing auction payment schemes.
Computation and Language
Multilingual Denoising Pre-training for Neural Machine Translation
Comments: Work in progress
Subjects:
Computation and Language (cs.CL)
This paper demonstrates that multilingual denoising pre-training produces
significant performance gains across a wide variety of machine translation (MT)
tasks. We present mBART — a sequence-to-sequence denoising auto-encoder
pre-trained on large-scale monolingual corpora in many languages using the BART
objective. mBART is the first method for pre-training a complete
sequence-to-sequence model by denoising full texts in multiple languages;
previous MT pre-training has focused only on the encoder, decoder, or
reconstructing parts of the text. Pre-training a complete model allows it to be
directly fine-tuned for supervised (both sentence-level and document-level) and
unsupervised machine translation, with no task-specific modifications. We
demonstrate that adding mBART initialization produces performance gains in all
but the highest-resource settings, including up to 12 BLEU points for
low-resource MT and over 5 BLEU points for many document-level and unsupervised
models. We also show that mBART enables new types of transfer to language pairs
with no bi-text or that were not in the pre-training corpus, and present
extensive analysis of which factors contribute the most to effective
pre-training.
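For orientation, a minimal fine-tuning step with the publicly released
checkpoint might look as follows, assuming a recent version of the Hugging Face
transformers library; the sentence pair is invented:

    import torch
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    tokenizer = MBartTokenizer.from_pretrained(
        "facebook/mbart-large-cc25", src_lang="de_DE", tgt_lang="en_XX")
    model = MBartForConditionalGeneration.from_pretrained(
        "facebook/mbart-large-cc25")

    batch = tokenizer("Guten Morgen!", return_tensors="pt")
    with tokenizer.as_target_tokenizer():  # tokenize the reference translation
        labels = tokenizer("Good morning!", return_tensors="pt").input_ids

    loss = model(**batch, labels=labels).loss  # seq2seq fine-tuning loss
    loss.backward()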
Unsupervised Domain Adaptation for Neural Machine Translation with Iterative Back Translation
Comments: Submitted to IJCAI 2020
Subjects:
Computation and Language (cs.CL)
; Machine Learning (cs.LG)
State-of-the-art neural machine translation (NMT) systems are data-hungry and
perform poorly on domains with little supervised data. As data collection is
expensive and infeasible in many cases, unsupervised domain adaptation methods
are needed. We apply an Iterative Back Translation (IBT) training scheme on
in-domain monolingual data, which repeatedly uses a Transformer-based NMT model
to create in-domain pseudo-parallel sentence pairs in one translation direction
on the fly and then uses them to train the model in the other direction.
Evaluated on three domains of German-to-English translation task with no
supervised data, this simple technique alone (without any out-of-domain
parallel data) can already surpass all previous domain adaptation methods—up
to +9.48 BLEU over the strongest previous method, and up to +27.77 BLEU over
the unadapted baseline. Moreover, given available supervised out-of-domain data
on German-to-English and Romanian-to-English language pairs, we can further
enhance the performance and obtain up to +19.31 BLEU improvement over the
strongest baseline, and +47.69 BLEU increment against the unadapted model.
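Schematically, the IBT loop alternates between the two translation directions;
in this Python sketch the translate and train_on methods are hypothetical
stand-ins for an NMT toolkit:

    # Schematic of Iterative Back Translation (names are hypothetical helpers).
    def iterative_back_translation(model_fwd, model_bwd,
                                   mono_src, mono_tgt, rounds=3):
        """model_fwd translates src->tgt, model_bwd tgt->src; mono_* are
        in-domain monolingual corpora. Each round, one model generates
        pseudo-parallel data on the fly to train the other direction."""
        for _ in range(rounds):
            pseudo_tgt = [model_fwd.translate(s) for s in mono_src]
            model_bwd.train_on(list(zip(pseudo_tgt, mono_src)))  # tgt -> src
            pseudo_src = [model_bwd.translate(t) for t in mono_tgt]
            model_fwd.train_on(list(zip(pseudo_src, mono_tgt)))  # src -> tgt
        return model_fwd, model_bwd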
Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization
Journal-ref: ECIR 2020
Subjects:
Computation and Language (cs.CL)
; Machine Learning (cs.LG)
Contextualized embeddings use unsupervised language model pretraining to
compute word representations depending on their context. This is intuitively
useful for generalization, especially in Named-Entity Recognition where it is
crucial to detect mentions never seen during training. However, standard
English benchmarks overestimate the importance of lexical over contextual
features because of an unrealistic lexical overlap between train and test
mentions. In this paper, we perform an empirical analysis of the generalization
capabilities of state-of-the-art contextualized embeddings by separating
mentions by novelty and with out-of-domain evaluation. We show that they are
particularly beneficial for unseen mentions detection, especially
out-of-domain. For models trained on CoNLL03, language model contextualization
leads to a maximal relative micro-F1 score increase of +1.2% in-domain, against
+13% out-of-domain on the WNUT dataset.
TLT-school: a Corpus of Non Native Children Speech
Roberto Gretter , Marco Matassoni , Stefano Bannò , Daniele Falavigna Subjects : Computation and Language (cs.CL)
This paper describes “TLT-school”, a corpus of speech utterances collected in
schools of northern Italy for assessing the performance of students learning
both English and German. The corpus was recorded in the years 2017 and 2018
from students aged between nine and sixteen years, attending primary, middle
and high school. All utterances have been scored, in terms of some predefined
proficiency indicators, by human experts. In addition, most of the utterances
recorded in 2017 have been carefully transcribed by hand. The guidelines and
procedures used for the manual transcription of utterances are described in
detail, as well as the results achieved by the automatic speech recognition
system we developed. Part of the corpus will be freely distributed to the
scientific community, particularly to those interested in non-native speech
recognition and automatic assessment of second language proficiency.
ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
Comments: AAAI 2020 (10 pages)
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
We present a new multimodal question answering challenge, ManyModalQA, in
which an agent must answer a question by considering three distinct modalities:
text, images, and tables. We collect our data by scraping Wikipedia and then
utilize crowdsourcing to collect question-answer pairs. Our questions are
ambiguous, in that the modality that contains the answer is not easily
determined based solely upon the question. To demonstrate this ambiguity, we
construct a modality selector (or disambiguator) network, and this model gets
substantially lower accuracy on our challenge set, compared to existing
datasets, indicating that our questions are more ambiguous. By analyzing this
model, we investigate which words in the question are indicative of the
modality. Next, we construct a simple baseline ManyModalQA model, which, based
on the prediction from the modality selector, fires a corresponding pre-trained
state-of-the-art unimodal QA model. We focus on providing the community with a
new manymodal evaluation set and only provide a fine-tuning set, with the
expectation that existing datasets and approaches will be transferred for most
of the training, to encourage low-resource generalization without large,
monolithic training sets for each new task. There is a significant gap between
our baseline models and human performance; therefore, we hope that this
challenge encourages research in end-to-end modality disambiguation and
multimodal QA models, as well as transfer learning. Code and data available at:
this https URL
ARAACOM: ARAbic Algerian Corpus for Opinion Mining
Journal-ref: ICCES ’17: Proceedings of the International Conference on
Computing for Engineering and Sciences, Jul 2017, Istanbul, Turkey. pp. 35-39
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Nowadays, there is no longer a need to distribute thousands of paper forms,
collect them, and convert them into electronic format in order to track
people's opinions on a subject. Many websites can now reach a large audience
with far less effort, and most of them invite visitors to leave feedback about
the site or about events. This produces a large amount of data that requires
powerful means to exploit. Opinion mining on the web has become an
increasingly attractive task, due to the growing need of individuals and
organisations to track the public mood on many subjects of daily life (sports,
politics, television, etc.). While a great deal of opinion mining work has
been developed for Western languages, especially English, such work for the
Arabic language is still very scarce. In this paper, we propose an approach to
opinion mining in Arabic Algerian newspapers. CCS CONCEPTS: • Information
systems~Sentiment analysis; • Computing methodologies~Natural language
processing
Normalization of Input-output Shared Embeddings in Text Generation Models
Jinyang Liu , Yujia Zhai , Zizhong Chen Subjects : Computation and Language (cs.CL) ; Machine Learning (cs.LG)
Neural Network based models have been state-of-the-art models for various
Natural Language Processing tasks; however, the input and output dimension
problem in the networks has still not been fully resolved, especially in text
generation tasks (e.g. Machine Translation, Text Summarization), in which input
and output both have huge sizes of vocabularies. Therefore, input-output
embedding weight sharing has been introduced and adopted widely, which remains
to be improved. Based on linear algebra and statistical theory, this paper
identifies a shortcoming of the existing input-output embedding weight-sharing
method and then proposes improvements, among which normalization of the
embedding weight matrices shows the best performance. These methods are nearly
free of computational cost, can be combined with other embedding techniques,
and are effective when applied to state-of-the-art neural network models. For
Transformer-big models, the normalization techniques yield up to 0.6 BLEU
improvement over the original model on the WMT’16 En-De dataset, and similar
BLEU improvements on the IWSLT’14 datasets. For DynamicConv models, a 0.5 BLEU
improvement is attained on WMT’16 En-De, and a 0.41 BLEU improvement on the
IWSLT’14 De-En translation task.
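The abstract does not spell out the normalization scheme; as one plausible instance (row-wise L2 normalization of a tied embedding matrix, which is an assumption, as is the sqrt(d) rescaling), a PyTorch sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedNormalizedEmbedding(nn.Module):
    """Input-output weight sharing where each embedding row is L2-normalized
    before use; one plausible instance of 'normalization of embedding weight
    matrices' (the paper's exact scheme is not given in the abstract)."""

    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.d_model = d_model
        self.weight = nn.Parameter(torch.randn(vocab_size, d_model) * d_model ** -0.5)

    def embed(self, token_ids):
        w = F.normalize(self.weight, dim=-1)       # row-wise L2 normalization
        return w[token_ids] * self.d_model ** 0.5  # sqrt(d) rescale (assumed)

    def logits(self, hidden):
        w = F.normalize(self.weight, dim=-1)       # shared, normalized weights
        return hidden @ w.t()
```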
Elephant in the Room: An Evaluation Framework for Assessing Adversarial Examples in NLP
Ying Xu , Xu Zhong , Antonio Jose Jimeno Yepes , Jey Han Lau Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
An adversarial example is an input transformed by small perturbations that
machine learning models consistently misclassify. While there are a number of
methods proposed to generate adversarial examples for text data, it is not
trivial to assess the quality of these adversarial examples, as minor
perturbations (such as changing a word in a sentence) can lead to a significant
shift in their meaning, readability and classification label. In this paper, we
propose an evaluation framework to assess the quality of adversarial examples
based on the aforementioned properties. We experiment with five benchmark
attacking methods and an alternative approach based on an auto-encoder, and
find that these methods generate adversarial examples with poor readability
and content preservation. We also find that multiple factors can influence
attacking performance, such as the length of the text examples and the input
domain.
Shared Task: Lexical Semantic Change Detection in German
Adnan Ahmad , Kiflom Desta , Fabian Lang , Dominik Schlechtweg Subjects : Computation and Language (cs.CL)
Recent NLP architectures have illustrated in various ways how semantic change
can be captured across time and domains. However, in terms of evaluation there
is a lack of benchmarks to compare the performance of these systems against
each other. We present the results of the first shared task on unsupervised
lexical semantic change detection (LSCD) in German based on the evaluation
framework proposed by Schlechtweg et al. (2019).
Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods
Comments: SCiL 2020
Journal-ref: Proceedings of the Society for Computation in Linguistics 3.1
(2020): 43-52
Subjects:
Computation and Language (cs.CL)
We perform statistical analysis of the phenomenon of neology, the process by
which new words emerge in a language, using large diachronic corpora of
English. We investigate the importance of two factors, semantic sparsity and
frequency growth rates of semantic neighbors, formalized in the distributional
semantics paradigm. We show that both factors are predictive of word emergence
although we find more support for the latter hypothesis. Besides presenting a
new linguistic application of distributional semantics, this study tackles the
linguistic question of the role of language-internal factors (in our case,
sparsity) in language change motivated by language-external factors (reflected
in frequency growth).
VoiceCoach: Interactive Evidence-based Training for Voice Modulation Skills in Public Speaking
Comments: Accepted by CHI ’20
Subjects:
Human-Computer Interaction (cs.HC)
; Computation and Language (cs.CL); Information Retrieval (cs.IR)
The modulation of voice properties, such as pitch, volume, and speed, is
crucial for delivering a successful public speech. However, it is challenging
to master different voice modulation skills. Though many guidelines are
available, they are often not practical enough to be applied in different
public speaking situations, especially for novice speakers. We present
VoiceCoach, an interactive evidence-based approach to facilitate the effective
training of voice modulation skills. Specifically, we have analyzed the voice
modulation skills from 2623 high-quality speeches (i.e., TED Talks) and use
them as the benchmark dataset. Given a voice input, VoiceCoach automatically
recommends good voice modulation examples from the dataset based on the
similarity of both sentence structures and voice modulation skills. Immediate
and quantitative visual feedback is provided to guide further improvement. The
expert interviews and the user study provide support for the effectiveness and
usability of VoiceCoach.
Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion
Comments: Accepted to IEEE Transactions on Emerging Topics in Computational Intelligence
Subjects:
Audio and Speech Processing (eess.AS)
; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
An effective approach for voice conversion (VC) is to disentangle linguistic
content from other components in the speech signal. The effectiveness of
variational autoencoder (VAE) based VC (VAE-VC), for instance, strongly relies
on this principle. In our prior work, we proposed a cross-domain VAE-VC
(CDVAE-VC) framework, which utilized acoustic features of different properties,
to improve the performance of VAE-VC. We believed that the success came from
more disentangled latent representations. In this paper, we extend the CDVAE-VC
framework by incorporating the concept of adversarial learning, in order to
further increase the degree of disentanglement, thereby improving the quality
and similarity of converted speech. More specifically, we first investigate the
effectiveness of incorporating the generative adversarial networks (GANs) with
CDVAE-VC. Then, we consider the concept of domain adversarial training and add
an explicit constraint to the latent representation, realized by a speaker
classifier, to explicitly eliminate the speaker information that resides in the
latent code. Experimental results confirm that the degree of disentanglement of
the learned latent representation can be enhanced by both GANs and the speaker
classifier. Meanwhile, subjective evaluation results in terms of quality and
similarity scores demonstrate the effectiveness of our proposed methods.
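The explicit speaker-classifier constraint can be realized with a gradient reversal layer, a common domain-adversarial technique; whether the paper uses exactly this mechanism, and all sizes and loss weights below, are assumptions:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer used in domain adversarial training."""
    @staticmethod
    def forward(ctx, x):
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return -grad

class SpeakerAdversary(nn.Module):
    """A speaker classifier on the VAE latent code; reversing its gradient
    pushes the encoder to remove speaker information (the single linear
    layer is an illustrative choice)."""
    def __init__(self, latent_dim, n_speakers):
        super().__init__()
        self.clf = nn.Linear(latent_dim, n_speakers)

    def forward(self, z, speaker_ids):
        logits = self.clf(GradReverse.apply(z))
        return nn.functional.cross_entropy(logits, speaker_ids)

# sketch of a total VC loss with assumed weights:
# loss = recon + beta * kl + lambda_adv * adversary(z, speaker_ids)
```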
Emergence of Pragmatics from Referential Game between Theory of Mind Agents
Luyao Yuan , Zipeng Fu , Jingyue Shen , Lu Xu , Junhong Shen , Song-Chun Zhu Subjects : Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Pragmatics studies how context can contribute to language meanings [1]. In
human communication, language is never interpreted out of context, and
sentences can usually convey more information than their literal meanings [2].
However, this mechanism is missing in most multi-agent systems [3, 4, 5, 6],
restricting the communication efficiency and the capability of human-agent
interaction. In this paper, we propose an algorithm, using which agents can
spontaneously learn the ability to “read between lines” without any explicit
hand-designed rules. We integrate the theory of mind (ToM) [7, 8] in a
cooperative multi-agent pedagogical situation and propose an adaptive
reinforcement learning (RL) algorithm to develop a communication protocol. ToM
is a profound cognitive science concept, claiming that people regularly reason
about other’s mental states, including beliefs, goals, and intentions, to
obtain performance advantage in competition, cooperation or coalition. With
this ability, agents consider language as not only messages but also rational
acts reflecting others’ hidden states. Our experiments demonstrate the
advantage of pragmatic protocols over non-pragmatic protocols. We also show
that the teaching complexity under the pragmatic protocol empirically
approximates the recursive teaching dimension (RTD).
Distributed, Parallel, and Cluster Computing
Tuneful: An Online Significance-Aware Configuration Tuner for Big Data Analytics
Ayat Fekry , Lucian Carata , Thomas Pasquier , Andrew Rice , Andy Hopper Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Systems and Control (eess.SY)
Distributed analytics engines such as Spark are a common choice for
processing extremely large datasets. However, finding good configurations for
these systems remains challenging, with each workload potentially requiring a
different setup to run optimally. Using suboptimal configurations incurs
significant extra runtime costs.
We propose Tuneful, an approach that efficiently tunes the configuration of
in-memory cluster computing systems. Tuneful combines incremental Sensitivity
Analysis and Bayesian optimization to identify near-optimal configurations from
a high-dimensional search space, using a small number of executions. This setup
allows the tuning to be done online, without any previous training. Our
experimental results show that Tuneful reduces the search time for finding
close-to-optimal configurations by 62% (at the median) when compared to
existing state-of-the-art techniques. This means that the amortization of the
tuning cost happens significantly faster, enabling practical tuning for new
classes of workloads.
A Simple and Efficient Binary Byzantine Consensus Algorithm using Cryptography and Partial Synchrony
Tyler Crain Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
This paper describes a simple and efficient binary Byzantine fault tolerant
consensus algorithm using a weak round coordinator and the partial synchrony
assumption to ensure liveness. In the algorithm, non-faulty nodes perform an
initial broadcast followed by a series of rounds, each consisting of a single
message broadcast, until termination. Each message is accompanied by a
cryptographic proof of its validity. In odd rounds the binary value 1 can be
decided; in even rounds, 0. Up to one third of the nodes can be faulty, and
termination is ensured within a number of rounds that is a constant factor of
the number of faults. Experiments show termination can be reached in less than 200
milliseconds with 300 Amazon EC2 instances spread across 5 continents even with
partial initial disagreement.
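A toy sketch of the odd/even decision rule described above; coordinator logic, cryptographic validity proofs, and message formats are omitted, and the quorum threshold is an assumption for n >= 3f + 1:

```python
def run_round(round_no, received_values, n, f):
    """Toy version of the odd/even round rule: a quorum of n - f matching
    values decides; otherwise adopt the majority value and continue."""
    target = 1 if round_no % 2 == 1 else 0   # odd rounds may decide 1, even 0
    if received_values.count(target) >= n - f:
        return ("decide", target)
    ones = sum(received_values)
    return ("continue", 1 if 2 * ones > len(received_values) else 0)
```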
Fine-grained Analysis on Fast Implementations of Multi-writer Atomic Registers
Comments: v0.1, only contains the impossibility proof
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
This draft in its current version proves an impossibility result concerning
fast implementations of multi-writer distributed atomic registers. This is the
first step of our work toward completing the exploration of fast
implementations of distributed atomic registers. The plan of our work is
outlined in Section 1. The missing sections will be provided soon.
Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided
Comments: 12 pages, 8 figures; Best Student Paper finalist (8/92) and winner of the SC’13 Best Paper Award (1/92); source code of foMPI can be downloaded from this http URL
Journal-ref: Proceedings of the International Conference on High Performance
Computing, Networking, Storage and Analysis, pages 53:1–53:12, November 2013
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Performance (cs.PF)
Modern interconnects offer remote direct memory access (RDMA) features. Yet,
most applications rely on explicit message passing for communication despite
its unwanted overheads. The MPI-3.0 standard defines a programming interface
for exploiting RDMA networks directly; however, its scalability and
practicability have to be demonstrated in practice. In this work, we develop
scalable bufferless protocols that implement the MPI-3.0 specification. Our
protocols support scaling to millions of cores with negligible memory
consumption while providing highest performance and minimal overheads. To arm
programmers, we provide a spectrum of performance models for all critical
functions and demonstrate the usability of our library and models with several
application studies with up to half a million processes. We show that our
design is comparable to, or better than UPC and Fortran Coarrays in terms of
latency, bandwidth, and message rate. We also demonstrate application
performance improvements with comparable programming complexity.
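For readers unfamiliar with the MPI-3 one-sided (RMA) model the paper builds on, a minimal mpi4py sketch; this is illustrative only (the paper's foMPI is a C library), and the buffer size and neighbor pattern are arbitrary:

```python
# Run with e.g.: mpiexec -n 4 python rma_demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 8
win = MPI.Win.Allocate(n * MPI.DOUBLE.Get_size(), comm=comm)
local = np.frombuffer(win.tomemory(), dtype=np.float64)
local[:] = rank

send = np.full(n, float(rank), dtype=np.float64)
right = (rank + 1) % size

win.Fence()            # open an RMA access epoch
win.Put(send, right)   # write directly into the right neighbor's window
win.Fence()            # close the epoch; remote data is now visible

print(rank, "received from left neighbor:", local[0])
win.Free()
```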
Properties of the Tangle for Uniform Random and Random Walk Tip Selection
Comments: Published in: 2019 IEEE International Conference on Blockchain (Blockchain)
Journal-ref: 2019 IEEE International Conference on Blockchain (Blockchain)
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
The growing number of applications for distributed ledger technologies is
driving both industry and academia to solve the limitations of blockchain,
particularly its scalability issues. Recent distributed ledger technologies
have replaced the blockchain linear structure with a more flexible directed
acyclic graph in an attempt to accommodate a higher throughput. Despite the
fast-growing diffusion of directed acyclic graph based distributed ledger
technologies, researchers lack a basic understanding of their behavior. In this
paper we analyze the Tangle, a directed acyclic graph that is used (with
certain modifications) in various protocols such as IOTA, Byteball, Avalanche
or SPECTRE. Our contribution is threefold. First, we run simulations in a
continuous-time model to examine tip count stability and cumulative weight
evolution while varying the rate of incoming transactions. In particular we
confirm analytical predictions on the number of tips with uniform random tip
selection strategy. Second, we show how different tip selection algorithms
affect the growth of the Tangle. Moreover, we explain these differences by
analyzing the spread of exit probabilities of random walks. Our findings
confirm analytically derived predictions and provide novel insights on the
different phases of growth of cumulative weight as well as on the average time
difference for a transaction to receive its first approval when using distinct
tip selection algorithms. Lastly, we analyze simulation overhead and
performance as a function of Tangle size and compare results for different tip
selection algorithms.
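A toy simulation of the uniform random tip selection strategy the paper analyzes; the arrival model and parameters are illustrative, and steady-state theory predicts roughly 2*lam*h tips:

```python
import random

def simulate_tangle(n_tx=2000, lam=50.0, h=1.0):
    """Toy Tangle with uniform random tip selection: transactions arrive as
    a Poisson process of rate lam, become visible h time units after
    issuance, and each approves up to two uniformly random visible tips."""
    t = 0.0
    issue_time = {0: 0.0}
    approved = set()
    nodes = [0]
    tip_history = []
    for tx in range(1, n_tx):
        t += random.expovariate(lam)   # next arrival time
        visible_tips = [v for v in nodes
                        if issue_time[v] + h <= t and v not in approved] or [0]
        for v in random.sample(visible_tips, min(2, len(visible_tips))):
            approved.add(v)
        issue_time[tx] = t
        nodes.append(tx)
        tip_history.append(sum(1 for v in nodes if v not in approved))
    return tip_history
```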
Towards Digital Twins for the Description of Automotive Software Systems
Comments: In Proceedings QAPL 2019, arXiv:2001.06163
Journal-ref: EPTCS 312, 2020, pp. 20-28
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Software Engineering (cs.SE)
We present models for automotive software that capture quantitative and
qualitative aspects of software systems and the underlying hardware
architecture. In particular, we consider different levels of computing power.
These range from controllers up to the cloud. We present a modeling approach
for software deployment taking different automotive requirements such as
criticality, latency, memory, computational resources, and communication into
account. Our models capture automotive software and hardware system
configurations and can serve as digital twins that are digital counterparts of
(usually) physical entities. Furthermore, we highlight connected research areas
and challenges.
Asynchronous Consensus Algorithm
Maxim Zakharov Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
This document describes a new consensus algorithm which is asynchronous and
uses gossip based message dissemination between nodes. The current version of
the algorithm does not cover the case of a node failure or significantly
delayed response; this is the subject of further research. An
outline of a new design for trust-less payment system is given in appendices.
Anchoring the value of Cryptocurrency
Journal-ref: 3rd International Workshop on Blockchain Oriented Software
Engineering. Western University. London, Canada, February 18, 2020
Subjects:
Cryptography and Security (cs.CR)
; Distributed, Parallel, and Cluster Computing (cs.DC)
A decade of thriving cryptocurrency has shown its potential as a source of
alternative finance, as well as the security and robustness of the
underpinning blockchain technology.
However, most cryptocurrencies fail to show inimitability and their meanings
in the real world. As a result, they usually start off as favourites but
quickly become the outcasts of the digital asset market.
The blockchain community attempts to anchor the value of cryptocurrency to
real values by employing smart contracts and linking it with computation
resources and digital productivity that have value and demand in the real
world. But these attempts have some undesirable effects due to a limited
number of practical applications. This limitation is caused by the dilemma
between high performance and decentralisation (universal joinability). The
emergence of blockchain sharding models, however, has offered a possible
solution to address
this dilemma.
In this paper, we explore a financial model for blockchain sharding that will
build an active link between the value of cryptocurrency and computation
resources as well as the market and labour behaviours. Our model can adjust the
price of resources and the compensation for maintaining a system based on those
behaviours. We anchor the value of cryptocurrency by the amount of computation
resources participated in and give the cryptocurrency a meaning as the exchange
between computation resources globally. Finally, we present a working example
which, through financial regularities, regulates the behaviour of anonymous
participants and dynamically incentivises or discourages participation.
Simple and Fast Distributed Computation of Betweenness Centrality
Pierluigi Crescenzi , Pierre Fraigniaud , Ami Paz Subjects : Social and Information Networks (cs.SI) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Betweenness centrality is a graph parameter that has been successfully
applied to network analysis. In the context of computer networks, it was
considered for various objectives, ranging from routing to service placement.
However, as observed by Maccari et al. [INFOCOM 2018], research on betweenness
centrality for improving protocols was hampered by the lack of a usable, fully
distributed algorithm for computing this parameter. We resolve this issue by
designing an efficient algorithm for computing betweenness centrality, which
can be implemented by minimal modifications to any distance-vector routing
protocol based on Bellman-Ford. The convergence time of our implementation is
shown to be proportional to the diameter of the network.
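As a centralized illustration of the extra quantity such a protocol must carry, here is Bellman-Ford toward one destination extended with shortest-path counts, an ingredient of betweenness accumulation; this is not the authors' distributed algorithm:

```python
def bellman_ford_with_path_counts(adj, dest):
    """Bellman-Ford distances toward `dest` plus shortest-path counts.
    adj: dict node -> list of (neighbor, weight) pairs, positive weights."""
    INF = float("inf")
    dist = {v: INF for v in adj}
    dist[dest] = 0
    for _ in range(len(adj) - 1):          # rounds of neighbor exchanges
        new_dist = dict(dist)
        for u in adj:
            for v, w in adj[u]:
                new_dist[u] = min(new_dist[u], dist[v] + w)
        dist = new_dist
    npaths = {v: 0 for v in adj}
    npaths[dest] = 1
    for u in sorted(adj, key=lambda v: dist[v]):   # increasing distance
        if u == dest or dist[u] == INF:
            continue
        # shortest paths through any next hop on a shortest route to dest
        npaths[u] = sum(npaths[v] for v, w in adj[u] if dist[u] == dist[v] + w)
    return dist, npaths
```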
Accelerating supply chains with Ant Colony Optimization across range of hardware solutions
Ivars Dzalbs , Tatiana Kalganova Subjects : Artificial Intelligence (cs.AI) ; Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)
The Ant Colony algorithm has been applied to various optimization problems;
however, most of the previous work on scaling and parallelism focuses on
Travelling Salesman Problems (TSPs). Although useful for benchmarks and
comparing new ideas, the algorithmic dynamics do not always transfer to
complex real-life problems, where additional meta-data is required during
solution construction. This paper looks at a real-life outbound supply chain
problem using
Ant Colony Optimization (ACO) and its scaling dynamics with two parallel ACO
architectures – Independent Ant Colonies (IAC) and Parallel Ants (PA). Results
showed that PA was able to reach a higher solution quality in fewer iterations
as the number of parallel instances increased. Furthermore, speed performance
was measured across three different hardware solutions – 16 core CPU, 68 core
Xeon Phi and up to 4 GeForce GPUs. State-of-the-art ACO vectorization
techniques such as SS-Roulette were implemented using C++ and CUDA. Although
excellent for TSP, it was concluded that for the given supply chain problem
GPUs are not suitable due to the required meta-data access footprint. Furthermore,
compared to their sequential counterpart, vectorized CPU AVX2 implementation
achieved 25.4x speedup on CPU while Xeon Phi with its AVX512 instruction set
reached 148x on PA with Vectorized (PAwV). PAwV is therefore able to scale at
least up to 1024 parallel instances on the supply chain network problem solved.
An authentication protocol based on chaos and zero knowledge proof
Journal-ref: Major, W., Buchanan, W.J. & Ahmad, J. Nonlinear Dyn (2020).
https://doi.org/10.1007/s11071-020-05463-3
Subjects:
Cryptography and Security (cs.CR)
; Distributed, Parallel, and Cluster Computing (cs.DC)
Port Knocking is a method for authenticating clients through a closed stance
firewall, and authorising their requested actions, enabling servers to offer
services to authenticated clients, without opening ports on the firewall.
Advances in port knocking have resulted in an increase in complexity in design,
preventing port knocking solutions from realising their potential. This paper
proposes a novel port knocking solution, named Crucible, which is a secure
method of authentication, with high usability and features of stealth, allowing
servers and services to remain hidden and protected. Crucible is a stateless
solution, requiring the client to memorise only a command, the server’s IP and a
chosen password. The solution is forwarded as a method for protecting servers
against attacks ranging from port scans, to zero-day exploitation. To act as a
random oracle for both client and server, cryptographic hashes were generated
through chaotic systems.
Adaptive Large Neighborhood Search for Circle Bin Packing Problem
Comments: 13 pages, 6 figures, 6 tables
Subjects:
Artificial Intelligence (cs.AI)
; Distributed, Parallel, and Cluster Computing (cs.DC)
We address a new variant of the packing problem called the circle bin packing
problem (CBPP), which is to find a dense packing of circle items to multiple
square bins so as to minimize the number of used bins. To this end, we propose
an adaptive large neighborhood search (ALNS) algorithm, which uses our Greedy
Algorithm with Corner Occupying Action (GACOA) to construct an initial layout.
The greedy solution is usually in a local optimum trap, and ALNS enables
multiple neighborhood search that depends on the stochastic annealing schedule
to avoid getting stuck in local minimum traps. Specifically, ALNS perturbs the
current layout to jump out of a local optimum by iteratively reassigning some
circles, and accepts the new layout with some probability during the search. The
acceptance probability is adjusted adaptively using simulated annealing that
fine-tunes the search direction in order to reach the global optimum. We
benchmark computational results against GACOA in heterogeneous instances. ALNS
always outperforms GACOA in improving the objective function, and in several
cases, there is a significant reduction in the number of bins used in the
packing.
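A sketch of a simulated-annealing acceptance rule of the kind ALNS uses; the exponential form and geometric cooling schedule here are assumptions, not necessarily the paper's exact choices:

```python
import math
import random

def accept(delta, temperature):
    """Always accept improving layouts; accept worsening ones with
    probability exp(-delta / T)."""
    return delta <= 0 or random.random() < math.exp(-delta / temperature)

# geometric cooling sketch: T <- alpha * T after each iteration
T, alpha = 100.0, 0.95
for _ in range(1000):
    delta = random.uniform(-1.0, 1.0)  # objective change of a perturbed layout
    if accept(delta, T):
        pass                           # keep the perturbed layout
    T *= alpha
```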
Learning
GraphGen: A Scalable Approach to Domain-agnostic Labeled Graph Generation
Comments: The Web Conference (WWW) 2020
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Graph generative models have been extensively studied in the data mining
literature. While traditional techniques are based on generating structures
that adhere to a pre-decided distribution, recent techniques have shifted
towards learning this distribution directly from the data. While learning-based
approaches have imparted significant improvement in quality, some limitations
remain to be addressed. First, learning graph distributions introduces
additional computational overhead, which limits their scalability to large
graph databases. Second, many techniques only learn the structure and do not
address the need to also learn node and edge labels, which encode important
semantic information and influence the structure itself. Third, existing
techniques often incorporate domain-specific rules and lack generalizability.
Fourth, the experimentation of existing techniques is not comprehensive enough
due to either using weak evaluation metrics or focusing primarily on synthetic
or small datasets. In this work, we develop a domain-agnostic technique called
GraphGen to overcome all of these limitations. GraphGen converts graphs to
sequences using minimum DFS codes. Minimum DFS codes are canonical labels and
capture the graph structure precisely along with the label information. The
complex joint distributions between structure and semantic labels are learned
through a novel LSTM architecture. Extensive experiments on million-sized, real
graph datasets show GraphGen to be 4 times faster on average than
state-of-the-art techniques while being significantly better in quality across
a comprehensive set of 11 different metrics. Our code is released at
Pruning CNN's with linear filter ensembles
Comments: accepted to ECAI2020
Subjects:
Machine Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Despite the promising results of convolutional neural networks (CNNs),
applying them on resource limited devices is still a challenge, mainly due to
the huge memory and computation requirements. To tackle these problems, pruning
can be applied to reduce the network size and number of floating point
operations (FLOPs). Contrary to the filter norm method, which is used in
network pruning under the assumption that the smaller this norm, the less
important the associated component is, we develop a novel filter importance
norm that incorporates the loss caused by the elimination of a component from
the CNN.
To estimate the importance of a set of architectural components, we measure
the CNN performance as different components are removed. The result is a
collection of filter ensembles — filter masks — and associated performance
values. We rank the filters based on a linear and additive model and remove the
least important ones such that the drop in network accuracy is minimal. We
evaluate our method on a fully connected network, as well as on the ResNet
architecture trained on the CIFAR-10 data-set. Using our pruning method, we
managed to remove 60% of the parameters and 64% of the FLOPs from the ResNet
with an accuracy drop of less than 0.6%.
Q-Learning in enormous action spaces via amortized approximate maximization
Comments: A previous version of this work appeared at the Deep Reinforcement Learning Workshop, NeurIPS 2018
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Applying Q-learning to high-dimensional or continuous action spaces can be
difficult due to the required maximization over the set of possible actions.
Motivated by techniques from amortized inference, we replace the expensive
maximization over all actions with a maximization over a small subset of
possible actions sampled from a learned proposal distribution. The resulting
approach, which we dub Amortized Q-learning (AQL), is able to handle discrete,
continuous, or hybrid action spaces while maintaining the benefits of
Q-learning. Our experiments on continuous control tasks with up to 21
dimensional actions show that AQL outperforms D3PG (Barth-Maron et al, 2018)
and QT-Opt (Kalashnikov et al, 2018). Experiments on structured discrete action
spaces demonstrate that AQL can efficiently learn good policies in spaces with
thousands of discrete actions.
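A sketch of the amortized maximization step; the q_net/proposal interfaces are hypothetical placeholders, not the authors' code:

```python
import torch

def amortized_max_q(q_net, proposal, state, n_samples=64):
    """Replace the max over all actions with a max over a small set of
    actions sampled from a learned proposal: the core idea of AQL."""
    actions = proposal.sample(state, n_samples)        # [n_samples, act_dim]
    states = state.unsqueeze(0).expand(n_samples, -1)  # tile the state
    q_values = q_net(states, actions).squeeze(-1)      # [n_samples]
    best = torch.argmax(q_values)
    return actions[best], q_values[best]
```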
Secure and Robust Machine Learning for Healthcare: A Survey
Adnan Qayyum , Junaid Qadir , Muhammad Bilal , Ala Al-Fuqaha Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Recent years have witnessed widespread adoption of machine learning (ML)/deep
learning (DL) techniques due to their superior performance for a variety of
healthcare applications ranging from the prediction of cardiac arrest from
one-dimensional heart signals to computer-aided diagnosis (CADx) using
multi-dimensional medical images. Notwithstanding the impressive performance of
ML/DL, there are still lingering doubts regarding the robustness of ML/DL in
healthcare settings (which is traditionally considered quite challenging due to
the myriad security and privacy issues involved), especially in light of recent
results that have shown that ML/DL are vulnerable to adversarial attacks. In
this paper, we present an overview of various application areas in healthcare
that leverage such techniques from a security and privacy point of view and
present associated challenges. In addition, we present potential methods to
ensure secure and privacy-preserving ML for healthcare applications. Finally,
we provide insight into the current research challenges and promising
directions for future research.
Local Policy Optimization for Trajectory-Centric Reinforcement Learning
Journal-ref: ICRA 2020
Subjects:
Machine Learning (cs.LG)
; Robotics (cs.RO); Systems and Control (eess.SY); Machine Learning (stat.ML)
The goal of this paper is to present a method for simultaneous trajectory and
local stabilizing policy optimization to generate local policies for
trajectory-centric model-based reinforcement learning (MBRL). This is motivated
by the fact that global policy optimization for non-linear systems could be a
very challenging problem both algorithmically and numerically. However, a lot
of robotic manipulation tasks are trajectory-centric, and thus do not require a
global model or policy. Due to inaccuracies in the learned model estimates, an
open-loop trajectory optimization process mostly results in very poor
performance when used on the real system. Motivated by these problems, we try
to formulate the problem of trajectory optimization and local policy synthesis
as a single optimization problem. It is then solved simultaneously as an
instance of nonlinear programming. We provide some results for analysis as well
as achieved performance of the proposed technique under some simplifying
assumptions.
Optimal binning: mathematical programming formulation
Guillermo Navas-Palencia Subjects : Machine Learning (cs.LG) ; Optimization and Control (math.OC); Machine Learning (stat.ML)
The optimal binning is the optimal discretization of a variable into bins
given a discrete or continuous numeric target. We present a rigorous and
extensible mathematical programming formulation for solving the optimal binning
problem for a binary, continuous and multi-class target type, incorporating
constraints not previously addressed. For all three target types, we introduce
a convex mixed-integer programming formulation. Several algorithmic
enhancements such as automatic determination of the most suitable monotonic
trend via a Machine-Learning-based classifier and implementation aspects are
thoughtfully discussed. The new mathematical programming formulations are
carefully implemented in the open-source python library OptBinning.
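A short usage sketch of the OptBinning library named above; the data and parameter choices are illustrative:

```python
import numpy as np
from optbinning import OptimalBinning

rng = np.random.default_rng(0)
x = rng.normal(40, 10, 5000)                         # a numeric feature
y = (x + rng.normal(0, 10, 5000) > 45).astype(int)   # binary target

optb = OptimalBinning(name="x", dtype="numerical", solver="cp",
                      monotonic_trend="auto")  # ML-based trend detection
optb.fit(x, y)
print(optb.splits)                    # optimal bin edges
print(optb.binning_table.build())     # per-bin event rate / WoE table
```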
Safety Concerns and Mitigation Approaches Regarding the Use of Deep Learning in Safety-Critical Perception Tasks
Oliver Willers , Sebastian Sudholt , Shervin Raafatnia , Stephanie Abrecht Subjects : Machine Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Deep learning methods are widely regarded as indispensable when it comes to
designing perception pipelines for autonomous agents such as robots, drones or
automated vehicles. The main reason, however, that deep learning is not yet
used at large scale for autonomous agents is safety concerns. Deep learning
approaches typically exhibit black-box behavior, which makes them hard to
evaluate with respect to safety-critical aspects. While there has been some
work on safety in deep learning, most papers typically focus on high-level
safety concerns. In this work, we seek to dive into the safety concerns of deep
learning methods and present a concise enumeration on a deeply technical level.
Additionally, we present extensive discussions on possible mitigation methods
and give an outlook regarding what mitigation methods are still missing in
order to facilitate an argumentation for the safety of a deep learning method.
On Solving Cooperative MARL Problems with a Few Good Experiences
Rajiv Ranjan Kumar , Pradeep Varakantham Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cooperative Multi-agent Reinforcement Learning (MARL) is crucial for
cooperative decentralized decision learning in many domains such as search and
rescue, drone surveillance, package delivery and fire fighting problems. In
these domains, a key challenge is learning with a few good experiences, i.e.,
positive reinforcements are obtained only in a few situations (e.g., on
extinguishing a fire or tracking a crime or delivering a package) and in most
other situations there is zero or negative reinforcement. Learning decisions
with a few good experiences is extremely challenging in cooperative MARL
problems due to three reasons. First, compared to the single agent case,
exploration is harder as multiple agents have to be coordinated to receive a
good experience. Second, the environment is not stationary, as all the agents
are learning at the same time (and hence changing policies). Third, the scale
of the problem increases significantly with every additional agent.
Relevant existing work is extensive and has focussed on dealing with a few
good experiences in single-agent RL problems or on scalable approaches for
handling non-stationarity in MARL problems. Unfortunately, neither of these
approaches (or their extensions) are able to address the problem of sparse good
experiences effectively. Therefore, we provide a novel fictitious self
imitation approach that is able to simultaneously handle non-stationarity and
sparse good experiences in a scalable manner. Finally, we provide a thorough
comparison (experimental or descriptive) against relevant cooperative MARL
algorithms to demonstrate the utility of our approach.
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking
Grigori Fursin , Herve Guillou , Nicolas Essayan Subjects : Machine Learning (cs.LG) ; Software Engineering (cs.SE); Machine Learning (stat.ML)
We present CodeReef – an open platform to share all the components necessary
to enable cross-platform MLOps (MLSysOps), i.e. automating the deployment of ML
models across diverse systems in the most efficient way. We also introduce the
CodeReef solution – a way to package and share models as non-virtualized,
portable, customizable and reproducible archive files. Such ML packages include
JSON meta description of models with all dependencies, Python APIs, CLI actions
and portable workflows necessary to automatically build, benchmark, test and
customize models across diverse platforms, AI frameworks, libraries, compilers
and datasets. We demonstrate several CodeReef solutions to automatically build,
run and measure object detection based on SSD-Mobilenets, TensorFlow and COCO
dataset from the latest MLPerf inference benchmark across a wide range of
platforms from Raspberry Pi, Android phones and IoT devices to data centers.
Our long-term goal is to help researchers share their new techniques as
production-ready packages along with research papers to participate in
collaborative and reproducible benchmarking, compare the different
ML/software/hardware stacks and select the most efficient ones on a Pareto
frontier using online CodeReef dashboards.
Comments: 7 pages, 6 figures
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Existing graph neural networks may suffer from the “suspended animation
problem” when the model architecture goes deep. Meanwhile, for some graph
learning scenarios, e.g., nodes with text/image attributes or graphs with
long-distance node correlations, deep graph neural networks will be necessary
for effective graph representation learning. In this paper, we propose a new
graph neural network, namely DIFNET (Graph Diffusive Neural Network), for graph
representation learning and node classification. DIFNET utilizes both neural
gates and graph residual learning for node hidden state modeling, and includes
an attention mechanism for node neighborhood information diffusion. Extensive
experiments are conducted in this paper to compare DIFNET against several
state-of-the-art graph neural network models. The experimental results
illustrate both the learning performance advantages and effectiveness of
DIFNET, especially in addressing the “suspended animation problem”.
Journal-ref: ECMLPKDD 2019 : European Conference on Machine learning and
knowledge discovery in databases, Sep 2019, Würzburg, Germany
Subjects:
Machine Learning (cs.LG)
Conditional Generative Models are now acknowledged as an essential tool in
Machine Learning. This paper focuses on their control. While many approaches
aim at disentangling the data through the coordinate-wise control of their
latent representations, another direction is explored in this paper. The
proposed CompVAE handles data with a natural multi-ensemblist structure (i.e.
that can naturally be decomposed into elements). Derived from Bayesian
variational principles, CompVAE learns a latent representation leveraging both
observational and symbolic information. A first contribution of the approach is
that this latent representation supports a compositional generative model,
amenable to multi-ensemblist operations (addition or subtraction of elements in
the composition). This compositional ability is enabled by the invariance and
generality of the whole framework w.r.t. respectively, the order and number of
the elements. The second contribution of the paper is a proof of concept on
synthetic 1D and 2D problems, demonstrating the efficiency of the proposed
approach.
Incentivising Exploration and Recommendations for Contextual Bandits with Payments
Comments: 11 pages, 4 figures
Subjects:
Machine Learning (cs.LG)
; Information Retrieval (cs.IR); Machine Learning (stat.ML)
We propose a contextual bandit based model to capture the learning and social
welfare goals of a web platform in the presence of myopic users. By using
payments to incentivize these agents to explore different
items/recommendations, we show how the platform can learn the inherent
attributes of items and achieve a sublinear regret while maximizing cumulative
social welfare. We also calculate theoretical bounds on the cumulative costs of
incentivization to the platform. Unlike previous works in this domain, we
consider contexts to be completely adversarial, and the behavior of the
adversary is unknown to the platform. Our approach can improve various
engagement metrics of users on e-commerce stores, recommendation engines and
matching platforms.
Convergence Time Optimization for Federated Learning over Wireless Networks
Mingzhe Chen , H. Vincent Poor , Walid Saad , Shuguang Cui Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI); Machine Learning (stat.ML)
In this paper, the convergence time of federated learning (FL), when deployed
over a realistic wireless network, is studied. In particular, a wireless
network is considered in which wireless users transmit their local FL models
(trained using their locally collected data) to a base station (BS). The BS,
acting as a central controller, generates a global FL model using the received
local FL models and broadcasts it back to all users. Due to the limited number
of resource blocks (RBs) in a wireless network, only a subset of users can be
selected to transmit their local FL model parameters to the BS at each learning
step. Moreover, since each user has unique training data samples, the BS
prefers to include all local user FL models to generate a converged global FL
model. Hence, the FL performance and convergence time will be significantly
affected by the user selection scheme. Therefore, it is necessary to design an
appropriate user selection scheme that enables users of higher importance to be
selected more frequently. This joint learning, wireless resource allocation,
and user selection problem is formulated as an optimization problem whose goal
is to minimize the FL convergence time while optimizing the FL performance. To
solve this problem, a probabilistic user selection scheme is proposed such that
the BS is connected to the users whose local FL models have significant effects
on its global FL model with high probabilities. Given the user selection
policy, the uplink RB allocation can be determined. To further reduce the FL
convergence time, artificial neural networks (ANNs) are used to estimate the
local FL models of the users that are not allocated any RBs for local FL model
transmission at each given learning step, which enables the BS to enhance its
global FL model and improve the FL convergence speed and performance.
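A sketch of a probabilistic user selection step of the kind described; the softmax form and the use of a gradient-norm "effect" score are assumptions, not the paper's exact distribution:

```python
import numpy as np

def select_users(model_effects, n_rb, temperature=1.0):
    """Pick n_rb users (limited by resource blocks), with users whose local
    updates affect the global model more selected with higher probability."""
    scores = np.asarray(model_effects, dtype=float) / temperature
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return np.random.choice(len(probs), size=n_rb, replace=False, p=probs)

# example: 10 users, effects measured as update norms, 3 resource blocks
chosen = select_users(model_effects=np.random.rand(10), n_rb=3)
```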
Coarse-Grain Cluster Analysis of Tensors With Application to Climate Biome Identification
Derek DeSantis , Phillip J. Wolfram , Katrina Bennett , Boian Alexandrov Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)
A tensor provides a concise way to codify the interdependence of complex
data. Treating a tensor as a d-way array, each entry records the interaction
between the different indices. Clustering provides a way to parse the
complexity of the data into more readily understandable information. Clustering
methods are heavily dependent on the algorithm of choice, as well as the chosen
hyperparameters of the algorithm. However, their sensitivity to data scales is
largely unknown.
In this work, we apply the discrete wavelet transform to analyze the effects
of coarse-graining on clustering tensor data. We are particularly interested in
understanding how scale affects the clustering of the Earth’s climate system. The
discrete wavelet transform allows classification of the Earth’s climate across
a multitude of spatial-temporal scales. The discrete wavelet transform is used
to produce an ensemble of classification estimates, as opposed to a single
classification. Using information theory, we discover a sub-collection of the
ensemble that span the majority of the variance observed, allowing for
efficient consensus clustering techniques that can be used to identify climate
biomes.
Loss-annealed GAIL for sample efficient and stable Imitation Learning
Rohit Jena , Katia Sycara Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Imitation learning is the problem of learning a policy from an expert policy
without access to a reward signal. Often, the expert policy is only available
in the form of expert demonstrations. Behavior cloning and GAIL are two
popularly used methods for performing imitation learning in this setting.
Behavior cloning converges in a few training iterations, but doesn’t reach peak
performance and suffers from compounding errors due to its supervised training
framework and iid assumption. GAIL attempts to tackle this problem by
accounting for the temporal dependencies between states while matching
occupancy measures of the expert and the policy. Although GAIL has shown
successes in a number of environments, it takes a lot of environment
interactions. Given their complementary benefits, existing works have
suggested or attempted combining the two methods, without much success. We
look at some of the limitations of existing ideas that try to combine BC and
GAIL, and present an algorithm that combines the best of both worlds to enable
faster and stable training while not compromising on performance. Our algorithm
is embarrassingly simple to implement and seamlessly integrates with different
policy gradient algorithms. We demonstrate the effectiveness of the algorithm
both in low dimensional control tasks in a limited data setting, and in high
dimensional grid world environments.
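A minimal sketch of a loss-annealing schedule for combining BC and GAIL; the linear schedule is an assumption, and the paper's exact algorithm may differ:

```python
def annealed_imitation_loss(bc_loss, gail_loss, step, anneal_steps=10000):
    """Start dominated by the sample-efficient behavior cloning term and
    anneal toward the GAIL term over training."""
    alpha = max(0.0, 1.0 - step / anneal_steps)   # 1 -> 0 over training
    return alpha * bc_loss + (1.0 - alpha) * gail_loss
```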
Massif: Interactive Interpretation of Adversarial Attacks on Deep Learning
Nilaksh Das , Haekyu Park , Zijie J. Wang , Fred Hohman , Robert Firstman , Emily Rogers , Duen Horng (Polo) Chau
Comments: 7 pages
Subjects:
Machine Learning (cs.LG)
; Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Deep neural networks (DNNs) are increasingly powering high-stakes
applications such as autonomous cars and healthcare; however, DNNs are often
treated as “black boxes” in such applications. Recent research has also
revealed that DNNs are highly vulnerable to adversarial attacks, raising
serious concerns over deploying DNNs in the real world. To overcome these
deficiencies, we are developing Massif, an interactive tool for deciphering
adversarial attacks. Massif identifies and interactively visualizes neurons and
their connections inside a DNN that are strongly activated or suppressed by an
adversarial attack. Massif provides both a high-level, interpretable overview
of the effect of an attack on a DNN, and a low-level, detailed description of
the affected neurons. These tightly coupled views in Massif help people better
understand which input features are most vulnerable or important for correct
predictions.
Improving Label Ranking Ensembles using Boosting Techniques
Lihi Dery , Erez Shmueli Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Label ranking is a prediction task which deals with learning a mapping
between an instance and a ranking (i.e., order) of labels from a finite set,
representing their relevance to the instance. Boosting is a well-known and
reliable ensemble technique that was shown to often outperform other learning
algorithms. While boosting algorithms were developed for a multitude of machine
learning tasks, label ranking tasks were overlooked. In this paper, we propose
a boosting algorithm which was specifically designed for label ranking tasks.
Extensive evaluation of the proposed algorithm on 24 semi-synthetic and
real-world label ranking datasets shows that it significantly outperforms
existing state-of-the-art label ranking algorithms.
Automatic phantom test pattern classification through transfer learning with deep neural networks
Rafael B. Fricks , Justin Solomon , Ehsan Samei Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Medical Physics (physics.med-ph)
Imaging phantoms are test patterns used to measure image quality in computed
tomography (CT) systems. A new phantom platform (Mercury Phantom, Gammex)
provides test patterns for estimating the task transfer function (TTF) or noise
power spectrum (NPS) and simulates different patient sizes. Determining which
image slices are suitable for analysis currently requires manual annotation of
these patterns by an expert, as subtle defects may make an image unsuitable for
measurement. We propose a method of automatically classifying these test
patterns in a series of phantom images using deep learning techniques. By
adapting a convolutional neural network based on the VGG19 architecture with
weights trained on ImageNet, we use transfer learning to produce a classifier
for this domain. The classifier is trained and evaluated with over 3,500
phantom images acquired at a university medical center. Input channels for
color images are successfully adapted to convey contextual information for
phantom images. A series of ablation studies are employed to verify design
aspects of the classifier and evaluate its performance under varying training
conditions. Our solution makes extensive use of image augmentation to produce a
classifier that accurately classifies typical phantom images with 98% accuracy,
while maintaining as much as 86% accuracy when the phantom is improperly
imaged.
Discovering Salient Anatomical Landmarks by Predicting Human Gaze
Comments: Accepted at IEEE International Symposium on Biomedical Imaging 2020 (ISBI 2020)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Anatomical landmarks are a crucial prerequisite for many medical imaging
tasks. Usually, the set of landmarks for a given task is predefined by experts.
The landmark locations for a given image are then annotated manually or via
machine learning methods trained on manual annotations. In this paper, in
contrast, we present a method to automatically discover and localize anatomical
landmarks in medical images. Specifically, we consider landmarks that attract
the visual attention of humans, which we term visually salient landmarks. We
illustrate the method for fetal neurosonographic images. First, full-length
clinical fetal ultrasound scans are recorded with live sonographer
gaze-tracking. Next, a convolutional neural network (CNN) is trained to predict
the gaze point distribution (saliency map) of the sonographers on scan video
frames. The CNN is then used to predict saliency maps of unseen fetal
neurosonographic images, and the landmarks are extracted as the local maxima of
these saliency maps. Finally, the landmarks are matched across images by
clustering the landmark CNN features. We show that the discovered landmarks can
be used within affine image registration, with average landmark alignment
errors between 4.1% and 10.9% of the fetal head long axis length.
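A sketch of the landmark-extraction step, taking local maxima of a predicted saliency map; the window size and threshold are illustrative choices:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def extract_landmarks(saliency, window=15, threshold=0.5):
    """Return (row, col) coordinates of local maxima of a 2D saliency map
    that exceed a fraction of the global maximum."""
    local_max = maximum_filter(saliency, size=window) == saliency
    peaks = local_max & (saliency > threshold * saliency.max())
    return np.argwhere(peaks)

landmarks = extract_landmarks(np.random.rand(256, 256))
```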
A utility-based analysis of equilibria in multi-objective normal form games
Comments: Under review since 16 January 2020
Subjects:
Computer Science and Game Theory (cs.GT)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
In multi-objective multi-agent systems (MOMAS), agents explicitly consider
the possible tradeoffs between conflicting objective functions. We argue that
compromises between competing objectives in MOMAS should be analysed on the
basis of the utility that these compromises have for the users of a system,
where an agent’s utility function maps their payoff vectors to scalar utility
values. This utility-based approach naturally leads to two different
optimisation criteria for agents in a MOMAS: expected scalarised returns (ESR)
and scalarised expected returns (SER). In this article, we explore the
differences between these two criteria using the framework of multi-objective
normal form games (MONFGs). We demonstrate that the choice of optimisation
criterion (ESR or SER) can radically alter the set of equilibria in a MONFG
when non-linear utility functions are used.
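A tiny numeric example of how the two criteria can disagree under a nonlinear utility; the payoffs and the multiplicative utility are illustrative:

```python
import numpy as np

payoffs = np.array([[4.0, 0.0], [0.0, 4.0]])  # two equally likely outcomes
u = lambda v: v[0] * v[1]                     # nonlinear utility function

esr = np.mean([u(p) for p in payoffs])        # ESR: E[u(payoff)]  -> 0.0
ser = u(payoffs.mean(axis=0))                 # SER: u(E[payoff])  -> 4.0
print(esr, ser)                               # the two criteria disagree
```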
AppStreamer: Reducing Storage Requirements of Mobile Games through Predictive Streaming
Comments: 12 pages; EWSN 2020
Subjects:
Operating Systems (cs.OS)
; Machine Learning (cs.LG); Machine Learning (stat.ML)
Storage has become a constrained resource on smartphones. Gaming is a popular
activity on mobile devices and the explosive growth in the number of games
coupled with their growing size contributes to the storage crunch. Even where
storage is plentiful, it takes a long time to download and install a heavy app
before it can be launched. This paper presents AppStreamer, a novel technique
for reducing the storage requirements or startup delay of mobile games, and
heavy mobile apps in general. AppStreamer is based on the intuition that most
apps do not need the entirety of their files (images, audio and video clips,
etc.) at any one time. AppStreamer can, therefore, keep only a small part of
the files on the device, akin to a “cache”, and download the remainder from a
cloud storage server or a nearby edge server when it predicts that the app will
need them in the near future. AppStreamer continuously predicts file blocks for
the near future as the user uses the app, and fetches them from the storage
server before the user sees a stall due to missing resources. We implement
AppStreamer at the Android file system layer. This ensures that the apps
require no source code or modification, and the approach generalizes across
apps. We evaluate AppStreamer using two popular games: Dead Effect 2, a 3D
first-person shooter, and Fire Emblem Heroes, a 2D turn-based strategy
role-playing game. Through a user study, 75% and 87% of the users respectively
find that AppStreamer provides the same quality of user experience as the
baseline where all files are stored on the device. AppStreamer cuts down the
storage requirement by 87% for Dead Effect 2 and 86% for Fire Emblem Heroes.
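The block-prediction idea can be sketched with a first-order Markov model over block accesses (a hedged sketch; AppStreamer's actual predictor and interfaces are more sophisticated, and `fetch` and the cache here are placeholders):

from collections import Counter, defaultdict

class BlockPredictor:
    def __init__(self):
        self.transitions = defaultdict(Counter)   # block id -> counts of next blocks

    def observe(self, prev_block, next_block):
        self.transitions[prev_block][next_block] += 1

    def predict(self, current_block, k=4):
        # The k most likely blocks to be needed after `current_block`.
        return [b for b, _ in self.transitions[current_block].most_common(k)]

def on_block_access(predictor, cache, fetch, block):
    # Prefetch predicted blocks from cloud/edge storage before a stall occurs.
    for nxt in predictor.predict(block):
        if nxt not in cache:
            cache[nxt] = fetch(nxt)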
Awais Ahmed , Sufian Hameed , Muhammad Rafi , Qublai Khan Ali Mirza Subjects : Cryptography and Security (cs.CR) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Anomaly detection is a crucial step for preventing malicious activities in
the network and keeping resources available all the time for legitimate users.
Various studies have observed that classical anomaly detectors work well
with small and sampled data, but their chances of failure increase with
real-time (non-sampled) traffic data. In this paper, we explore security
analytics techniques for DDoS anomaly detection using different machine
learning techniques and propose a novel approach that deals with real
traffic as input to the system. Further, we study and compare the
performance of our proposed framework on three different testbeds:
normal commodity hardware, a low-end system, and a high-end system.
Hardware details of the testbeds are discussed in the respective section. We
also investigate the performance of the classifiers in (near) real-time
detection of anomalous attacks. This study further focuses on the feature
selection process, which is as important for anomaly detection as it is for
general modeling problems. Several feature selection techniques are studied,
and we observe that proper feature selection can improve performance in terms
of the model's execution time, which depends heavily on the traffic file or
traffic capturing process.
Unsupervised Domain Adaptation for Neural Machine Translation with Iterative Back Translation
Comments: Submitted to IJCAI 2020
Subjects:
Computation and Language (cs.CL)
; Machine Learning (cs.LG)
State-of-the-art neural machine translation (NMT) systems are data-hungry and
perform poorly on domains with little supervised data. As data collection is
expensive and infeasible in many cases, unsupervised domain adaptation methods
are needed. We apply an Iterative Back Translation (IBT) training scheme on
in-domain monolingual data, which repeatedly uses a Transformer-based NMT model
to create in-domain pseudo-parallel sentence pairs in one translation direction
on the fly and then uses them to train the model in the other direction.
Evaluated on three domains of a German-to-English translation task with no
supervised data, this simple technique alone (without any out-of-domain
parallel data) can already surpass all previous domain adaptation methods—up
to +9.48 BLEU over the strongest previous method, and up to +27.77 BLEU over
the unadapted baseline. Moreover, given available supervised out-of-domain data
on German-to-English and Romanian-to-English language pairs, we can further
enhance the performance and obtain up to +19.31 BLEU improvement over the
strongest baseline, and +47.69 BLEU increment against the unadapted model.
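The training loop can be summarised as follows (a schematic sketch; `translate` and `train_step` are hypothetical helpers standing in for decoding and a gradient update, not an actual NMT API):

def iterative_back_translation(model_de_en, model_en_de, mono_de, mono_en, rounds=3):
    for _ in range(rounds):
        # Synthesise pseudo-parallel pairs in one direction on the fly...
        pseudo_en = [model_de_en.translate(s) for s in mono_de]
        model_en_de.train_step(src=pseudo_en, tgt=mono_de)
        # ...then do the symmetric update in the other direction.
        pseudo_de = [model_en_de.translate(s) for s in mono_en]
        model_de_en.train_step(src=pseudo_de, tgt=mono_en)
    return model_de_en, model_en_de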
Comments: 11 pages, 5 figures
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Natural images can be regarded as residing in a manifold that is embedded in
a higher dimensional Euclidean space. Generative Adversarial Networks (GANs)
try to learn the distribution of the real images in the manifold to generate
samples that look real. But the results of existing methods still exhibit many
unpleasant artifacts and distortions even for the cases where the desired
ground truth target images are available for supervised learning such as in
single image super resolution (SISR). We probe for ways to alleviate these
problems for supervised GANs in this paper. We explicitly apply the Lipschitz
Continuity Condition (LCC) to regularize the GAN. An encoding network that maps
the image space to a new optimal latent space is derived from the LCC, and it
is used to augment the GAN as a coupling component. The LCC is also converted
to new regularization terms in the generator loss function to enforce local
invariance. The GAN is optimized together with the encoding network in an
attempt to make the generator converge to a more ideal and disentangled mapping
that can generate samples more faithful to the target images. When the proposed
models are applied to the single image super resolution problem, the results
outperform the state of the art.
DDKSP: A Data-Driven Stochastic Programming Framework for Car-Sharing Relocation Problem
Comments: arXiv admin note: text overlap with arXiv:1909.09293
Subjects:
Optimization and Control (math.OC)
; Machine Learning (cs.LG); Signal Processing (eess.SP); Applications (stat.AP)
Car sharing is a popular research topic in the sharing economy. In this
paper, we investigate the car-sharing relocation problem (CSRP) under uncertain
demands. Real customer demands usually follow complicated probability
distributions that cannot be described by parametric approaches. To
overcome this problem, we propose an innovative framework called Data-Driven
Kernel Stochastic Programming (DDKSP) that integrates a non-parametric
approach, kernel density estimation (KDE), with a two-stage stochastic
programming (SP) model. Specifically, the probability distributions are derived
from historical data by KDE and used as the uncertain input parameters for
the SP model. Additionally, the CSRP is formulated as a two-stage SP model, and a
Monte Carlo method called sample average approximation (SAA) together with a
Benders decomposition algorithm is introduced to solve the large-scale
optimization model. Finally, numerical validations based on New York taxi
trip data sets show that the proposed framework outperforms pure
parametric approaches based on Gaussian, Laplace and Poisson distributions
by 3.72%, 4.58% and 11% respectively in terms of overall profit.
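The non-parametric front end can be sketched in a few lines (illustrative; the demand data here is simulated and the second-stage relocation cost is a placeholder):

import numpy as np
from scipy.stats import gaussian_kde

historical_demand = np.random.default_rng(1).poisson(30, size=500).astype(float)

kde = gaussian_kde(historical_demand)        # non-parametric demand distribution
scenarios = kde.resample(1000, seed=2)[0]    # scenario set for the SAA step

def saa_objective(first_stage_x, second_stage_cost):
    # Approximate E[cost(x, demand)] by averaging over the KDE scenarios.
    return np.mean([second_stage_cost(first_stage_x, d) for d in scenarios])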
PDS: Deduce Elder Privacy from Smart Homes
Comments: 31 pages, 23 figures, and 2 tables, journal paper. arXiv admin note: text overlap with arXiv:1808.07379
Journal-ref: Internet of Things, 7, 1000072, 2019
Subjects:
Cryptography and Security (cs.CR)
; Machine Learning (cs.LG)
With the development of IoT technologies in the past few years, a wide range
of smart devices are deployed in a variety of environments aiming to improve
the quality of human life in a cost efficient way. Due to the increasingly
serious aging problem around the world, smart homes for elder healthcare have
become an important IoT-based application, which not only enables elders’
health to be properly monitored and taken care of, but also allows them to live
more comfortably and independently in their houses. However, elders' privacy
might be disclosed by smart homes due to insufficiently protected network
communication. To show that elders' privacy could be substantially exposed, in
this paper we develop a Privacy Deduction Scheme (PDS for short) that
eavesdrops on sensor traffic from a smart home to identify elders' movement
activities and infers sensor locations in the smart home based on a series
of deductions from the viewpoint of an attacker. The experimental results based
on sensor datasets from real smart homes demonstrate the effectiveness of PDS
in deducing and disclosing elders’ privacy, which might be maliciously
exploited by attackers to endanger elders and their properties.
Stratified cross-validation for unbiased and privacy-preserving federated learning
Comments: 13 pages, 5 figures
Subjects:
Machine Learning (stat.ML)
; Machine Learning (cs.LG); Methodology (stat.ME)
Large-scale collections of electronic records constitute both an opportunity
for the development of more accurate prediction models and a threat to
privacy. To limit privacy exposure, new privacy-enhancing techniques are
emerging such as federated learning which enables large-scale data analysis
while avoiding the centralization of records in a unique database that would
represent a critical point of failure. Although promising regarding privacy
protection, federated learning prevents using some data-cleaning algorithms
thus inducing new biases. In this work we focus on the recurrent problem of
duplicated records that, if not handled properly, may cause over-optimistic
estimates of a model's performance. We introduce and discuss stratified
cross-validation, a validation methodology that leverages stratification
techniques to prevent data leakage in federated learning settings without
relying on demanding deduplication algorithms.
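One plausible realisation of the idea, sketched with scikit-learn's group-aware splitter (an approximation of the paper's procedure, not its exact method): duplicated records share a group identifier, so no record ever appears on both sides of a split.

import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)
groups = np.arange(100) % 80     # toy duplicate structure: 20 records are repeated

for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups):
    # All copies of a record share a group, so none leaks across the split.
    assert not set(groups[train_idx]) & set(groups[test_idx])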
Training Neural Network Controllers Using Control Barrier Functions in the Presence of Disturbances
Shakiba Yaghoubi , Georgios Fainekos , Sriram Sankaranarayanan Subjects : Optimization and Control (math.OC) ; Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
Control Barrier Functions (CBF) have been recently utilized in the design of
provably safe feedback control laws for nonlinear systems. These feedback
control methods typically compute the next control input by solving an online
Quadratic Program (QP). Solving a QP in real time can be a computationally
expensive process for resource-constrained systems. In this work, we propose to
use imitation learning to learn Neural Network-based feedback controllers which
will satisfy the CBF constraints. In the process, we also develop a new class
of High Order CBF for systems under external disturbances. We demonstrate the
framework on a unicycle model subject to external disturbances, e.g., wind or
currents.
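For intuition, the CBF-QP that such controllers imitate admits a closed form in the single-constraint case (a hedged sketch; the dynamics and barrier terms defining a and b are assumed given, and the paper's high-order, disturbance-aware CBFs are more involved):

import numpy as np

def cbf_filter(u_nom, a, b):
    # Solve min ||u - u_nom||^2 s.t. a.u <= b by projecting onto the safe half-space.
    violation = a @ u_nom - b
    if violation <= 0:
        return u_nom                               # nominal input already safe
    return u_nom - (violation / (a @ a)) * a       # minimal safe correction

u_safe = cbf_filter(np.array([1.0, 0.5]), a=np.array([1.0, 1.0]), b=1.2)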
Yang Chen Subjects : Computational Engineering, Finance, and Science (cs.CE) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Optimizing fluid-dynamic performance is an important engineering task.
Traditionally, experts design shapes based on empirical estimations and verify
them through expensive experiments. This costly process, both in terms of time
and space, may only explore a limited number of shapes and lead to sub-optimal
designs. In this research, a test-proven deep learning architecture is applied
to predict the performance under various restrictions and search for better
shapes by optimizing the learned prediction function. The major challenge is
the vast number of data points a Deep Neural Network (DNN) demands, which would
be impractical to generate by simulation. To remedy this drawback, a
frequentist active-learning scheme is used to explore the regions of the output
space that the DNN predicts to be promising. This reduces the number of data
samples required from roughly 8,000 to 625. The final stage, a user interface,
makes the model capable of optimizing for user-specified minimum area and
viscosity. Flood fill is used to define a boundary-area function so that the
optimal shape does not bypass the minimum area, and Stochastic Gradient
Langevin Dynamics (SGLD) is employed to ensure the final shape is optimized
while respecting the required area. Jointly, shapes with extremely low drag are
found and explored through a practical user interface, with no human domain
knowledge and modest computational overhead.
ESRGAN+ : Further Improving Enhanced Super-Resolution Generative Adversarial Network
Nathanaël Carraz Rakotonirina , Andry Rasoanaivo Subjects : Image and Video Processing (eess.IV) ; Machine Learning (cs.LG)
Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) is a
perceptual-driven approach for single image super resolution that is able to
produce photorealistic images. Despite the visual quality of these generated
images, there is still room for improvement. To this end, the model is
extended to further improve the perceptual quality of the images. We have
designed a novel block to replace the one used by the original ESRGAN.
Moreover, we introduce noise inputs to the generator network in order to
exploit stochastic variation. The resulting images present more realistic
textures.
Up to two billion times acceleration of scientific simulations with deep neural architecture search
M. F. Kasim , D. Watson-Parris , L. Deaconu , S. Oliver , P. Hatfield , D. H. Froula , G. Gregori , M. Jarvis , S. Khatiwala , J. Korenaga , J. Topp-Mugglestone , E. Viezzer , S. M. Vinko Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Computational Physics (physics.comp-ph); Plasma Physics (physics.plasm-ph)
Computer simulations are invaluable tools for scientific discovery. However,
accurate simulations are often slow to execute, which limits their
applicability to extensive parameter exploration, large-scale data analysis,
and uncertainty quantification. A promising route to accelerate simulations by
building fast emulators with machine learning requires large training datasets,
which can be prohibitively expensive to obtain with slow simulations. Here we
present a method based on neural architecture search to build accurate
emulators even with a limited amount of training data. The method successfully
accelerates simulations by up to 2 billion times in 10 scientific cases
including astrophysics, climate science, biogeochemistry, high energy density
physics, fusion energy, and seismology, using the same super-architecture,
algorithm, and hyperparameters. Our approach also inherently provides emulator
uncertainty estimation, adding further confidence in their use. We anticipate
this work will accelerate research involving expensive simulations, allow more
extensive parameter exploration, and enable new, previously infeasible
computational discovery.
Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization
Journal-ref: ECIR 2020
Subjects:
Computation and Language (cs.CL)
; Machine Learning (cs.LG)
Contextualized embeddings use unsupervised language model pretraining to
compute word representations depending on their context. This is intuitively
useful for generalization, especially in Named-Entity Recognition where it is
crucial to detect mentions never seen during training. However, standard
English benchmarks overestimate the importance of lexical over contextual
features because of an unrealistic lexical overlap between train and test
mentions. In this paper, we perform an empirical analysis of the generalization
capabilities of state-of-the-art contextualized embeddings by separating
mentions by novelty and with out-of-domain evaluation. We show that they are
particularly beneficial for unseen mentions detection, especially
out-of-domain. For models trained on CoNLL03, language model contextualization
leads to a maximal relative micro-F1 score increase of +1.2% in-domain, versus
+13% out-of-domain on the WNUT dataset.
On Last-Layer Algorithms for Classification: Decoupling Representation from Uncertainty Estimation
Nicolas Brosse , Carlos Riquelme , Alice Martin , Sylvain Gelly , Éric Moulines Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)
Uncertainty quantification for deep learning is a challenging open problem.
Bayesian statistics offer a mathematically grounded framework to reason about
uncertainties; however, approximate posteriors for modern neural networks still
require prohibitive computational costs. We propose a family of algorithms
which split the classification task into two stages: representation learning
and uncertainty estimation. We compare four specific instances, where
uncertainty estimation is performed via either an ensemble of Stochastic
Gradient Descent or Stochastic Gradient Langevin Dynamics snapshots, an
ensemble of bootstrapped logistic regressions, or via a number of Monte Carlo
Dropout passes. We evaluate their performance in terms of selective
classification (risk-coverage), and their ability to detect out-of-distribution
samples. Our experiments suggest there is limited value in adding multiple
uncertainty layers to deep classifiers, and we observe that these simple
methods strongly outperform a vanilla point-estimate SGD in some complex
benchmarks like ImageNet.
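One of the compared last-layer schemes, Monte Carlo Dropout over a frozen representation, fits in a few lines (a sketch with an assumed 512-dimensional feature extractor and 10-class head, not the authors' code):

import torch
import torch.nn as nn

head = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(512, 10))

def mc_dropout_predict(features, n_passes=20):
    head.train()                       # keep dropout stochastic at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(head(features), dim=-1)
                             for _ in range(n_passes)])
    return probs.mean(0), probs.var(0)   # predictive mean and a simple uncertainty

mean, var = mc_dropout_predict(torch.randn(8, 512))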
Attention! A Lightweight 2D Hand Pose Estimation Approach
Comments: submitted to IEEE Signal Processing Letters
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Vision-based human pose estimation is a non-invasive technology for
Human-Computer Interaction (HCI). Direct use of the hand as an input device
provides an attractive interaction method, with no need for specialized sensing
equipment such as exoskeletons or gloves, only a camera. Traditionally, HCI
is employed in various applications spanning areas including manufacturing,
surgery, the entertainment industry and architecture, to mention a few.
Deployment of vision-based human pose estimation algorithms can breathe
innovation into these applications. In this letter, we present a novel
Convolutional Neural Network architecture, reinforced with a Self-Attention
module, that can be deployed on an embedded system due to its lightweight
nature, with just 1.9 million parameters. The source code and qualitative
results are publicly available.
Machine Learning for Network Slicing Resource Management: A Comprehensive Survey
Comments: To appear in ZTE Communications, 2020
Subjects:
Networking and Internet Architecture (cs.NI)
; Machine Learning (cs.LG)
The emerging technology of multi-tenancy network slicing is considered an
essential feature of 5G cellular networks. It provides network slices as a new
type of public cloud service, thereby increasing service flexibility
and enhancing network resource efficiency. Meanwhile, it raises new
challenges for network resource management. A variety of methods have
been proposed in recent years, in which machine learning and
artificial intelligence techniques are widely deployed. In this article, we
provide a survey of existing approaches to network slicing resource management,
highlighting the roles played by machine learning in them.
On Simple Reactive Neural Networks for Behaviour-Based Reinforcement Learning
Comments: 6 pages, 5 figures
Subjects:
Robotics (cs.RO)
; Machine Learning (cs.LG)
We present a behaviour-based reinforcement learning approach, inspired by
Brooks' subsumption architecture, in which simple fully connected networks are
trained as reactive behaviours. Our working assumption is that a pick and place
robotic task can be simplified by leveraging domain knowledge of a robotics
developer to decompose and train such reactive behaviours; namely, approach,
grasp, and retract. Then the robot autonomously learns how to combine them via
an Actor-Critic architecture. The Actor-Critic policy determines the
activation and inhibition of the reactive behaviours in a particular
temporal sequence. We validate our approach in a simulated robot environment
where the task is picking a block and taking it to a target position while
orienting the gripper from a top grasp. The latter represents an extra
degree of freedom to which current end-to-end reinforcement learning fails to
generalise. Our findings suggest that robotic learning can be more effective if
each behaviour is learnt in isolation and the behaviours are then combined to
accomplish the task. That is, our approach learns the pick and place task in 8,000 episodes,
which represents a drastic reduction in the number of training episodes
required by an end-to-end approach and the existing state-of-the-art
algorithms.
Machine Learning assisted Handover and Resource Management for Cellular Connected Drones
Amin Azari , Fayezeh Ghavimi , Mustafa Ozger , Riku Jantti , Cicek Cavdar Subjects : Signal Processing (eess.SP) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Enabling cellular connectivity for drones introduces a wide set of challenges
and opportunities. Communication of cellular-connected drones is influenced by
3-dimensional mobility and line-of-sight channel characteristics, which result
in a higher number of handovers with increasing altitude. Our cell planning
simulations in coexistence of aerial and terrestrial users indicate that the
severe interference from drones to base stations is a major challenge for
uplink communications of terrestrial users. Here, we first present the major
challenges in the coexistence of terrestrial and drone communications by
considering real geographical network data for Stockholm. Then, we derive
analytical models for the key performance indicators (KPIs), including
communications delay and interference over cellular networks, and formulate the
handover and radio resource management (H-RRM) optimization problem.
Afterwards, we transform this problem into a machine learning problem, and
propose a deep reinforcement learning solution to solve H-RRM problem. Finally,
using simulation results, we present how the speed and altitude of drones, and
the tolerable level of interference, shape the optimal H-RRM policy in the
network. In particular, heat maps of handover decisions at different drone
altitudes and speeds are presented, which motivate a revision of legacy
handover schemes and a redefinition of cell boundaries in the sky.
Adversarial Attack on Community Detection by Hiding Individuals
Comments: In Proceedings of The Web Conference 2020, April 20-24, 2020, Taipei, Taiwan. 11 pages
Subjects:
Social and Information Networks (cs.SI)
; Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
It has been demonstrated that adversarial graphs, i.e., graphs with
imperceptible perturbations added, can cause deep graph models to fail on
node/graph classification tasks. In this paper, we extend adversarial graphs to
the problem of community detection which is much more difficult. We focus on
black-box attack and aim to hide targeted individuals from the detection of
deep graph community detection models, which has many applications in
real-world scenarios, for example, protecting personal privacy in social
networks and understanding camouflage patterns in transaction networks. We
propose an iterative learning framework that takes turns to update two modules:
one working as the constrained graph generator and the other as the surrogate
community detection model. We also find that the adversarial graphs generated
by our method can be transferred to other learning based community detection
models.
Comments: 13 pages, 14 figures
Journal-ref: IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, VOL.
5, NO. 4, DECEMBER 2019
Subjects:
Networking and Internet Architecture (cs.NI)
; Machine Learning (cs.LG)
Vehicle-to-everything (V2X) communication is a growing area of communication
with a variety of use cases. This paper investigates the problem of
vehicle-cell association in millimeter wave (mmWave) communication networks.
The aim is to maximize the time average rate per vehicular user (VUE) while
ensuring a target minimum rate for all VUEs with low signaling overhead. We
first formulate the user (vehicle) association problem as a discrete non-convex
optimization problem. Then, by leveraging tools from machine learning,
specifically distributed deep reinforcement learning (DDRL) and the
asynchronous actor critic algorithm (A3C), we propose a low complexity
algorithm that approximates the solution of the proposed optimization problem.
The proposed DDRL-based algorithm endows every road side unit (RSU) with a
local RL agent that selects a local action based on the observed input state.
Actions of different RSUs are forwarded to a central entity, that computes a
global reward which is then fed back to the RSUs. It is shown that each
independently trained RL agent performs the vehicle-RSU association action with
low control overhead and less computational complexity compared to running an
online complex algorithm to solve the non-convex optimization problem. Finally,
simulation results show that the proposed solution achieves up to 15% gains in
terms of sum rate and a 20% reduction in VUE outages compared to several
baseline designs.
Normalization of Input-output Shared Embeddings in Text Generation Models
Jinyang Liu , Yujia Zhai , Zizhong Chen Subjects : Computation and Language (cs.CL) ; Machine Learning (cs.LG)
Neural network based models have been the state of the art for various
Natural Language Processing tasks; however, the input and output dimension
problem in these networks has still not been fully resolved, especially in text
generation tasks (e.g. machine translation, text summarization) in which both
input and output have huge vocabularies. Input-output embedding weight sharing
has therefore been introduced and widely adopted, but it still leaves room for
improvement. Based on linear algebra and statistical theory, this paper
identifies the shortcomings of the existing input-output embedding weight
sharing method and then proposes methods for improving it, among which
normalization of the embedding weight matrices shows the best performance.
These methods are nearly free of computational cost, can be combined with other
embedding techniques, and show good effectiveness when applied to
state-of-the-art neural network models. For Transformer-big models, the
normalization techniques achieve up to 0.6 BLEU improvement over the original
model on the WMT'16 En-De dataset, with similar BLEU improvements on the
IWSLT'14 datasets. For DynamicConv models, a 0.5 BLEU improvement is attained
on the WMT'16 En-De dataset, and a 0.41 BLEU improvement on the IWSLT'14 De-En
translation task.
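One plausible reading of the normalisation idea, sketched as row-normalised tied embeddings (an illustration under assumed shapes, not the paper's exact formulation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedNormalizedEmbedding(nn.Module):
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(vocab_size, d_model) * d_model ** -0.5)

    def embed(self, token_ids):
        # Input side: look up unit-norm rows of the shared matrix.
        return F.normalize(self.weight, dim=-1)[token_ids]

    def logits(self, hidden):
        # Output side: score against the same normalised matrix.
        return hidden @ F.normalize(self.weight, dim=-1).t()

layer = TiedNormalizedEmbedding(vocab_size=32000, d_model=512)
scores = layer.logits(torch.randn(2, 7, 512))   # (batch, seq, vocab)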
Keyword-based Topic Modeling and Keyword Selection
Xingyu Wang , Lida Zhang , Diego Klabjan Subjects : Machine Learning (stat.ML) ; Information Retrieval (cs.IR); Machine Learning (cs.LG)
Certain types of documents, such as tweets, are collected by specifying a set of
keywords. As topics of interest change with time, it is beneficial to adjust
keywords dynamically. The challenge is that these need to be specified ahead of
knowing the forthcoming documents and the underlying topics. The future topics
should mimic past topics of interest yet there should be some novelty in them.
We develop a keyword-based topic model that dynamically selects a subset of
keywords to be used to collect future documents. The generative process first
selects keywords and then the underlying documents based on the specified
keywords. The model is trained by using a variational lower bound and
stochastic gradient optimization. The inference consists of finding a subset of
keywords; given such a subset, the model predicts the underlying topic-word
matrix for the unknown forthcoming documents. We compare the keyword topic
model against a benchmark model using viral predictions of tweets combined with
a topic model. The keyword-based topic model outperforms this sophisticated
baseline model by 67%.
Optimal estimation of sparse topic models
Xin Bing , Florentina Bunea , Marten Wegkamp Subjects : Machine Learning (stat.ML) ; Information Retrieval (cs.IR); Machine Learning (cs.LG)
Topic models have become popular tools for dimension reduction and
exploratory analysis of text data, which consists in observed frequencies of a
vocabulary of p words in n documents, stored in a p × n matrix. The
main premise is that the mean of this data matrix can be factorized into a
product of two non-negative matrices: a p × K word-topic matrix A and a
K × n topic-document matrix W. This paper studies the estimation of A,
which is possibly element-wise sparse, when the number of topics K is unknown.
In this under-explored context, we derive a new minimax lower bound for the
estimation of such A and propose a new computationally efficient algorithm
for its recovery. We derive a finite sample upper bound for our estimator, and
show that it matches the minimax lower bound in many scenarios. Our estimate
adapts to the unknown sparsity of A and our analysis is valid for any finite
n, p, K and document lengths. Empirical results on both synthetic data
and semi-synthetic data show that our proposed estimator is a strong competitor
of the existing state-of-the-art algorithms for both non-sparse A and sparse
A, and has superior performance in many scenarios of interest.
A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis
Comments: 31 pages, 9 figures
Subjects:
Methodology (stat.ME)
; Machine Learning (cs.LG); Machine Learning (stat.ML)
Deep learning methods are the gold standard for non-linear statistical
modeling in computer vision and in natural language processing but are rarely
used in psychometrics. To bridge this gap, we present a novel deep learning
algorithm for exploratory item factor analysis (IFA). Our approach combines a
deep artificial neural network (ANN) model called a variational autoencoder
(VAE) with recent work that uses regularization for exploratory factor
analysis. We first provide overviews of ANNs and VAEs. We then describe how to
conduct exploratory IFA with a VAE and demonstrate our approach in two
empirical examples and in two simulated examples. Our empirical results were
consistent with existing psychological theory across random starting values.
Our simulations suggest that the VAE consistently recovers the data generating
factor pattern with moderate-sized samples. Secondary loadings were
underestimated with a complex factor structure and intercept parameter
estimates were moderately biased with both simple and complex factor
structures. All models converged in minutes, even with hundreds of thousands of
observations, hundreds of items, and tens of factors. We conclude that the VAE
offers a powerful new approach to fitting complex statistical models in
psychological and educational measurement.
Comments: Accepted to IEEE Transactions on Emerging Topics in Computational Intelligence
Subjects:
Audio and Speech Processing (eess.AS)
; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
An effective approach for voice conversion (VC) is to disentangle linguistic
content from other components in the speech signal. The effectiveness of
variational autoencoder (VAE) based VC (VAE-VC), for instance, strongly relies
on this principle. In our prior work, we proposed a cross-domain VAE-VC
(CDVAE-VC) framework, which utilized acoustic features of different properties,
to improve the performance of VAE-VC. We believed that the success came from
more disentangled latent representations. In this paper, we extend the CDVAE-VC
framework by incorporating the concept of adversarial learning, in order to
further increase the degree of disentanglement, thereby improving the quality
and similarity of converted speech. More specifically, we first investigate the
effectiveness of incorporating the generative adversarial networks (GANs) with
CDVAE-VC. Then, we consider the concept of domain adversarial training and add
an explicit constraint to the latent representation, realized by a speaker
classifier, to explicitly eliminate the speaker information that resides in the
latent code. Experimental results confirm that the degree of disentanglement of
the learned latent representation can be enhanced by both GANs and the speaker
classifier. Meanwhile, subjective evaluation results in terms of quality and
similarity scores demonstrate the effectiveness of our proposed methods.
Anomaly detection in chest radiographs with a weakly supervised flow-based deep learning method
H. Shibata (1), S. Hanaoka (2), Y. Nomura (1), T. Nakao (3), I. Sato (2 and 4 and 5), N. Hayashi (1), O. Abe (2 and 3) ((1) Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, (2) Department of Radiology, The University of Tokyo Hospital, (3) Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, (4) Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, (5) Center for Advanced Intelligence Project, RIKEN) Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Preventing the oversight of anomalies in chest X-ray radiographs (CXRs)
during diagnosis is a crucial issue. Deep learning (DL)-based anomaly detection
methods are rapidly growing in popularity, and provide effective solutions to
the problem, but the workload in labeling CXRs during the training procedure
remains heavy. To reduce the workload, a novel anomaly detection method for
CXRs based on weakly supervised DL is presented in this study. The DL is based
on a flow-based deep neural network (DNN) framework with which two normality
metrics (logarithm likelihood and logarithm likelihood ratio) can be
calculated. With this method, only one set of normal CXRs requires labeling to
train the DNN, after which the normality of any unknown CXR can be evaluated.
The area under the receiver operating characteristic curve obtained with the
logarithm likelihood ratio metric (≈ 0.783) was greater than that obtained with
the logarithm likelihood metric, and is comparable to the values reported in
previous studies where other weakly supervised DNNs were implemented.
LRF-Net: Learning Local Reference Frames for 3D Local Shape Description and Matching
Comments: 7 pages, 9 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG)
The local reference frame (LRF) plays a critical role in 3D local shape
description and matching. However, most existing LRFs are hand-crafted and
suffer from limited repeatability and robustness. This paper presents the first
attempt to learn an LRF via a Siamese network that needs weak supervision only.
In particular, we argue that each neighboring point in the local surface makes
a unique contribution to LRF construction and measure such contributions via
learned weights. Extensive analysis and comparative experiments on three public
datasets addressing different application scenarios have demonstrated that
LRF-Net is more repeatable and robust than several state-of-the-art LRF methods
(LRF-Net is only trained on one dataset). In addition, LRF-Net can
significantly boost the local shape description and 6-DoF pose estimation
performance when matching 3D point clouds.
NeurOpt: Neural network based optimization for building energy management and climate control
Achin Jain , Francesco Smarra , Enrico Reticcioli , Alessandro D'Innocenzo , Manfred Morari Subjects : Systems and Control (eess.SY) ; Machine Learning (cs.LG)
Model predictive control (MPC) can provide significant energy cost savings in
building operations in the form of energy-efficient control with better
occupant comfort, lower peak demand charges, and risk-free participation in
demand response. However, the engineering effort required to obtain
physics-based models of buildings for MPC is considered to be the biggest
bottleneck in making MPC scalable to real buildings. In this paper, we propose
a data-driven control algorithm based on neural networks to reduce this cost of
model identification. Our approach does not require building domain expertise
or retrofitting of the existing heating and cooling systems. We validate our
learning and control algorithms on a two-story building with 10 independently
controlled zones, located in Italy. We learn dynamical models of energy
consumption and zone temperatures with high accuracy and demonstrate energy
savings and better occupant comfort compared to the default system controller.
Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities
Zhongruo Wang , Krishnakumar Balasubramanian , Shiqian Ma , Meisam Razaviyayn Subjects : Machine Learning (stat.ML) ; Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Optimization and Control (math.OC)
In this paper, we study zeroth-order algorithms for minimax optimization
problems that are nonconvex in one variable and strongly-concave in the other
variable. Such minimax optimization problems have attracted significant
attention lately due to their applications in modern machine learning tasks. We
first design and analyze the Zeroth-Order Gradient Descent Ascent
(ZO-GDA) algorithm, and provide improved results compared to existing
works in terms of oracle complexity. Next, we propose the Zeroth-Order
Gradient Descent Multi-Step Ascent (ZO-GDMSA) algorithm, which
significantly improves the oracle complexity of ZO-GDA. We also
provide stochastic versions of ZO-GDA and ZO-GDMSA to handle
stochastic nonconvex minimax problems, and provide oracle complexity results.
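The two-point gradient estimate that such algorithms build on can be sketched directly (illustrative smoothing radius and sample count; the descent-ascent loops in the paper wrap estimates like this):

import numpy as np

def zo_gradient(f, x, mu=1e-4, n_samples=20, seed=0):
    # Estimate grad f(x) from function values only, using random directions.
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.normal(size=x.size)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_samples

# Sanity check on f(x) = ||x||^2, whose true gradient at x is 2x.
print(zo_gradient(lambda v: v @ v, np.array([1.0, -2.0])))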
Depth-Based Selective Blurring in Stereo Images Using Accelerated Framework
Comments: arXiv admin note: text overlap with arXiv:2001.06967
Journal-ref: 3D Research (Springer) 5, Article number: 14 (2014)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Image and Video Processing (eess.IV)
We propose a hybrid method for stereo disparity estimation by combining block
and region-based stereo matching approaches. It generates dense depth maps from
disparity measurements of only 18% of image pixels (left or right). The
methodology involves segmenting pixel lightness values using a fast K-Means
implementation, refining segment boundaries using morphological filtering and
connected components analysis; then determining boundaries’ disparities using
sum of absolute differences (SAD) cost function. Complete disparity maps are
reconstructed from boundaries’ disparities. We consider an application of our
method for depth-based selective blurring of non-interest regions of stereo
images, using Gaussian blur to de-focus users’ non-interest regions.
Experiments on Middlebury dataset demonstrate that our method outperforms
traditional disparity estimation approaches using SAD and normalized cross
correlation by up to 33.6% and some recent methods by up to 6.1%. Further,
our method is highly parallelizable using a CPU and GPU framework based on Java
Thread Pool and APARAPI, with a speed-up of 5.8 for 250 stereo video frames
(4,096 x 2,304).
When does the Tukey median work?
Banghua Zhu , Jiantao Jiao , Jacob Steinhardt Subjects : Statistics Theory (math.ST) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
We analyze the performance of the Tukey median estimator under total
variation (TV) distance corruptions. Previous results show that under Huber’s
additive corruption model, the breakdown point is 1/3 for high-dimensional
halfspace-symmetric distributions. We show that under TV corruptions, the
breakdown point reduces to 1/4 for the same set of distributions. We also show
that a certain projection algorithm can attain the optimal breakdown point of
1/2. Both the Tukey median estimator and the projection algorithm achieve
sample complexity linear in dimension.
Weakly Supervised Temporal Action Localization Using Deep Metric Learning
Comments: accepted to WACV 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG)
Temporal action localization is an important step towards video
understanding. Most current action localization methods depend on untrimmed
videos with full temporal annotations of action instances. However, it is
expensive and time-consuming to annotate both action labels and temporal
boundaries of videos. To this end, we propose a weakly supervised temporal
action localization method that only requires video-level action instances as
supervision during training. We propose a classification module to generate
action labels for each segment in the video, and a deep metric learning module
to learn the similarity between different action instances. We jointly optimize
a balanced binary cross-entropy loss and a metric loss using a standard
backpropagation algorithm. Extensive experiments demonstrate the effectiveness
of both of these components in temporal localization. We evaluate our algorithm
on two challenging untrimmed video datasets: THUMOS14 and ActivityNet1.2. Our
approach improves the current state-of-the-art result for THUMOS14 by 6.5% mAP
at IoU threshold 0.5, and achieves competitive performance for ActivityNet1.2.
GhostImage: Perception Domain Attacks against Vision-based Object Classification Systems
Yanmao Man , Ming Li , Ryan Gerdes Subjects : Cryptography and Security (cs.CR) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
In vision-based object classification systems, imaging sensors perceive the
environment and then objects are detected and classified for decision-making
purposes. Vulnerabilities in the perception domain enable an attacker to inject
false data into the sensor which could lead to unsafe consequences. In this
work, we focus on camera-based systems and propose GhostImage attacks, with the
goal of either creating a fake perceived object or obfuscating the object’s
image that leads to wrong classification results. This is achieved by remotely
projecting adversarial patterns into camera-perceived images, exploiting two
common effects in optical imaging systems, namely lens flare/ghost effects, and
auto-exposure control. To improve the robustness of the attack to channel
perturbations, we generate optimal input patterns by integrating adversarial
machine learning techniques with a trained end-to-end channel model. We realize
GhostImage attacks with a projector and conduct comprehensive experiments,
using three different image datasets, in indoor and outdoor environments, and
three different cameras. We demonstrate that GhostImage attacks are applicable
to both autonomous driving and security surveillance scenarios. Experiment
results show that, depending on the projector-camera distance, attack success
rates can reach as high as 100%.
Machine Learning for Performance-Aware Virtual Network Function Placement
Comments: 6 pages, 6 figures, 1 table, 9 equations, 18 references, Conference
Subjects:
Signal Processing (eess.SP)
; Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Machine Learning (stat.ML)
With the growing demand for data connectivity, network service providers are
faced with the task of reducing their capital and operational expenses while
simultaneously improving network performance and addressing the increased
connectivity demand. Although Network Function Virtualization (NFV) has been
identified as a solution, several challenges must be addressed to ensure its
feasibility. In this paper, we address the Virtual Network Function (VNF)
placement problem by developing a machine learning decision tree model that
learns from the effective placement of the various VNF instances forming a
Service Function Chain (SFC). The model takes several performance-related
features from the network as an input and selects the placement of the various
VNF instances on network servers with the objective of minimizing the delay
between dependent VNF instances. The benefits of using machine learning are
realized by moving away from a complex mathematical modelling of the system and
towards a data-based understanding of the system. Using the Evolved Packet Core
(EPC) as a use case, we evaluate our model on different data center networks
and compare it to the BACON algorithm in terms of the delay between
interconnected components and the total delay across the SFC. Furthermore, a
time complexity analysis is performed to show the effectiveness of the model in
NFV applications.
Emergence of Pragmatics from Referential Game between Theory of Mind Agents
Luyao Yuan , Zipeng Fu , Jingyue Shen , Lu Xu , Junhong Shen , Song-Chun Zhu Subjects : Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Pragmatics studies how context can contribute to language meanings [1]. In
human communication, language is never interpreted out of context, and
sentences can usually convey more information than their literal meanings [2].
However, this mechanism is missing in most multi-agent systems [3, 4, 5, 6],
restricting the communication efficiency and the capability of human-agent
interaction. In this paper, we propose an algorithm, using which agents can
spontaneously learn the ability to “read between lines” without any explicit
hand-designed rules. We integrate the theory of mind (ToM) [7, 8] in a
cooperative multi-agent pedagogical situation and propose an adaptive
reinforcement learning (RL) algorithm to develop a communication protocol. ToM
is a profound cognitive science concept, claiming that people regularly reason
about others' mental states, including beliefs, goals, and intentions, to
obtain a performance advantage in competition, cooperation or coalition. With
this ability, agents consider language as not only messages but also rational
acts reflecting others’ hidden states. Our experiments demonstrate the
advantage of pragmatic protocols over non-pragmatic protocols. We also show
that the teaching complexity under the pragmatic protocol empirically
approximates the recursive teaching dimension (RTD).
EMOPAIN Challenge 2020: Multimodal Pain Evaluation from Facial and Bodily Expressions
Comments: 8 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The EmoPain 2020 Challenge is the first international competition aimed at
creating a uniform platform for the comparison of machine learning and
multimedia processing methods of automatic chronic pain assessment from human
expressive behaviour, and also the identification of pain-related behaviours.
The objective of the challenge is to promote research in the development of
assistive technologies that help improve the quality of life for people with
chronic pain via real-time monitoring and feedback to help manage their
condition and remain physically active. The challenge also aims to encourage
the use of the relatively underutilised, albeit vital bodily expression signals
for automatic pain and pain-related emotion recognition. This paper presents a
description of the challenge, competition guidelines, bench-marking dataset,
and the baseline systems’ architecture and performance on the three sub-tasks:
pain estimation from facial expressions, pain recognition from multimodal
movement, and protective movement behaviour detection.
S²OMGAN: Shortcut from Remote Sensing Images to Online Maps
X. Chen (1), S. Chen (1), T. Xu (1), B. Yin (1), X. Mei (2), J. Peng (2), H. Li (2) ((1) School of Computer Science, Wuhan University, Wuhan, 430072, China, (2) School of Geosciences and Info-Physics, Central South University, Changsha, 410083, China) Subjects : Image and Video Processing (eess.IV) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Traditional online maps, widely used on the Internet, such as Google Maps and
Baidu Map, are rendered from vector data. Timely updating of online maps from
vector data, whose generation is time-consuming, is a difficult mission.
Generating online maps directly from remote sensing images, which can be
acquired in a timely manner without vector data, is a shortcut; however, this
mission used to be challenging or even impossible. Inspired by image-to-image
translation (img2img) techniques based on generative adversarial networks
(GANs), we propose a semi-supervised structure-augmented online map GAN
(S²OMGAN) model to generate online maps directly from remote sensing images. In
this model, we design a semi-supervised learning strategy to pre-train S²OMGAN
on rich unpaired samples and finetune it on limited paired samples in reality.
We also design an image gradient L1 loss and an image gradient structure loss
to generate online maps with global topological relationships and detailed edge
curves of objects, which are important in cartography. Moreover, we propose the
edge structural similarity index (ESSI) as a metric to evaluate the quality of
topological consistency between generated online maps and ground truths.
Experimental results show that S²OMGAN outperforms state-of-the-art (SOTA)
works in terms of mean squared error, structural similarity index and ESSI.
S²OMGAN also wins more approval than SOTA in a human perceptual test on the
visual realism of cartography. Our work shows that S²OMGAN is potentially a new
paradigm for producing online maps. Our implementation of S²OMGAN is available
at this https URL.
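An image-gradient L1 loss of the kind described can be sketched as follows (a hedged reading: finite-difference gradients of generated and target maps are compared in L1; the paper's exact weighting may differ):

import torch

def image_gradient_l1(pred, target):
    # pred, target: (batch, channels, H, W) tensors.
    dx_p = pred[..., :, 1:] - pred[..., :, :-1]
    dy_p = pred[..., 1:, :] - pred[..., :-1, :]
    dx_t = target[..., :, 1:] - target[..., :, :-1]
    dy_t = target[..., 1:, :] - target[..., :-1, :]
    return (dx_p - dx_t).abs().mean() + (dy_p - dy_t).abs().mean()

loss = image_gradient_l1(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))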
An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
Comments: arXiv admin note: text overlap with arXiv:1909.05073
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Weight pruning has been widely acknowledged as a straightforward and
effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby
achieving acceleration on various platforms. However, most of the pruning
techniques are essentially trade-offs between model accuracy and regularity
which lead to impaired inference accuracy and limited on-device acceleration
performance. To solve this problem, we introduce a new sparsity dimension,
namely pattern-based sparsity, which comprises pattern and connectivity
sparsity and is both highly accurate and hardware friendly. With carefully
designed patterns, the proposed pruning unprecedentedly and consistently
achieves accuracy enhancement and better feature extraction ability on
different DNN structures and datasets, and our pattern-aware pruning framework
achieves pattern library extraction, pattern selection, pattern and
connectivity pruning and weight training simultaneously. Our approach to the
new pattern-based sparsity naturally fits into compiler optimization for highly
efficient DNN execution on mobile platforms. To the best of our knowledge, this
is the first time that mobile devices achieve real-time inference for
large-scale DNN models, thanks to the unique spatial property of pattern-based
sparsity and the code generation capability of compilers.
Towards Comparability in Non-Intrusive Load Monitoring: On Data and Performance Evaluation
Christoph Klemenjak , Stephen Makonin , Wilfried Elmenreich Subjects : Signal Processing (eess.SP) ; Machine Learning (cs.LG)
Non-Intrusive Load Monitoring (NILM) comprises a set of techniques that
provide insights into the energy consumption of households and industrial
facilities. The latest contributions show significant improvements in terms of
accuracy and generalisation abilities. Despite all progress made concerning
disaggregation techniques, performance evaluation and comparability remain an
open research question. The lack of standardisation and consensus on evaluation
procedures makes reproducibility and comparability extremely difficult. In this
paper, we draw attention to comparability in NILM with a focus on highlighting
the considerable differences amongst common energy datasets used to test the
performance of algorithms. We divide the discussion of comparability into data
aspects and performance metrics, and take a close look at evaluation processes.
Detailed information on pre-processing as well as data cleaning methods, the
importance of unified performance reporting, and the need for complexity
measures in load disaggregation are found to be the most urgent issues in
NILM-related research. In addition, our evaluation suggests that datasets
should be chosen carefully. We conclude by formulating suggestions for future
work to enhance comparability.
Information Theory
On the Capacity of Waveform Channels Under Square-Law Detection of Time-Limited Signals
Comments: Submitted to IEEE Trans. Inf. Theory, January 8, 2020
Subjects:
Information Theory (cs.IT)
Capacity bounds for waveform channels under square-law detection of
time-limited complex-valued signals are derived. The upper bound is the
capacity of the channel under (complex-valued) coherent detection. The lower
bound is one bit less, per dimension, than the upper bound.
Optimal Multistage Group Testing Algorithm for 3 Defectives
Group testing is a well-known search problem that consists in detecting s
defective members of a set of t samples by carrying out tests on properly
chosen subsets of samples. In classical group testing the goal is to find all
defective elements by using the minimal possible number of tests in the worst
case. In this work, a multistage group testing problem is considered. Our goal
is to construct a multistage search procedure having asymptotically the same
number of tests as an adaptive one. We propose a new approach to designing
multistage algorithms, which allows us to construct a 5-stage algorithm for
finding 3 defectives with the optimal number 3 log₂ t (1 + o(1)) of tests.
Construction of Rate (n-1)/n Non-Binary LDPC Convolutional Codes via Difference Triangle Sets
Comments: The paper was submitted to ISIT 2020
Subjects:
Information Theory (cs.IT)
; Combinatorics (math.CO)
This paper provides a construction of non-binary LDPC convolutional codes,
which generalizes the work of Robinson and Bernstein. The sets of integers
forming an (n−1, w)-difference triangle set are used as supports of the
columns of rate (n−1)/n convolutional codes. If the field size is large
enough, the Tanner graph associated to the sliding parity-check matrix of the
code is free from 4- and 6-cycles not satisfying the full rank condition.
This is important for improving the performance of a code and avoiding the
presence of low-weight codewords and absorbing sets. The parameters of the
convolutional code are shown to be determined by the parameters of the
underlying difference triangle set. In particular, the free distance of the
code is related to w, and the degree of the code is linked to the “scope” of
the difference triangle set. Hence, the problem of finding families of
difference triangle sets with minimum scope is equivalent to finding
convolutional codes with small degree.
On the Performance of Quickest Detection Spectrum Sensing: The Case of Cumulative Sum
Comments: This paper is accepted for publication in IEEE Communication Letters Jan 2020
Subjects:
Information Theory (cs.IT)
; Networking and Internet Architecture (cs.NI)
Quickest change detection (QCD) is a fundamental problem in many
applications. Given a sequence of measurements that exhibits two different
distributions around a certain flipping point, the goal is to detect the change
in distribution around the flipping point as quickly as possible. The QCD
problem appears in many practical applications, e.g., quality control, power
system line outage detection, spectrum reuse, and resource allocation and
scheduling. In this paper, we focus on spectrum sensing as our application
since it is a critical process for proper functionality of cognitive radio
networks. Relying on the cumulative sum (CUSUM), we derive the probability of
detection and the probability of false alarm of CUSUM based spectrum sensing.
We show the correctness of our derivations using numerical simulations.
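The CUSUM recursion at the heart of the analysis is compact enough to sketch (Gaussian pre- and post-change models here are illustrative, not the letter's exact setup):

import numpy as np
from scipy.stats import norm

def cusum_detect(samples, h=8.0, noise_std=1.0, signal_mean=0.8):
    # Log-likelihood ratio of "signal present" vs "noise only" per sample.
    llr = (norm.logpdf(samples, loc=signal_mean, scale=noise_std)
           - norm.logpdf(samples, loc=0.0, scale=noise_std))
    s = 0.0
    for k, step in enumerate(llr):
        s = max(0.0, s + step)       # CUSUM recursion
        if s > h:
            return k                 # earliest alarm index
    return None

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(0.8, 1, 200)])
print(cusum_detect(x))               # alarm shortly after the change at index 200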
Comments: 7 pages, 6 figures
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Existing graph neural networks may suffer from the “suspended animation
problem” when the model architecture goes deep. Meanwhile, for some graph
learning scenarios, e.g., nodes with text/image attributes or graphs with
long-distance node correlations, deep graph neural networks will be necessary
for effective graph representation learning. In this paper, we propose a new
graph neural network, namely DIFNET (Graph Diffusive Neural Network), for graph
representation learning and node classification. DIFNET utilizes both neural
gates and graph residual learning for node hidden state modeling, and includes
an attention mechanism for node neighborhood information diffusion. Extensive
experiments are conducted to compare DIFNET against several
state-of-the-art graph neural network models. The experimental results
illustrate both the learning performance advantages and the effectiveness of
DIFNET, especially in addressing the “suspended animation problem”.
Coarse-Grain Cluster Analysis of Tensors With Application to Climate Biome Identification
Derek DeSantis , Phillip J. Wolfram , Katrina Bennett , Boian Alexandrov Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)
A tensor provides a concise way to codify the interdependence of complex
data. Treating a tensor as a d-way array, each entry records the interaction
between the different indices. Clustering provides a way to parse the
complexity of the data into more readily understandable information. Clustering
methods are heavily dependent on the algorithm of choice, as well as the chosen
hyperparameters of the algorithm. However, their sensitivity to data scales is
largely unknown.
In this work, we apply the discrete wavelet transform to analyze the effects
of coarse-graining on clustering tensor data. We are particularly interested in
understanding how scale affects clustering of the Earth's climate system. The
discrete wavelet transform allows classification of the Earth’s climate across
a multitude of spatial-temporal scales. The discrete wavelet transform is used
to produce an ensemble of classification estimates, as opposed to a single
classification. Using information theory, we discover a sub-collection of the
ensemble that spans the majority of the observed variance, allowing for
efficient consensus clustering techniques that can be used to identify climate
biomes.
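The coarse-graining step can be sketched with PyWavelets (simulated data and illustrative parameters; the paper's climate tensors and consensus procedure are richer):

import numpy as np
import pywt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
series = rng.normal(size=(50, 256))     # e.g. 50 locations x 256 time steps

labels_per_scale = []
for level in range(1, 4):
    # Keep the approximation coefficients: a coarse-grained view at this scale.
    coarse = np.stack([pywt.wavedec(s, 'db2', level=level)[0] for s in series])
    labels_per_scale.append(KMeans(n_clusters=4, n_init=10).fit_predict(coarse))
# labels_per_scale is the multi-scale ensemble fed to consensus clustering.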
Physical Layer Authentication for Non-coherent Massive SIMO-Based Industrial IoT Communications
Zhifang Gu , He Chen , Pingping Xu , Yonghui Li , Branka Vucetic Subjects : Signal Processing (eess.SP) ; Cryptography and Security (cs.CR); Information Theory (cs.IT)
Achieving ultra-reliable, low-latency and secure communications is essential
for realizing the industrial Internet of Things (IIoT). Non-coherent massive
multiple-input multiple-output (MIMO) has recently been proposed as a promising
methodology to fulfill ultra-reliable and low-latency requirements. In
addition, physical layer authentication (PLA) technology is particularly
suitable for IIoT communications thanks to its low-latency attribute. A PLA
method for non-coherent massive single-input multiple-output (SIMO) IIoT
communication systems is proposed in this paper. Specifically, we first
determine the optimal embedding of the authentication information (tag) in the
message information. We then optimize the power allocation between message and
tag signal to characterize the trade-off between message and tag error
performance. Numerical results show that the proposed PLA is more accurate than
traditional methods adopting a uniform tag when the communication reliability
remains at the same level. The proposed PLA method can be effectively applied
to the non-coherent system.