Nonnegative Matrix Factorization Under Adversarial Noise
Peter Ballen, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA
Nonnegative Matrix Factorization (NMF) is a popular tool for estimating the missing entries of a dataset under the assumption that the true data has a low-dimensional factorization. One example of such a matrix arises in movie recommendation, where NMF corresponds to predicting how a user would rate a movie. Traditional NMF algorithms assume the input data is generated from the underlying representation plus mean-zero independent Gaussian noise. However, this simplistic assumption does not hold in real-world settings that contain more complex or adversarial noise. We provide a new NMF algorithm that is more robust to these nonstandard noise patterns. Our algorithm outperforms existing algorithms on movie rating datasets, where adversarial noise corresponds to a group of adversarial users attempting to review-bomb a movie.
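The abstract contrasts the paper's robust method with traditional Gaussian-noise NMF. The paper's own algorithm is not specified here, but the baseline it improves on, Frobenius-loss NMF via Lee-Seung multiplicative updates, can be sketched as follows (matrix sizes and iteration count are illustrative):

```python
import numpy as np

def nmf(V, rank, iters=500, eps=1e-9):
    """Factor nonnegative V (m x n) as W @ H with W, H >= 0,
    minimizing the Frobenius norm via multiplicative updates.
    Minimizing this norm implicitly assumes mean-zero Gaussian noise on V."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H; stays nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W; stays nonnegative
    return W, H

# Tiny synthetic example: a true rank-3 nonnegative matrix
rng = np.random.default_rng(1)
V = rng.random((8, 3)) @ rng.random((3, 6))
W, H = nmf(V, rank=3)
```

Under adversarial (non-Gaussian) noise, this Frobenius objective overfits the corrupted entries; the paper's contribution is an update rule that remains robust in that regime.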
Nonnegative Matrix Factorization, Adversarial Noise, Recommendation
Data Model for Bigdeepexaminator
Janusz Bobulski and Mariusz Kubanek, Department of Computer Science, Czestochowa University of Technology, Poland
Big Data is a term for data sets that are simultaneously characterized by high volume, diversity, real-time stream inflow, variability, and complexity, and that require innovative technologies, tools, and methods to extract new and useful knowledge from them. Big Data presents both a new challenge and new information possibilities. The effective acquisition and processing of data will play a key role in the global and local economy, in social policy, and in large corporations. The article continues our research and development work on the design of a data analysis system using artificial intelligence; in it we present a data model for this system.
Big data, intelligent systems, data processing, multi-data processing
Are you Asking the Same Question?
Pooja Bihani1 and Ashay Walke2, 1Fidelity Investments, Bangalore, India and 2Accenture Applied Intelligence, Bangalore, India
Question-answer platforms are trending on the internet nowadays. Clustering questions that ask the same thing is a challenging problem faced by such platforms. We discuss and implement techniques that identify such duplicate questions by finding the semantic relationship between them. Among all the trained and pre-trained models discussed in this paper, the Manhattan Long Short-Term Memory model performs best, with an Area Under the ROC Curve of 0.81.
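The Manhattan LSTM (MaLSTM) mentioned above scores a question pair by encoding each question with a shared LSTM and comparing the final hidden states with exp of the negative L1 distance. The encoder itself is beyond the scope of an abstract, but the similarity head can be sketched as follows (the vectors here are illustrative stand-ins for LSTM hidden states):

```python
import math

def malstm_similarity(h1, h2):
    """MaLSTM similarity: exp(-||h1 - h2||_1), which lies in (0, 1].
    h1 and h2 are the final LSTM hidden states of the two questions."""
    l1 = sum(abs(a - b) for a, b in zip(h1, h2))
    return math.exp(-l1)

same = malstm_similarity([0.2, -0.5, 0.9], [0.2, -0.5, 0.9])  # identical states
diff = malstm_similarity([0.2, -0.5, 0.9], [0.9, 0.4, -0.3])
```

Identical states score exactly 1.0, and the score decays toward 0 as the hidden states diverge, giving a bounded duplicate-probability-like signal.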
Sentiment analysis, Sentence match, Natural Language Processing, Machine Learning
Second Order Pseudolikelihood Learning in Relational Domain
Krishna Kumar Tiwari, V. Vijaya Saradhi, Indian Institute of Technology Guwahati, Guwahati, India
We use composite likelihood for structure learning and parameter estimation in relational dependency networks (RDNs). RDNs currently use pseudolikelihood, a special case of the composite likelihood function, to learn parameters. Composite likelihood learning trades off computational complexity against model performance: the variance of the model is lowest under full likelihood and highest under pseudolikelihood.
In particular, we focus on a modified second-order pseudolikelihood function and extend the relational Bayesian classifier (RBC) to this setting. Second-order RDNs explore pairwise attribute correlation. We evaluate second-order learning on synthetic and real-world data sets, and observe experimentally that the second-order model has an edge over the pseudolikelihood-based model, particularly in highly correlated environments.
Visualized Estimation of Word Frequency for Second Vocabulary Acquisition
Yo Ehara, Shizuoka Institute of Science and Technology, Japan
To learn vocabulary in a second language, analyzing the words known by learners is important. Because a second-language vocabulary is large, we cannot test every word to obtain each learner's vocabulary knowledge. Hence, it is preferable to estimate the difficulty of words for learners using only the results of tests over a small sample of their vocabulary. To this end, word frequency in a balanced corpus, whose frequencies are not biased towards particular domains, has been reported to work well. However, manually balancing a corpus requires time and cost. Moreover, learners do not always want to learn vocabulary in a balanced manner; they sometimes want to learn the words of a specific domain. This paper proposes a novel visualization method for intuitive vocabulary learning. Our method first semantically visualizes each occurrence, or usage, of a word as a point in a two-dimensional space. Then, it improves the visualization so that the frequency, i.e., the number of points in the central region of the visualization, fits the learners' vocabulary test results. The experimental results reveal that the frequency counted by our method can better predict learners' vocabulary test responses than raw frequencies.
Second Vocabulary Learning, Supervised Visualization, Contextualized Word Embeddings
Using LoRa Technology to Determine the Location of a Bus in Real-Time
Ronald Tumuhairwe, Department of Engineering, Ashesi University, Eastern Region, Ghana
Travelers without prior information about the bus they are waiting for often waste time at bus stations, which affects their daily activity plans. GPS receivers are used to address this problem by providing real-time information about the location of the bus; however, they are expensive and power-hungry, and every bus needs its own GSM/GPRS module to send the information to the internet. This paper proposes a low-cost, low-power approach that uses LoRa technology and geometric methods to determine the location of the bus, with only one GSM module shared by multiple buses to transmit the data to the internet. Results in the project show promising accuracy in determining the location of the bus in real time.
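The geometric methods referred to above are presumably trilateration from distance estimates to fixed LoRa gateways; the abstract gives no details, but a standard two-dimensional trilateration step can be sketched as follows (gateway coordinates and distances are hypothetical):

```python
import math

def trilaterate(p1, d1, p2, d2, p3, d3):
    """Locate a point from its distances d1, d2, d3 (e.g. estimated from
    LoRa signal strength) to three known gateways p1, p2, p3.
    Subtracting the three circle equations pairwise yields a linear
    system in the unknown coordinates (x, y)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    A, B = 2 * (x2 - x1), 2 * (y2 - y1)
    C = d1**2 - d2**2 - x1**2 + x2**2 - y1**2 + y2**2
    D, E = 2 * (x3 - x2), 2 * (y3 - y2)
    F = d2**2 - d3**2 - x2**2 + x3**2 - y2**2 + y3**2
    det = A * E - B * D          # zero only if the gateways are collinear
    return (C * E - B * F) / det, (A * F - C * D) / det

# Bus at (3, 4); gateways placed at the corners of the coverage area
bus = trilaterate((0, 0), 5.0, (10, 0), math.sqrt(65), (0, 10), math.sqrt(45))
```

In practice the distances derived from received signal strength are noisy, so the system would be solved in a least-squares sense rather than exactly.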
LoRa, Trilateration, Real-Time, Tracking, Geometry
An Efficient Algorithm to Find the Height of a Text Line and Overcome the Overlapped and Broken Line Problem during Segmentation
Sanjibani Pattanayak, Sateesh Kumar Pradhan, Ramesh Chandra Mallick, Utkal University, India
Line segmentation is one of the critical phases of the character recognition process: it separates the individual lines from the document image. The accuracy of character recognition is directly proportional to the accuracy of line segmentation, which is followed by word/character segmentation. Here, an algorithm named height_based_segmentation is proposed for the text line segmentation of printed Odia documents. The algorithm finds the average height of a text line and uses it to minimize overlapped text line cases. A post-processing step recombines the modifier zone, which can become separated from its base zone during segmentation, with that base zone. The performance of the algorithm is evaluated against ground truth and by comparison with existing segmentation approaches. A database of the segmented lines has been built, which will be helpful for researchers working on word or character segmentation.
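The height_based_segmentation algorithm itself is not reproduced in the abstract, but its underlying idea, segmenting lines from the horizontal projection profile and using the average line height to flag merged or broken lines, can be sketched on a binary image as follows (the image layout and thresholds are assumptions):

```python
def segment_lines(image):
    """image: list of pixel rows, each a list of 0/1 values (1 = ink).
    Returns ((start_row, end_row) per text line, average line height).
    Runs much taller than the average height would be candidates for
    overlapped-line splitting; much shorter runs may be detached
    modifier zones to merge back into an adjacent base zone."""
    ink = [sum(row) for row in image]        # horizontal projection profile
    lines, start = [], None
    for i, v in enumerate(ink):
        if v > 0 and start is None:
            start = i                        # a text line begins
        elif v == 0 and start is not None:
            lines.append((start, i - 1))     # the line ends at a blank row
            start = None
    if start is not None:
        lines.append((start, len(ink) - 1))
    avg_h = sum(e - s + 1 for s, e in lines) / len(lines) if lines else 0
    return lines, avg_h

# Two 2-pixel-high text lines separated by one blank row
page = [[1, 1], [1, 0], [0, 0], [0, 1], [1, 1]]
lines, avg_h = segment_lines(page)
```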
Document Image Analysis, Line segmentation, word segmentation, Database creation, printed Odia document
Neighbour Alpha Stable Fusion in Wavelet Decomposition and Laplacian Pyramid
Rachid Sabre1 and Ias Wahyuni2, 1Laboratory Biogéosciences CNRS, University of Burgundy/Agrosup Dijon, France and 2Universitas Gunadarma, Jl. Margonda Raya No. 100, Depok 16424, Indonesia
In this paper, a new multifocus image fusion method is proposed that combines the Laplacian pyramid with wavelet decomposition and uses the alpha-stable distance as a selection rule. First, using the Laplacian pyramid, we decompose the multifocus images into several pyramid levels and then apply wavelet decomposition at each level. The contribution of this work is to fuse the wavelet images at each level using the alpha-stable distance as the selection rule. To obtain the final fused image, we reconstruct the combined image at every level of the pyramid. This protocol was compared to other methods and showed good results.
Image fusion, Laplacian pyramid, Wavelet decomposition
Solar Potential Analysis of Rooftops Using Satellite Imagery
Akash Kumar, Delhi Technological University, New Delhi, India
Solar energy is one of the most important sources of renewable energy and the cleanest form of energy. In India, solar energy could produce on the order of trillions of kilowatt-hours of power in a year, yet the country is only able to produce on the order of gigawatts. Many people are not aware of the solar potential of their rooftop and assume that installing solar panels is very expensive. In this work, we introduce an approach to remotely generate a report of the solar potential of a building using only its latitude and longitude. We further evaluated various types of rooftops to make our solution more robust. We also provide an approximate rooftop area usable for solar panel placement and a visual analysis of how panels can be placed to maximize solar power output at a location.
Rooftop Detection, Solar Panels, Adaptive Canny Edge, Gabor Filter, Image Segmentation, Object Detection.
Predicting American Sign Language from Hand Gestures using image processing and deep learning
Sowmya Saraswathi and Suresh Kumar N, Department of Computer Science Engineering VIT University, Vellore, India
The use of computer vision with machine learning techniques has produced breakthroughs in medical and social fields. One such application is sign language detection, which enables communication for differently abled people, with hand gestures as the common input. This paper uses image processing techniques such as RGB-to-binary image conversion, skin detection, and edge detection to extract the important features of a gesture and provide them as inputs to convolutional neural networks, improving the accuracy of predicting American Sign Language (ASL) alphabets. Given the image of a hand gesture, we are able to predict the alphabet with good accuracy.
Image processing, Edge detection, Convolution neural nets, American Sign Language
Newly Proposed Adaptive Digital Signal Processing Algorithm for Noise Cancellation
Shiv Ram Meena and Chandra Shekhar Rai, Guru Gobind Singh Indraprastha University, India
In this paper a new adaptive digital signal processing algorithm is proposed that works on the principle of gradient search. To develop it, a new cost function is constructed that combines the properties of the mean square error and the weighted square error. An algorithm is then derived from this cost function in the same manner as least mean square. The resulting algorithm has the characteristics of two algorithms: least mean square (LMS) and recursive least squares (RLS). By combining the cost functions of LMS and RLS, the new algorithm conserves the properties of both, so it should be simple to implement and fast converging. To observe its performance, a MATLAB simulation of noise cancellation was carried out, and the results of the newly developed algorithm were compared with the LMS algorithm, since LMS is the most popular algorithm in adaptive digital signal processing. MATLAB code was written for both algorithms. The results show that the new algorithm performs better than LMS.
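The proposed algorithm is defined only in the full paper, but the LMS baseline it is compared against can be sketched as follows (here in Python rather than MATLAB); LMS adapts FIR filter weights by gradient descent on the instantaneous squared error, shown identifying an unknown one-tap-delay system with an illustrative step size and tap count:

```python
import random

def lms(x, d, n_taps=4, mu=0.01):
    """Least mean square adaptation: adjust FIR weights w so the filter
    output over a sliding window of x tracks the desired signal d."""
    w = [0.0] * n_taps
    errors = []
    for n in range(n_taps, len(x)):
        window = x[n - n_taps:n]                       # window[-1] == x[n-1]
        y = sum(wi * xi for wi, xi in zip(w, window))  # filter output
        e = d[n] - y                                   # instantaneous error
        w = [wi + 2 * mu * e * xi for wi, xi in zip(w, window)]
        errors.append(e)
    return w, errors

rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(2000)]
d = [0.0] + [0.5 * xp for xp in x[:-1]]   # unknown system: d[n] = 0.5 * x[n-1]
w, errors = lms(x, d)
```

The weight on the most recent sample converges toward the true coefficient 0.5 and the error shrinks; RLS converges faster at higher per-step cost, which is the trade-off the paper's combined cost function targets.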
Adaptive noise cancellation, LMS, new adaptive digital signal processing algorithm, cost function, Adaptive filtering.
Amharic-Arabic Neural Machine Translation
Ibrahim Gashaw1 and HL Shashirekha2, 1,2Department of Computer Science, Mangalore University, Mangalagangotri, Mangalore-574199, India
Many automatic translation works have addressed major European language pairs by taking advantage of large-scale parallel corpora, but very few research works have been conducted on the Amharic-Arabic language pair due to its parallel data scarcity. Two Neural Machine Translation (NMT) models, based on Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), are developed using the attention-based encoder-decoder architecture adapted from the open-source OpenNMT system. To perform the experiment, a small parallel Quranic text corpus is constructed by modifying the existing monolingual Arabic text and its equivalent Amharic translation available on Tanzil. The LSTM- and GRU-based NMT models and the Google Translation system are compared, and the LSTM-based OpenNMT is found to outperform both the GRU-based OpenNMT and the Google Translation system.
Amharic, Arabic, Neural Machine Translation, OpenNMT.
Regression Testing for Contoso
Nakush Sharma1 and Shahid Ali2, 1,2Department of Information Technology, AGI Institute, Auckland, New Zealand
This project involves automated regression testing of Microsoft Dynamics 365 (MD365), a cloud-based enterprise resource planning (ERP) solution, for Contoso. Contoso had problems accessing and monitoring its huge database of customers, vendors, and products on a traditional ERP system, and therefore moved to the cloud-based ERP. Because MD365 receives constant updates, and because Contoso wanted to add new fields to the customer, vendor, and product web forms, automated regression testing is essential for this project. Selenium WebDriver was selected for this testing. The project helps Contoso achieve automated regression testing of MD365 with Selenium WebDriver: the tests let them execute repeated test cases without duplicating code, and the generated execution reports help them track the progress of the project. Moreover, the organization adopted the Scrum methodology, which is very flexible, helps provide customer satisfaction, and makes it easy to accommodate requirement changes.
Automation Testing, Regression testing, Microsoft Dynamics (MD 365), Selenium, Scrum
WCAG 2.0 Accessibility Test Approach Using the REST Assured Framework
Abhay R. Palaskar, PhD (contd.), Bharathiar University, Coimbatore, TN, India; Program Lead, Atos Syntel, Memphis, TN, USA
It is not easy to be accessibility compliant. Before making a website accessible to users with all disabilities, an organization has to commit fully to accessibility: learning how to implement it is not an easy task and requires knowledge of all sorts of disabilities and of the implementation guidelines, as well as an understanding of the legal obligations both before and after implementation. ADA tools are capable of detecting only around 50 to 60% of ADA guideline violations. We observe that the rules used to identify errors are not well written and are complex to work with. Even if a tool reports 100% success, it does not guarantee that all possible violations are detected. No single tool does everything for the ADA, since each tool is specific to certain disabilities and none covers all criteria at once. One more major issue with the overall process of ADA certification is the question of where to start: website analysis is critical to understand how much work needs to be done, yet tools are often costly and give complex information that confuses developers without ADA knowledge. Therefore, we developed a strategy to test a website and provide simple information by gathering the web service responses from that website. The test strategy can be used by any website or e-commerce business owner planning to implement the ADA guidelines without any prior knowledge of them.
Web Content Accessibility Guidelines (WCAG), ADA (Americans with the Disabilities Act), tools to test ADA guidelines
Bitcoin Mining: Electronic Cash System
A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network. The network timestamps transactions by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work. The longest chain not only serves as proof of the sequence of events witnessed, but proof that it came from the largest pool of CPU power. As long as a majority of CPU power is controlled by nodes that are not cooperating to attack the network, they'll generate the longest chain and outpace attackers. The network itself requires minimal structure. Messages are broadcast on a best effort basis, and nodes can leave and rejoin the network at will, accepting the longest proof-of-work chain as proof of what happened while they were gone.
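The hash-based proof-of-work described above can be illustrated with a minimal sketch: search for a nonce whose hash falls below a difficulty target, so that changing the recorded data would force all that work to be redone (the string encoding and difficulty here are illustrative, not Bitcoin's actual block format):

```python
import hashlib

def mine(data, difficulty_bits=12):
    """Find a nonce such that SHA-256(data:nonce) has at least
    `difficulty_bits` leading zero bits, i.e. falls below the target.
    Expected work doubles with each extra difficulty bit."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{data}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
        nonce += 1

nonce, digest = mine("alice->bob:10")
```

Verification is cheap (one hash), while production is expensive, which is exactly the asymmetry that lets the longest chain stand in for the largest pool of CPU power.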
Click Volume Potential Maximization in Affiliate Network
Krishna Kumar Tiwari and Ritesh Ghodrao, InMobi Technology Services Pvt Ltd, Bangalore, India
An affiliate network runs advertisers' campaigns (new-user acquisition, download campaigns) on multiple or chained ad-tech companies (affiliates). Most affiliates in an affiliate network deal with a huge volume of clicks (roughly 500M to 1.5B, with click QPS varying from 10K to 25K). Only a small fraction of clicks leads to conversions, which generate the affiliate's revenue, while hosting a huge volume of clicks costs a lot depending on the engineering setup. The real challenge is maintaining profit after paying the infrastructure cost, so it becomes mandatory to optimize the infra-cost-versus-revenue equation. In this paper, we present a way of modeling the infra-to-revenue equation based on click volume, and a knapsack formulation that maximizes revenue under a constraint on infra cost, which we call CVPM (click volume potential maximization). We compared CVPM with greedy optimizations and conclude that CVPM outperforms these approaches in most real scenarios.
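The abstract frames CVPM as a knapsack problem: choose which click segments to host so that expected revenue is maximized while total infra cost stays within a budget. The paper's exact infra-to-revenue model is not given here, but the 0-1 knapsack core can be sketched with hypothetical costs and revenues:

```python
def cvpm_knapsack(segments, infra_budget):
    """0-1 knapsack over click segments: segments is a list of
    (infra_cost, expected_revenue) pairs and infra_budget an integer cap.
    Returns the maximum total revenue achievable within the budget,
    via the standard dynamic program over budget values."""
    best = [0.0] * (infra_budget + 1)
    for cost, revenue in segments:
        # iterate the budget in reverse so each segment is used at most once
        for b in range(infra_budget, cost - 1, -1):
            best[b] = max(best[b], best[b - cost] + revenue)
    return best[infra_budget]

# Hypothetical segments: (infra cost units, expected revenue)
segments = [(3, 10.0), (4, 12.0), (5, 13.0)]
max_revenue = cvpm_knapsack(segments, infra_budget=5)
```

On this instance a greedy pick by revenue-per-cost ratio takes the (3, 10.0) segment and gets stuck at 10.0, while the knapsack solution takes (5, 13.0) for 13.0, which is the kind of gap the paper reports between CVPM and greedy baselines.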
Infra cost optimization, Click optimization, 0-1 Knapsack, Adtech optimizations