RESEARCH FEATURE

Scalable Testing of Mobile Antivirus Applications

Andrea Valdi, dubizzle.com
Eros Lever, Secure Network S.r.l.
Simone Benefico, Moviri SpA
Davide Quarta, Stefano Zanero, and Federico Maggi, Politecnico di Milano

AndroTotal, a scalable antivirus evaluation system for mobile devices, creates reproducible, self-contained testing environments for each antivirus application and malware pair and stores them in a repository, benefiting both the research community and Android device users.

For most people, mobile devices
have become a digital wallet that they can trust to securely hold everything from contacts and appointments to banking and retail transactions. The growth of smartphones, tablets, phablets, and wearables is evidence that mobile devices are becoming essential to daily life, with mobile applications limited only by developers' imaginations. Android devices and applications, in particular, are in wider use; as of September 2015, Android held 82.8 percent of the mobile device market,1 with Google Play Store reporting more than 150 billion downloads as of April 2015 and offering 1.6 million individual apps as of July 2015 (www.statista.com/statistics/266210/number-of-available-applications-in-the-google-play-store). This number excludes applications offered exclusively on alternative markets such as Amazon, Samsung Apps, AndroLibs, and AppBrain.
The heavy reliance on mobile devices for trusted information, combined with the pervasiveness of mobile devices and applications, is attractive to cybercriminals, who develop and distribute malware to steal sensitive data and compromise banking and other electronic services. Although Android has some security countermeasures and many antivirus (AV) applications exist, the security community lacks a convenient and reliable way to test them.

The main challenge of scientifically evaluating AV applications is how to reproduce the exact conditions an AV encounters when running on a user's device, such as network connectivity, OS version, and system load. Existing AV evaluation approaches tend to rely on time-consuming tasks that require a high degree of user intervention, or they ignore the need to ensure that testing conditions are reproducible. Ideally, testing must fully emulate the entire application stack, which is not practical. Thus, existing methods rely on human-assisted tests and reviews, which require heavy manual intervention and can manage only 25 to 30 application scans per person-day of testing.
To provide Android device users with deeper assurance that their applications are secure, we developed AndroTotal, which is based on the model that VirusTotal successfully implemented in the desktop world. Like VirusTotal desktop users, any Android device user can submit an application to a website to check how it is classified by commercial mobile AV products. Unlike existing methods, AndroTotal uses a completely automatic approach to scan hundreds of suspicious applications per day against all major AV application versions. Unlike VirusTotal, it creates reproducible, self-contained testing environments for each AV-malware pair, while ensuring a high throughput because of its inherent scalability.

With AndroTotal, users can be assured that the desired application is clean according to the best available information to date, while AV application developers benefit from AndroTotal's generic, scalable approach for mobile AV application testing, which is openly accessible as a Web service that conducts thorough, precise, and repeatable tests against existing or novel malware. Finally, researchers and AV vendors benefit from AndroTotal's ability to create an indexed database of mobile malware samples, which is useful in updating products and conducting more in-depth research on a particular threat.
The research community has reacted positively to AndroTotal's release, with several AV vendors automating the submission of samples to our system and retrieving samples for their own analysis. We are also cooperating with other well-known Android malware research projects, including CopperDroid (http://copperdroid.isg.rhul.ac.uk) and Andrubis (https://play.google.com/store/apps/details?id=org.iseclab.andrubis&hl=en).

After nearly two years of operation, we have been able to collect user feedback and observe how researchers employ AndroTotal. We have also taken quantitative measurements, such as detection rates or malware classification and misclassification rates.
AVAILABLE ANDROID COUNTERMEASURES

The last few years have seen an alarming spike in Android malware diversity2 and a steady increase in malware complexity.3 Current Android security measures to combat this malware include code signing and signature verification when the user installs an app. Android's "Verify Apps" setting sends the hash of each installed application to Google servers: if the hash corresponds to known malware, a warning is displayed. Moreover, Google Play Store enforces developer verification and uses Bouncer, a system that screens submitted applications. However, studies have shown that Bouncer can be circumvented.4 Other than these countermeasures, Android device users have no protection from malicious intrusions without downloading an AV application.
Users seem to understand the need for these applications, according to our review of Google Play Store statistics from 2013 to 2015. As Table 1 shows, AV application installations are approaching the 500 million mark.

TABLE 1. Top 20 antivirus applications in Google Play Store as of October 2015, by number of installations.

100 to 500 million: AVG Mobile Antivirus Free; AVAST Software Antivirus & Security; CM Security Antivirus AppLock; Security & Antivirus | Lookout; 360 Security, Antivirus Free.
50 to 100 million: Psafe Antivirus Booster & Cleaner; Antivirus Dr.Web Light.
10 to 50 million: McAfee Antivirus & Security; Kaspersky Internet Security; Avira Antivirus Security; Norton Security and Antivirus; Trustlook Antivirus & Mobile Security; TrustGo Antivirus & Mobile Security; NQ Security Lab Antivirus Free.
5 to 10 million: ESET Mobile Security & Antivirus.
1 to 5 million: Malwarebytes Anti-Malware; Bitdefender Antivirus Free; Panda Free Antivirus and Security; Itus Antivirus for Android; Bitdefender Mobile Security & Antivirus.

However, there is no guarantee that an AV application will work as promised. For example, in early 2014, 70 of the 100 AV applications in Google Play Store were Android specific, meaning that no (more mature) desktop equivalent existed and that the applications' developers were likely to be untested market players. The lack of a known AV company behind these applications raises obvious questions about their trustworthiness and efficiency.
AV APPLICATION TESTING CHALLENGES

The problem of testing an AV application under consistent conditions becomes much thornier if tests must scale. In a desktop environment, testing consists of creating scripts that automate command-line versions of the AV program. Unfortunately, these versions are poorly maintained and inherently limited because detection relies solely on signature matching. In other words, testing ignores behavioral heuristics, or techniques triggered by network communication or interprocess communication (IPC) anomalies. Testing also fails to consider the AV application's working environment, and does not exercise the entire application. It is no surprise that script-based testing has been heavily criticized for failing to reflect the actual AV capabilities in real-world conditions.5 VirusTotal implements the script approach, and its authors are careful to clearly point out these limitations (www.virustotal.com/en/about/best-practices).
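As a rough illustration of this script-based approach (not AndroTotal's method), the following minimal Python sketch drives a hypothetical command-line scanner, avscan, over a directory of samples and collects its verdict lines; the tool name, flags, and output format are assumptions, since they vary by vendor.

import subprocess
from pathlib import Path

SCANNER = "avscan"  # hypothetical command-line AV scanner

def scan_directory(sample_dir):
    """Run the scanner on every APK and collect its verdict lines."""
    results = {}
    for apk in Path(sample_dir).glob("*.apk"):
        # A typical CLI scanner prints one verdict line per file;
        # the exact flags and format are assumed here.
        proc = subprocess.run(
            [SCANNER, "--scan", str(apk)],
            capture_output=True, text=True, timeout=300,
        )
        results[apk.name] = proc.stdout.strip()
    return results

if __name__ == "__main__":
    for name, verdict in scan_directory("./samples").items():
        print(f"{name}: {verdict}")

Such a script exercises only the signature-matching core; nothing in it can trigger the GUI, notification, or network code paths that a real on-device AV application uses.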
In the mobile world, the same consistency and scalability challenges apply. Several products, such as SRTAppScan, use behavioral heuristics; others, such as Kaspersky, analyze network-traffic anomalies. Some AV applications, such as TrendMicro and SourceFire, use in-cloud scanning, verifying samples through interrogations with a remote service rather than against a local database. In our view, these applications are a solid argument for developing a completely different approach to mobile AV application testing.
Automating a device scan

Mobile AV applications generally have highly interactive GUIs, making them technically difficult to automate. Figure 1 shows the gestures needed to perform a device scan with Zoner AntiVirus Free. Automating even these basic steps requires creating a program that can emulate tapping and then wait for the displayed results, which is not a trivial undertaking. Add customized view components or other tailored graphical elements not provided by the Android software development kit (SDK), and automation becomes even more difficult.
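To make this concrete, here is a minimal sketch (ours, not AndroTotal's actual adapter) of how such a scan could be scripted with an AndroPilot-style API, reusing the install_package, tap_on_coordinates, wait_for_activity, refresh, and get_view_by_id calls that appear in the adapter code of Figure 2; the activity name and tap coordinates are placeholders, not Zoner AntiVirus Free's real values.

def run_on_demand_scan(pilot, sample_path):
    """Drive an on-demand scan through an AV's GUI (illustrative only)."""
    # `pilot` is assumed to be an AndroPilot session already attached
    # to a running emulator.
    pilot.install_package(sample_path)      # push the suspicious APK
    pilot.tap_on_coordinates(160, 420)      # tap the "Scan device" button
    # Block until the result screen appears, or give up after 60 seconds
    if pilot.wait_for_activity("com.example.av.ScanResultActivity", 60):
        pilot.refresh()                     # re-read the displayed view tree
        threat_view = pilot.get_view_by_id("ObjectType")
        return threat_view.mText.strip()    # scrape the threat label
    return None                             # no verdict was displayed

Even this short script hides the hard part: the right activity to wait for and the right view to scrape differ for every AV application, and often for every version of it.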
Another scanning challenge is that mobile AV applications work in multiple detection modes, all of which must be supported by a testing system. The two most popular detection modes are on demand, in which the user requests a device scan, and on install, in which the AV application waits for an Android application package (APK) to be installed (registering a broadcast receiver for the ACTION_PACKAGE_ADDED or ACTION_PACKAGE_INSTALL intents) and analyzes it on the spot.

Automatically capturing detection results is also problematic. The Android GUI is asynchronous, which makes this operation technically complex: many AV applications report through the notification bar, and other applications are not allowed to capture those notifications by design.
Modifying the Android framework or AV application might alleviate some of these automation obstacles, but doing so would violate the main goal of preserving real-world working conditions as much as possible.
Existing approaches

Researchers, practitioners, and vendors have proposed a variety of mobile security methodologies, many of which are listed on the AV Comparatives website (www.av-comparatives.org/mobile-security). However, they all require time-consuming manual tasks, including

- preparing a clean testing image (about 2 to 3 minutes per image);
- installing the AV application and the suspicious application (about 2 to 5 minutes); and
- restoring a clean system state (about 10 to 15 minutes).

For this reason, testing becomes largely manual, with a throughput of 25 to 30 suspicious applications scanned per AV application per person-day invested in testing: at roughly 14 to 23 minutes of manual work per scan, an eight-hour day yields only a few dozen scans.
Although some mechanisms can automate the creation of a clean testing image and the installation of suspicious applications, even state-of-the-art tools6,7 still require an operator to check the outcome of each scan, which significantly limits scalability.
Other researchers have concentrated on building resiliency to malware mutations by applying multiple transformations (repackaging or obfuscation) to the samples before testing them with AV applications.8 Although this work is technically interesting, it does not focus on ensuring that the tests are run under reproducible conditions, which we believe is essential to the scientific evaluation of AV applications.

FIGURE 1. User interaction needed to perform an on-demand device scan with Zoner AntiVirus Free v. 1.7.0 (event waiting, two taps, and screen scraping). Although the actions seem basic, they are difficult to automate.
ANDROTOTAL DESIGN

A 2013 survey of mobile malware variety and complexity and an analysis of existing AV application testing methods9 laid the foundation for developing AndroTotal. After establishing a set of design requirements, we reviewed and discarded any testing approaches that did not meet our requirements.

Our main challenge was increasing scalability. A possible approach was to simultaneously install a set of AV applications and samples in a single emulator instance or device. However, we rejected that strategy because it can lead to unwanted interactions between the AV applications and between the samples, which could result in biased outcomes.

Instead, we chose to abstract the environment preparation and testing procedure so the system could create tests and schedule them automatically, allowing us to scale the system by simply adding computational nodes. Because AndroTotal does not require software modification or special hardware and works in emulated environments, we could easily deploy it on cloud infrastructures for additional scaling.
After reviewing scalability demands, we selected six critical requirements for AndroTotal:

1. Stimulate input on the device's GUI by reproducing the typical gestures a user would perform with an AV application.
2. Obtain feedback about the device's displayed views and activities so that the system can synchronize testing procedures with the running AV application's state. In addition, scrape information from the display to retrieve data such as the name of an identified threat.
3. Support complex testing procedures that involve multiple applications (an AV application and a browser, for example), as well as execute basic operations such as notification management, which still require accessing different Android application contexts.
4. Conduct testing without AV application modification. Any modification to the AV package to enable testing, including injecting code, changing its signature, or repackaging, might alter its true behavior, which could bias test results.
5. Support any Android version.
6. Natively support Android notification operations such as waiting for a notification to appear, handling an open notification, and checking for notifications, because notifications are the only feedback for some AV applications.
ANDROTOTAL IMPLEMENTATION

AndroTotal supports both on-demand and on-install detection modes, exposing a Python API to automate test procedures and gather results through GUI scraping. AndroTotal's key difference from existing systems that scan multiple AV applications is that it runs the actual AV application on a real or emulated Android device. During each test, AndroTotal captures screenshots from the application, the network dump, and the log file, although it does not rely on the log file to derive the detection label because not all mobile AV applications log that information. An approach based on GUI and screen scraping is more general, more flexible, and less constrained than a log parser.

AndroTotal gives the user access to the analyses (if any) generated by VirusTotal, CopperDroid, ForeSafe, and SandDroid as additional information sources. We have set up data-sharing agreements with the VirusTotal and CopperDroid services.
Test automation

After analyzing the six most promising, publicly available libraries to support testing automation,9 we did not find any suitable candidates. Robotium excelled in white-box testing, but it requires application resigning and has limited application sandboxing, which did not meet requirements 4 and 5. Android's Monkey and Monkeyrunner met all requirements except requirement 2 because they do not support data retrieval from a running device or emulator. Android's UI Automator supports only Android SDK API 16 or higher, which did not satisfy requirement 5.

AndroidViewClient and Apk-view-tracer were good tradeoffs, as both effectively simulate typical user inputs and can retrieve information about displayed activities. Under the hood, they rely on Monkey and Monkeyrunner but also implement the missing functionality to satisfy requirement 2. They support all Android versions and do not need any package modifications to interact with an application. However, notification support and most of the other implemented functions were unstable and slow.
Because of these shortcomings, we decided to build and implement our own test automation library. AndroPilot is the first Android AV-application-automation library written solely in Python that supports all the functionality needed to conduct AV application tests.

To build AndroPilot, which became the foundation of AndroTotal's scalable architecture, we extended Apk-view-tracer by reimplementing part of the existing code to improve stability, fixing several bugs, and correcting suboptimal design choices. We also enhanced the library's speed.

To correct the design choices, we leveraged Android's ViewServer component and introduced new procedures to properly manage application synchronization during testing stages, including functions that wait for an arbitrary view, text, or notification to appear on the screen. We also improved view management to correctly report when a view is shown on the running Android instance and implemented a new function to retrieve a screenshot from an attached device or emulator.
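These synchronization functions follow a simple poll-with-timeout pattern; a minimal sketch of how such a wait primitive could be structured appears below. The refresh call appears in Figure 2, but get_views is a hypothetical stand-in, not AndroPilot's actual internals.

import time

def wait_for_text(pilot, text, timeout=30, interval=1.0):
    """Poll the displayed view tree until `text` appears or time runs out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        pilot.refresh()  # re-query Android's ViewServer for the view tree
        if any(text in (getattr(v, "mText", "") or "")
               for v in pilot.get_views()):   # hypothetical accessor
            return True
        time.sleep(interval)  # avoid hammering the emulator
    return False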
Creating an AndroPilot adapter module for an AV application was straightforward and took little coding effort. For the 10 adapters currently implemented, our libraries enabled adapter implementation with an average of 36 lines of Python code per adapter (as measured with the Count Lines of Code tool; http://cloc.sourceforge.net). Figure 2 shows the code for two of these adapter implementations.
Architecture

AndroTotal's workflow begins when a user submits an Android application (APK package) to AndroTotal's Web interface. If AndroTotal has not yet analyzed the application, it pushes the application to the analysis queue as a series of tasks (one for each pair of AV application and submitted application). Worker servers execute the tasks, each of which is treated as an execution unit, using concurrent multiprocessing. When a worker server receives a task, it starts an emulator with a clean image, installs the application sample, performs the required tests, and stores the results in a database.
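A stripped-down, runnable sketch of this dispatch loop follows; the Task fields and the stubbed-out emulator logic are illustrative stand-ins for AndroTotal's actual components, which are not detailed here.

import multiprocessing as mp
from dataclasses import dataclass

@dataclass
class Task:
    av_name: str        # which AV application image to boot
    sample_path: str    # the submitted APK to test

def run_task(task):
    """Run one (AV application, sample) pair in a disposable emulator.

    Emulator control is stubbed out; in AndroTotal each task boots a
    clean snapshot, installs the sample, runs the adapter's test suite,
    tears the emulator down, and stores the results.
    """
    # ... start emulator, install sample, run adapter, collect label ...
    return f"{task.av_name}: NO_THREAT_FOUND"    # placeholder verdict

def worker_loop(queue):
    """Consume tasks until a None sentinel signals shutdown."""
    for task in iter(queue.get, None):
        print(run_task(task))

if __name__ == "__main__":
    queue = mp.Queue()
    workers = [mp.Process(target=worker_loop, args=(queue,))
               for _ in range(4)]
    for w in workers:
        w.start()
    for av in ["av_a", "av_b", "av_c"]:      # one task per AV application
        queue.put(Task(av, "sample.apk"))
    for _ in workers:
        queue.put(None)                      # stop each worker
    for w in workers:
        w.join()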
A test is essentially a Python script written on top of AndroPilot, which runs a given AV application in either on-demand or on-install mode and retrieves the results through GUI scraping. AndroTotal then stores the results in its database and returns them to the user. It also exposes a Representational State Transfer (REST) API, which ensures interoperability with external services.
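The REST endpoints themselves are not documented here, so the client sketch below uses hypothetical URL paths, response fields, and an API key header purely to illustrate how an external service might submit a sample and poll for its per-AV results.

import time
import requests

BASE = "https://andrototal.example/api"      # hypothetical base URL

def submit_and_wait(apk_path, api_key):
    """Upload an APK, then poll until every per-AV task has finished."""
    with open(apk_path, "rb") as f:
        resp = requests.post(f"{BASE}/scan", files={"sample": f},
                             headers={"Authorization": api_key})
    resp.raise_for_status()
    scan_id = resp.json()["scan_id"]         # hypothetical response field
    while True:
        status = requests.get(f"{BASE}/scan/{scan_id}").json()
        if status["state"] == "done":        # all (AV, sample) tasks ran
            return status["results"]         # one detection label per AV
        time.sleep(10)                       # tests take minutes to run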
Ensuring scalability. We ensure AndroTotal's scalability by making each testing procedure self-contained so that a single worker server can perform each test job (task) independently. By leveraging the Android emulator's snapshot function, AndroTotal can run a test in an average of 1 to 3 minutes. To store the Android image and run the emulator, each test requires 50 to 250 Mbytes of temporary disk space and 1 to 2 Gbytes of RAM. Once the test terminates, the worker server clears the temporary files and ensures that the emulator has correctly terminated and freed the used resources. The time to scan an application or malware sample against a set of AV applications grows linearly with the number of AV applications, ensuring that parallelization can provide scalability.
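As a back-of-the-envelope check on that linear growth, wall-clock time for one sample is roughly the number of AV applications times the per-test time, divided by the number of parallel workers; a trivial estimator under those assumptions:

import math

def scan_wall_clock_minutes(n_avs, workers, minutes_per_test=2.0):
    """Estimate minutes to scan one sample against n_avs AV applications.

    Assumes the 1-to-3-minute average per test quoted above (2.0 here)
    and perfect parallelism; each concurrent emulator also needs 50 to
    250 Mbytes of temporary disk and 1 to 2 Gbytes of RAM.
    """
    return math.ceil(n_avs / workers) * minutes_per_test

print(scan_wall_clock_minutes(10, 4))        # 10 AVs on 4 workers: ~6 min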
Supporting multiple application versions. Unlike similar services such as VirusTotal, AndroTotal maintains multiple versions of the same AV application over time. In this way, it allows the testing of new samples against older versions as well as the computing of evolution statistics. By accessing scan results such as logcat, network dumps, and screenshots, users can visualize and download the data associated with each AV application test. By aggregating data from various reports, AndroTotal also provides insight into why a sample might be malicious.
AndroTotal adapters contain an automated function that checks for AV signature updates and automatically performs them, modifying the image as needed. These tasks are asynchronous and do not affect AndroTotal's throughput.

New application versions are handled through a semiautomated procedure. AndroTotal monitors Google Play Store each day for new AV application releases and notifies the AndroTotal maintainers when one is found. The maintainers use an automated script to initialize a clean image of the new AV release, manually test the current AV adapter and adjust it to deal with any changes in the application's user interface, and plug the new image and adapter into the AndroTotal system.
import time

import base    # AndroTotal's test-suite base classes (local module)
import config  # AndroTotal's constants, such as NO_THREAT_FOUND


class TestSuiteProductX(base.BaseTestSuite):

    def detection_on_demand(self, sample_path):
        """Test the AV's capability of detecting malware by scanning
        the whole device.
        """
        # Connect to a running device
        p = self.pilot
        # Install application sample
        p.install_package(sample_path)

    def detection_on_install(self, sample_path):
        """Test the AV's capability of detecting malware upon installation."""
        p = self.pilot
        p.install_package(sample_path)


class TestSuiteProductY(base.BaseTestSuite):
    """Test suite for ProductY."""

    def detection_on_install(self, sample_path):
        """Test the AV's capability of detecting malware upon installation."""
        # Connect to a running device
        p = self.pilot
        if sample_path:  # Install sample on the running device
            p.install_package(sample_path)
            time.sleep(2)  # Give the AV time to react
            self.__check_popup()  # Check if detected

    def detection_on_copy(self, sample_path):
        """Test the AV's capability of detecting a malware when the
        sample is copied on the device.
        """
        p = self.pilot
        if sample_path:
            p.push_file(sample_path)
            time.sleep(2)
            self.__check_popup()

    def __check_popup(self):
        """Check if the alert popup is displayed."""
        p = self.pilot
        # Wait for the antivirus's screen to appear
        if p.wait_for_activity(
                "com.kms.free.antivirus.gui.AppCheckerAlert", 10):
            # Tap on button to start scan
            p.tap_on_coordinates(120, 210)
            if p.wait_for_activity(
                    "com.kms.free.antivirus.gui.AppCheckerVirusAlert",
                    30, critical=False):
                p.refresh()
                threat_view = p.get_view_by_id("ObjectType")
                # Extract the threat name
                self.result['detected_threat'] = threat_view.mText.strip()
            else:  # No threat found
                self.result['detected_threat'] = config.NO_THREAT_FOUND
        else:  # No threat found
            self.result['detected_threat'] = config.NO_THREAT_FOUND

FIGURE 2. Python code for two implementations of the 10 currently implemented AndroPilot adapter modules. Our libraries enabled adapter implementation with an average of only 36 LOC per adapter.

EVALUATION

As of early October 2015, 2,491 users have requested access to and are actively using AndroTotal. This has enabled us to collect 85,677 distinct samples of malicious and benign Android applications. Several researchers and research groups and five major AV product vendors have requested access to this dataset. The data, along with user feedback about AndroTotal use, has revealed interesting issues, including the need for full emulation as well as discrepancies between the mobile and desktop versions of AV applications.
Full versus partial emulation

One use case involved using both VirusTotal and AndroTotal to evaluate the detection rate of mobile AV applications against an advanced proof-of-concept malware that contains a call-home function to download additional malicious code after the first execution.4 None of the AV tests that VirusTotal executed triggered the call-home function, demonstrating that VirusTotal's testing environment did not create the necessary conditions for the malware to work. In contrast, during AndroTotal tests, the malware was able to complete the procedure as if it were working on a real device.
AV application mismatch

It is difficult to conduct a direct and fair comparison of VirusTotal and AndroTotal because of inherent differences in desktop programs and mobile applications, such as the procedure for behavioral checks and malware detection (on the device as opposed to in the APK).

With these comparison limitations in mind, we automatically and manually compared VirusTotal and AndroTotal performance on 300 randomly chosen malware samples, focusing on five AV application vendors (V1–V5) that were common to both.
For each sample and vendor, we calculated the edit distance (the minimum number of single-character edits needed to transform one string into the other) between the threat labels that VirusTotal and AndroTotal detected (INI:SMSSend-A [Trj] versus Android:FakeNotify-A [Trj], for example). Although we expected the same vendor to label the same samples consistently, as Figure 3 shows, only one of the five chosen vendors did this. This discovery validates the results of a 2011 study, which also found naming inconsistencies in malware.10

FIGURE 3. Discrepancies in threat labels between AV products offered by the same vendor. Of five vendors that supported both a desktop and mobile version of the same AV product, only one (V5) had nearly zero discrepancies between its two versions. Label discrepancy is the edit distance between two labels divided by the length of the longer label, mapped to the interval from 0 to 1.

In addition to showing labeling discrepancies, this comparison reinforces the need for full-emulation AV testing approaches.
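The discrepancy metric plotted in Figure 3 is easy to reproduce; the sketch below (our own illustration, not AndroTotal's code) computes a standard Levenshtein edit distance and normalizes it by the longer label's length.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def label_discrepancy(label_a, label_b):
    """Edit distance divided by the longer label's length, in [0, 1]."""
    if not label_a and not label_b:
        return 0.0
    return levenshtein(label_a, label_b) / max(len(label_a), len(label_b))

# The example pair of labels for the same sample given above:
print(label_discrepancy("INI:SMSSend-A [Trj]", "Android:FakeNotify-A [Trj]"))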
Time requirements

A worker server takes between 50 seconds and 5 minutes to run a test on eight malware detectors, including the time required to launch all the emulators. On average, a single test takes 1 to 3 minutes to complete; the standard deviation is too wide to rate relative speed among vendors.
Threat labels

By querying the AndroTotal database for the number of tests of each AV APK, grouped by output label, we determined the most popular threat labels. Table 2 shows the 15 most popular threat labels detected as of early October 2015. We assumed that popularity was proportional to the number of distinct (in terms of the MD5 message-digest algorithm) sample APKs uploaded with that label.

TABLE 2. Top 15 threat labels as of October 2015 (label: number of distinct sample APKs).

UDS:DangerousObject.Multi.Generic: 7,496
Android.Trojan.140227: 2,745
HEUR:Trojan-SMS.AndroidOS.Opfake.bo: 2,439
Adware.Airpush.origin.7: 2,374
Trojan!Depositmobi.A@Android: 2,302
Android.Trojan.Generic: 2,176
TrojanSMS.Boxer.AQ.Gen: 1,555
Trojan.FakeInst.M: 1,330
HEUR:Trojan-SMS.AndroidOS.Opfake.a: 1,213
Trojan!RuWapFraud.D@Android: 1,204
HEUR:Trojan-SMS.AndroidOS.FakeInst.a: 1,077
AndroidOS_Opfake.CTD: 1,046
Adware.AndroidOS.Airpush-Gen: 1,041
Android.SmsSend.origin.281: 965
Android:FakeNotify-A [Trj]: 947

Several of these labeled threats are adware, not malicious applications. It is difficult to differentiate between the two. As others have noted,11 the boundary is becoming increasingly gray, and defining it will require further investigation.
AndroTotal has been well received by users, researchers, and vendors, and we are already evaluating physical parameters such as battery consumption. Although we plan to continue performing tests on physical mobile devices, it is challenging to create self-contained, repeatable tests on physical hardware because of the time required to "freeze" the device state and restore the same testing conditions. Reflashing a chosen NAND partition on an Android device can take from 2 to 10 minutes, depending on the device model, the partition to reflash, and the need to reconfigure the system after reflashing is complete.
Recent hardware virtualization support in ARM processors is a possible alternative for conducting repeatable tests on physical hardware that will eventually allow AndroTotal to provide accurate information about AV application performance on real devices.

We are also working to address AndroPilot's slight performance limitations. At present, retrieving the displayed view tree takes up to 30 seconds for a complete screen dump in extreme cases, a delay due to Android's ViewServer component. We are currently working on patching this component by adapting an existing but outdated patch (http://code.google.com/p/android-app-testing-patches), which should yield a 20× to 40× speedup.

Because of its modular architecture, AndroTotal could also be a valuable starting point for building a more generic testing framework for mobile applications, beyond the specific scope of mobile AV applications. With an open, noncommercial tool such as AndroTotal, researchers can execute repeatable, scientifically sound tests at scale, simplifying application validation and verification and greatly aiding the maturation of AV application testing.
ACKNOWLEDGMENTS

We thank the anonymous reviewers who significantly contributed to the improvement of our research. This research was funded in part by the EU Seventh Framework Programme (FP7/2007-2013) under grant 257007 and by the Italian Ministry of Education, University, and Research through the TENACE PRIN Project (20103P34XC).
REFERENCES

1. IDC, "Smartphone OS Market Share, 2015 Q2," 2015; www.idc.com/prodserv/smartphone-os-market-share.jsp.
2. B. Uscilowski, "Security Response: Mobile Adware and Malware Analysis," Symantec, Oct. 2013; www.symantec.com/content/en/us/enterprise/media/security_response/whitepapers/madware_and_malware_analysis.pdf.
3. ESET, "Trends for 2013: Astounding Growth of Mobile Malware," Dec. 2012; http://go.eset.com/us/resources/white-papers/Trends_for_2013_preview.pdf.
4. S. Poeplau et al., "Execute This! Analyzing Unsafe and Malicious Dynamic Code Loading in Android Applications," Proc. 21st Ann. Network and Distributed System Security Symp. (NDSS 14), Feb. 2014; doi:10.14722/ndss.2014.23328.
5. D. Harley, "There's Testing, Then There's VirusTotal," blog, 10 Dec. 2012; http://blog.isc2.org/isc2_blog/2012/12/theres-testing-then-theres-virustotal.html.
6. H. Pilz, "Building a Test Environment for Android Anti-Malware Tests," Virus Bulletin Conf., Sept. 2012; www.avtest.org/fileadmin/pdf/publications/vb_2012_avtest_paper_building_a_test_environment_for_android_anti-malware_tests.pdf.
7. R. Ramachandran, T. Oh, and W. Stackpole, "Android Anti-Virus Analysis," Proc. 7th Ann. Symp. Information Assurance and Secure Knowledge Management (ASIA 12), 2012; www.albany.edu/iasymposium/proceedings/2012/11-Ramachandran_Oh%26Stackpole.pdf.
8. M. Zheng, P.P.C. Lee, and J.C.S. Lui, "ADAM: An Automatic and Extensible Platform to Stress Test Android Anti-Virus Systems," Proc. 9th Int'l Conf. Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 12), 2013, pp. 82–101.
9. F. Maggi, A. Valdi, and S. Zanero, "AndroTotal: A Flexible, Scalable Toolbox and Service for Testing Mobile Malware Detectors," Proc. 3rd ACM Workshop Security and Privacy in Smartphones and Mobile Devices (SPSM 13), 2013, pp. 49–54.
10. F. Maggi et al., "Finding Nontrivial Malware Naming Inconsistencies," Proc. 7th Int'l Conf. Information Systems Security (ICISS 11), 2011, pp. 144–159.
11. LookOut, "Uncovering Privacy Issues with Mobile App Advertising," blog, 9 Jul. 2012; https://blog.lookout.com/blog/2012/07/09/mobile-privacy-app-advertising-guidelines.
ABOUT THE AUTHORS

ANDREA VALDI is a software engineer at dubizzle.com, a large classified portal for the Middle East and North Africa (MENA) region. His research interests include Android malware and antivirus solutions, Web application security, and distributed systems. While conducting the research reported in this article, he was a graduate student at Politecnico di Milano and lead developer of AndroTotal. Valdi received an MSc in computer engineering from Politecnico di Milano. Contact him at andrea.valdi@mail.polimi.it.

EROS LEVER is an information security engineer at Secure Network S.r.l. His research interests include penetration testing and code review for both mobile devices and Web application security, and the development of security platforms and automated tools for security analysis. While conducting the research reported in this article, he was a graduate student at Politecnico di Milano. Lever received an MSc in computer engineering from Politecnico di Milano. Contact him at e.lever@securenetwork.it.

SIMONE BENEFICO is an IT security analyst at Moviri SpA. His research interests include application security, intrusion-detection systems, and side-channel attacks on mobile devices. While conducting the research reported in this article, he was a graduate student at Politecnico di Milano. Benefico received an MSc in computer engineering from Politecnico di Milano. Contact him at simone

DAVIDE QUARTA is a PhD student in the Department of Electronics, Information, and Bioengineering at Politecnico di Milano and the current lead developer of AndroTotal. His research interests include mobile malware, embedded systems security, and exploitation techniques. Quarta received an MSc in computer engineering from Politecnico di Torino. Contact him at davide.quarta@polimi.it.

STEFANO ZANERO is an associate professor in the Department of Electronics, Information, and Bioengineering at Politecnico di Milano. His research interests include systems security, computer virology, and applications of machine learning to security data. Zanero received a PhD in computer engineering from Politecnico di Milano. He is a Senior Member of IEEE and ACM, a member of the Board of Governors of the IEEE Computer Society, a member of the international board of directors of the Information Systems Security Association (ISSA), and an ISSA Fellow. Contact him at [email protected].

FEDERICO MAGGI is an assistant professor in the Department of Electronics, Information, and Bioengineering at Politecnico di Milano. His research interests include the analysis of Android malware, botnets, Web services, and social network abuses; anomaly-based intrusion detection; and machine-learning-based approaches to detect unexpected network payloads. Maggi received a PhD in computer engineering from Politecnico di Milano. He is a member of ACM, general chair of the 2015 International Conference on the Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA 15), and a peer reviewer for IEEE Transactions on Dependable and Secure Computing. Contact him at federico.