Here is a list of sample student projects. These projects will be conducted under the supervision of the faculty, PhD/MS students, and postdoctoral fellows involved in the NSF REU project. 
Privacy-Preserving Information Retrieval
          Information retrieval (IR) plays an essential role in daily life. However, currently deployed IR technologies are insufficient when the information is protected or deemed to be private. For example, submitting a query to a publicly available search engine (e.g., Bing or Google) requires disclosing potentially delicate facts (e.g., thoughts about abortion), as well as the websites the user considers interesting. Similarly, when a private database contains sensitive information needed by the user, it cannot be searched freely. Over the past decade, various approaches, generally referred to as private information retrieval (PIR), have been proposed to obfuscate queries and responses, but they are limited in that the retrieved information is inadequate to compute relevancy. To address such limitations, this project investigates the necessary techniques to build a framework that allows one party to discover whether a second party harbors any relevant textual information without either party disclosing any information.
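To make the PIR idea concrete, the following is a minimal sketch of the classic two-server information-theoretic PIR construction (two non-colluding servers, XOR of random subsets). It is a toy illustration of the general obfuscation idea mentioned above, not the framework this project will build; the database contents and record size are assumptions.

```python
import secrets

# Toy two-server PIR: the client hides which record index i it wants by
# sending complementary random index subsets to two non-colluding servers;
# XOR-ing the two answers recovers record i, while each subset alone is
# uniformly random and reveals nothing about i.

DB = [b"rec0", b"rec1", b"rec2", b"rec3"]   # both servers hold this database
N = len(DB)

def server_answer(db, subset):
    """XOR together the records whose indices are in `subset`."""
    out = bytes(len(db[0]))
    for j in subset:
        out = bytes(a ^ b for a, b in zip(out, db[j]))
    return out

def client_query(i, n):
    """Random subset s1, and s2 = s1 with membership of i flipped."""
    s1 = {j for j in range(n) if secrets.randbits(1)}
    s2 = s1 ^ {i}                            # symmetric difference
    return s1, s2

def pir_retrieve(i):
    s1, s2 = client_query(i, N)
    a1 = server_answer(DB, s1)               # sent to server 1
    a2 = server_answer(DB, s2)               # sent to server 2
    # XOR over the symmetric difference {i} leaves exactly DB[i]
    return bytes(a ^ b for a, b in zip(a1, a2))
```

Note that, as the paragraph above points out, this style of PIR retrieves a record by index and gives the servers no way to compute relevancy, which is exactly the limitation the project aims to address.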
Making Multi-Authority Attribute-Based Encryption More Flexible
          Attribute-based encryption (ABE) stands out as a promising cryptographic tool for flexible and fine-grained access control. In ABE (especially ciphertext-policy ABE, or CP-ABE), any access policy can be expressed in terms of a set of descriptive attributes, and a key can decrypt a ciphertext only if the attributes in the key satisfy the access policy. One problem with ABE is key escrow, which arises because a single party (the attribute authority) owns the master secret key: the attribute authority can create any decryption key and therefore decrypt anything. Multi-authority ABE (MA-ABE) was proposed to overcome this key escrow problem. In MA-ABE, multiple attribute authorities exist instead of one, and each authority controls its own distinct set of attributes; no single authority can create decryption keys for attributes other than those it administers. However, MA-ABE solves the key escrow problem at the cost of flexibility. All MA-ABE schemes proposed so far impose the restriction that every access policy, as well as every decryption key, must contain at least one attribute from each attribute authority. This results in larger ciphertexts, and it requires users to know which attribute is owned by which authority, which can be a problem in a large-scale system such as a cloud. In this research, our goal is to remove all or some of these restrictions and make MA-ABE more flexible.
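As background, the following sketch models only the *policy semantics* of CP-ABE, i.e., when a set of key attributes satisfies an AND/OR access policy. The nested-tuple policy representation and the example attributes are assumptions for illustration; a real scheme enforces this check cryptographically rather than with an `if`.

```python
# Toy CP-ABE-style access policy, expressed as a nested tuple:
# ("AND", ...) / ("OR", ...) over attribute names (strings are leaves).
# This models only the policy semantics, not the cryptography.

def satisfies(policy, attrs):
    if isinstance(policy, str):              # leaf: a single attribute
        return policy in attrs
    op, *children = policy
    results = [satisfies(c, attrs) for c in children]
    return all(results) if op == "AND" else any(results)

# Example policy: doctor AND (cardiology OR oncology). In MA-ABE each
# authority owns a disjoint slice of the attribute universe, and the
# restriction criticized above forces every policy to touch each slice.
policy = ("AND", "doctor", ("OR", "cardiology", "oncology"))
```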
Online Trust Evaluation Framework
          Cloud computing platforms have seen growing acceptance among users over the past decade as a place to host their applications. This is because of their capability to cut financial costs and relieve users of the hassle of acquiring and maintaining the required resources. A pay-as-you-go model, along with automated options for scaling up and down, also adds to the popularity of cloud computing platforms. Nevertheless, amid these advantages, users are wary of fully adopting cloud computing because of concerns about the security of their applications on the platform. To address these concerns, past research has proposed an “offline risk assessment of cloud service providers” framework, which assesses the security provided by a cloud service provider with respect to the security threats present in a user’s application. It then formulates a cloud migration plan by performing a cost-benefit tradeoff analysis to select a suitable cloud service provider to host the application.
          Once the user has adopted the proposed cloud migration plan and begins to use the services of the cloud provider, they will need well-defined ways to gauge the trustworthiness of the provider and to compare it against other available cloud providers. This concept is captured by an “online trust evaluation framework”. Trust between users and cloud providers changes continually, driven not only by personal user experiences and recommendations from other users, but also by time and its effect on the parameters used for trust evaluation. Such scenarios require a dynamic trust evaluation and decision-making framework, since Service Level Agreements and Operational Level Agreements are of little use here. Therefore, the objective of the “online trust evaluation framework” is to assess the trust of a cloud provider based on both subjective factors (e.g., user feedback and the reputation of cloud providers) and objective factors (e.g., availability, response time, and throughput of services).
          Students will work on developing a web application, crawling the web for information about the services offered by the available cloud service providers, and integrating this information using the principles of Bayesian networks to output a comprehensive quantitative assessment.
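As a starting point, here is a minimal sketch of one common way to fuse subjective feedback with objective measurements: a beta-reputation update over positive/negative feedback, blended with an objective score. The prior, the blending rule, and the weight are illustrative assumptions; the project's actual Bayesian-network model is for the students to design.

```python
# Minimal beta-reputation sketch: positive/negative user feedback updates
# a Beta(a, b) posterior, and the subjective trust score is its mean.
# The blend with an objective score (e.g., measured availability) and the
# weight w_subjective are illustrative assumptions, not a prescribed model.

def beta_trust(pos, neg, prior=(1, 1)):
    a, b = prior
    return (a + pos) / (a + pos + b + neg)

def blended_trust(pos, neg, objective_score, w_subjective=0.5):
    return (w_subjective * beta_trust(pos, neg)
            + (1 - w_subjective) * objective_score)
```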
Technologies involved:
  • Front End
    • HTML5, CSS3, basic JavaScript
    • optional: HTML/CSS framework: Bootstrap; JavaScript frameworks: React, Angular
  • Backend/server side
    • Node.js
    • Python (Django or Flask frameworks)
  • Databases
    • MySQL, PostgreSQL, (MongoDB; only if Node.js is used)
  • Optional
    • Implementing web application using APIs/REST services
[1] S. Madria and A. Sen, “Offline risk assessment of cloud service providers,” IEEE Cloud Computing, vol. 2, no. 3, pp. 50–57, May 2015.
[2] A. Li, X. Yang, S. Kandula, and M. Zhang, “CloudCmp: Comparing public cloud providers,” in Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC ’10), New York, NY, USA, 2010.
[3] K. M. Khan and Q. Malluhi, “Establishing trust in cloud computing,” IT Professional, vol. 12, no. 5, pp. 20–27, September/October 2010.
[4] M. Firdhous, O. Ghazali, and S. Hassan, “Trust management in cloud computing: A critical review,” International Journal on Advances in ICT for Emerging Regions, vol. 4, no. 2, pp. 24–36, 2011.
Secure Multi-Party Big Trajectory Data Mining
          With the proliferation of mobile devices and location-based services, location data is extensively collected by service providers. Service providers are usually interested in learning users’ movement patterns so as to provide more proactive services, and it would be even more attractive if multiple service providers could share their data for such analysis. However, sharing the users’ data directly may violate privacy terms. A possible solution is to conduct such collaborative analysis in a privacy-preserving way. Therefore, this project has two main tasks:
  • Design a MapReduce-based data mining algorithm to discover popular travel routes in a large-scale trajectory dataset.
  • Enhance the previous MapReduce-based algorithm to conduct the secure multiparty data mining among multiple large-scale trajectory datasets.
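The first task can be sketched in plain Python before moving to a real cluster: map emits route segments from each trajectory, shuffle groups them, and reduce counts support. The region-ID trajectories and the support threshold are assumptions for illustration; on Hadoop/Spark the same logic runs as distributed map and reduce tasks.

```python
from collections import defaultdict

# MapReduce-style popular-route mining, sketched in plain Python.
# A trajectory is a list of region IDs; map emits every consecutive
# pair (a route segment), shuffle groups by segment, reduce sums counts.

def map_phase(trajectory):
    for a, b in zip(trajectory, trajectory[1:]):
        yield ((a, b), 1)                    # key = route segment, value = 1

def mapreduce_popular_routes(trajectories, min_support=2):
    groups = defaultdict(list)               # shuffle: group values by key
    for t in trajectories:
        for key, val in map_phase(t):
            groups[key].append(val)
    counts = {k: sum(v) for k, v in groups.items()}    # reduce: sum counts
    return {k: c for k, c in counts.items() if c >= min_support}
```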
Secure and Privacy-preserving Data Analysis
          Privacy-preserving data analysis algorithms and techniques play an important role in enabling quick and efficient decision making without compromising user data or identity. These techniques are essential today, and will remain so, given the wide concern about cyber and information security. Privacy-preserving techniques are generally applied to sensitive data prior to data mining, so as not to disclose a user's or an organization's private information. They transform the data, mostly by reducing its granularity, for example through randomization techniques, distributed privacy-preservation techniques, or models such as k-anonymity. The project concerns areas such as data privacy and information retrieval. We use various secure multi-party computation protocols and secure data representation techniques to obtain analysis results. Examples of such multi-party computation protocols include secure vector product calculation and top-k query computation. Secure data representation can involve models for textual data such as the Term Frequency-Inverse Document Frequency (TF-IDF) model or the n-gram model, together with appropriate encryption/decryption methods.
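For reference, the TF-IDF representation mentioned above can be computed locally by each party before any secure protocol runs. The sketch below uses a smoothed idf variant, log((1+n)/(1+df)), which is one common choice among several; the example documents are assumptions.

```python
import math
from collections import Counter

# Plain TF-IDF sketch (smoothed idf): each document becomes a sparse
# term -> weight map. Each party would compute these vectors locally and
# then engage in a secure vector-product protocol on the results.

def tfidf(docs):
    n = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(term for d in docs for term in set(d.split()))
    vectors = []
    for d in docs:
        tf = Counter(d.split())
        total = sum(tf.values())
        vectors.append({t: (c / total) * math.log((1 + n) / (1 + df[t]))
                        for t, c in tf.items()})
    return vectors
```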
Privacy-Preserving Big Data Publishing in the Cloud
         With the proliferation of mobile devices and location-based services, location privacy has become a growing concern among users. Without proper protection, the trajectories of location-based service users may be tracked by the service providers, which may disclose sensitive information such as hospital visits or religion. In this project, we aim to study how to leverage the MapReduce technology to anonymize the large amounts of location data collected by various kinds of mobile applications, in order to efficiently and effectively preserve users' location privacy.

In this project, we have two main tasks:

  • Design a MapReduce-based algorithm to anonymize a large amount of location data, in order to efficiently and effectively preserve users' location privacy.
  • Design an evaluation tool to quickly evaluate the data utility of anonymized data and a visualization tool to visualize anonymized trajectories.
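One common building block of trajectory anonymization is spatial generalization: publishing a coarse grid cell instead of an exact coordinate. The sketch below is only that building block, with an assumed cell size; a real k-anonymity scheme must additionally size cells so each covers at least k users.

```python
# Toy location generalization: snap exact coordinates to a coarse grid
# cell (here ~0.01 degree, roughly 1 km). Reporting the cell instead of
# the point is one primitive of trajectory anonymization; guaranteeing
# at least k users per cell is what a k-anonymity scheme adds on top.

def generalize(lat, lon, cell_deg=0.01):
    """Map a point to the south-west corner of its grid cell."""
    return (round(lat // cell_deg * cell_deg, 6),
            round(lon // cell_deg * cell_deg, 6))

def anonymize_trajectory(points, cell_deg=0.01):
    cells = [generalize(lat, lon, cell_deg) for lat, lon in points]
    # drop consecutive duplicates so the published trace leaks less detail
    return [c for i, c in enumerate(cells) if i == 0 or c != cells[i - 1]]
```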
Access Control Delegation in the Cloud
         Cloud computing enables a new form of service in which a service can be realized by components provided by different enterprises or entities in a collaborative manner. Specifically, a compound cloud service typically involves multiple, loosely connected cloud service providers, each responsible for managing and protecting the resources and data entrusted to it. Our objective is to achieve federated security services focused on access control and delegation while preserving the autonomy and privacy-sharing preferences of the involved parties. Specifically, in this project, we will study how to decompose a global access control policy into local policies based on the hierarchical relationships among the service participants.
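As a first intuition for policy decomposition, the toy sketch below splits a global policy into per-provider local policies based on which provider administers each resource. The "provider/resource" naming convention and the rule format are assumptions for illustration; the project studies decomposition over richer hierarchical relationships.

```python
# Toy policy decomposition: a global access policy (rules keyed by fully
# qualified resource names) is split into per-provider local policies
# according to which provider administers each resource.

def decompose(global_policy):
    local = {}
    for resource, allowed_roles in global_policy.items():
        provider, _, _ = resource.partition("/")   # owner of the resource
        local.setdefault(provider, {})[resource] = allowed_roles
    return local
```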
Replicated Data Integrity Verification
         Here, students will learn about replicated data verification schemes and the performance issues in checking the integrity of replicated copies stored on a cloud platform; they will also perform experiments. Data replication is a commonly used technique to increase data availability in cloud computing. The cloud replicates the data and stores the copies strategically on multiple servers at various geographic locations. Since the replicated copies are identical, it is difficult to verify whether the cloud really stores multiple copies of the data; the cloud can easily cheat the owner by storing only one copy. Thus, the owner would like to verify at regular intervals that the cloud indeed possesses the multiple copies claimed in the SLA (Service Level Agreement). In general, the cloud has the capability to generate replicas on demand when a data owner challenges the CSP to prove that it possesses multiple copies. It is also a valid assumption that the owner may not keep a local copy of the data. Thus, the owner's task is not only to verify that the data is intact but also to recover it if any deletion or corruption is detected: if the owner's verification detects data loss in any replica, the data can be recovered from the replicas that remain intact. Since the replicas are stored at diverse geographic locations, it is safe to assume that data loss will not occur at all replicas at the same time.
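To illustrate the core idea, the sketch below makes each replica distinct by masking the file with a replica-specific key (so storing one copy cannot answer for all replicas) and audits with a fresh-nonce hash challenge. This toy version assumes the auditor can recompute each replica, whereas the schemes studied in this project use homomorphic tags precisely so the owner need not keep the data locally; the masking construction here is an illustrative stand-in, not a published scheme.

```python
import hashlib
import secrets

# Toy replicated-integrity audit. Each replica is the file XOR-ed with a
# keystream derived from a replica-specific key, so replicas are distinct
# and the cloud cannot serve one stored copy for all of them. A fresh
# nonce per audit stops the cloud from replaying cached answers.

def make_replica(data, replica_key):
    mask = hashlib.sha256(replica_key).digest()
    return bytes(b ^ mask[i % len(mask)] for i, b in enumerate(data))

def prove(replica, nonce):
    """Run by the cloud for each replica it claims to store."""
    return hashlib.sha256(nonce + replica).hexdigest()

def audit(data, replica_keys, cloud_replicas):
    """Owner-side check; returns per-replica pass/fail."""
    nonce = secrets.token_bytes(16)
    results = []
    for key, stored in zip(replica_keys, cloud_replicas):
        expected = prove(make_replica(data, key), nonce)
        results.append(prove(stored, nonce) == expected)
    return results
```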
Fine-Grained Access Control of Owner's Data via the Cloud
         Data owners may not have sufficient information about the underlying data privacy and security mechanisms used by the CSP to determine how well their data is protected on the cloud. An interesting question is: can data owners ensure the security of their own data by some means? Encryption is a useful tool for protecting the confidentiality of sensitive data, so that the data remains protected even after the database has been successfully attacked or stolen. Provided that the encryption is done properly and the decryption keys are not also accessible to the attacker, encryption can protect the sensitive data stored in the database, reduce the legal liability of the data owners, and reduce the cost to society of fraud and identity theft. However, with the data in encrypted form, issues remain, such as preventing user access to unauthorized fields, efficiently revoking users’ privileges without re-encrypting massive amounts of data and redistributing new keys to the authorized users, handling collusion between users and cloud service providers, and issuing changes to a user’s access privileges. We will investigate efficient methods for handling user access rights, revoking those rights efficiently, and issuing either the same or different access rights to a returning user. We also address security against a “curious” cloud, collusion among users, and collusion between a user and the cloud service provider. Students will learn the above schemes, run experiments, and improve them.
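One standard idea behind efficient revocation can be sketched as follows: encrypt each field under its own data key and hand data keys only to authorized readers, so revoking a user means rotating one field's data key rather than re-encrypting the whole dataset. The XOR "cipher" below is a deliberately insecure placeholder (a real system would use authenticated encryption), and the class layout is an assumption for illustration, not the project's scheme.

```python
import hashlib
import secrets

# Toy fine-grained access model: each field is encrypted under its own
# data key; users hold only the keys for fields they may read. Revoking
# a user rotates just that field's key and re-grants it to the remaining
# users. The XOR stream "cipher" is a placeholder, NOT real encryption.

def toy_encrypt(key, plaintext):
    stream = hashlib.sha256(key).digest()
    return bytes(b ^ stream[i % len(stream)] for i, b in enumerate(plaintext))

toy_decrypt = toy_encrypt                    # XOR stream is its own inverse

class FieldStore:
    def __init__(self):
        self.cipher = {}                     # field -> ciphertext (on cloud)
        self.grants = {}                     # field -> {user: data key}

    def put(self, field, value, readers):
        key = secrets.token_bytes(32)
        self.cipher[field] = toy_encrypt(key, value)
        self.grants[field] = {u: key for u in readers}

    def read(self, field, user):
        key = self.grants[field].get(user)
        return toy_decrypt(key, self.cipher[field]) if key else None

    def revoke(self, field, user):
        old_key = self.grants[field].pop(user)
        new_key = secrets.token_bytes(32)
        plain = toy_decrypt(old_key, self.cipher[field])
        self.cipher[field] = toy_encrypt(new_key, plain)   # rotate one field
        self.grants[field] = {u: new_key for u in self.grants[field]}
```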
Secure Comparison between Encrypted Values
         Here students will learn how to compare two integers when they are encrypted. They will also learn several applications related to the secure comparison protocol. This is a very common operation performed on the encrypted outsourced data for different types of query processing.
           Let E be the encryption function of a public-key homomorphic encryption scheme. Suppose there are two entities, Alice and Bob, where Bob has two encrypted values E(x) and E(y) and Alice has the decryption key of E. Neither Alice nor Bob knows x or y. A secure comparison (SC) protocol is defined as follows: it takes E(x) and E(y) as input and returns E(1) if x > y and E(0) otherwise. This SC protocol has a number of applications in data analytics over encrypted data, such as range queries, k-nearest-neighbor queries, clustering, and classification over encrypted data stored on a cloud. Existing homomorphic-encryption-based SC protocols require encryptions of individual bits as inputs rather than simple encrypted integers, so their space cost is high. Along this direction, our objective is to develop an SC protocol that is more efficient than existing methods. In our algorithm [SHJ13], we introduced a probabilistic approach into the design of an SC protocol, based on the following observation: let x0, y0, and r0 be the least significant bits of x, y, and r, respectively; for y = (x + r) mod N (where N is odd), x0 = y0 xor r0 if x + r < N (i.e., no overflow). Our SC protocol builds on this observation and returns the correct result as long as no overflow occurs in any iteration. The probability of the SC protocol returning an incorrect result is negligible in practice, and this probabilistic approach leads to a significant reduction in computation time compared to the existing solutions under similar settings. We have implemented our protocol and the existing protocol in C using GMP, the GNU multiple precision arithmetic library.
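The least-significant-bit observation can be checked directly in the clear. The snippet below verifies it over random trials (this is a plaintext sanity check of the stated property, not the encrypted protocol itself; the modulus is an arbitrary odd stand-in for a real scheme's plaintext modulus).

```python
import random

# Plaintext check of the observation behind the probabilistic SC protocol:
# for odd N and y = (x + r) mod N, the least significant bits satisfy
# lsb(x) = lsb(y) xor lsb(r) whenever x + r < N (no modular overflow),
# because addition without overflow carries nothing into bit 0.

def lsb(v):
    return v & 1

def observation_holds(x, r, N):
    y = (x + r) % N
    return lsb(x) == lsb(y) ^ lsb(r)

random.seed(7)
N = 2**16 + 1                         # arbitrary odd modulus (assumption)
trials = [(random.randrange(N), random.randrange(N)) for _ in range(10_000)]
no_overflow_ok = all(observation_holds(x, r, N)
                     for x, r in trials if x + r < N)
```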
Range Queries over Encrypted Data on Cloud
         Students will learn how to efficiently perform range queries when the data stored on a cloud are encrypted with highly secure encryption schemes.
         Using cloud computing, data owners have the opportunity to outsource their data as well as services to the cloud provider, which can provide on-demand access to the data. However, to avoid various privacy concerns and to protect data confidentiality, data owners usually encrypt their data before outsourcing it to the cloud. Since the data is encrypted, the range of operations that can be performed in the cloud is limited. In this project, we implement range queries over encrypted data outsourced to a cloud. The range query is one of the most frequently used queries in a DBMS: given lower and upper bounds x and y on some attribute, it retrieves the set of all records whose value for that attribute lies in (x, y). We introduced a secure probabilistic protocol to perform range queries over encrypted data. First, we developed an efficient secure comparison protocol between two encrypted values; this protocol serves as the building block for our range query protocol. We have implemented our protocol in C using GMP, the GNU multiple precision arithmetic library.
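The reduction from range queries to comparisons can be shown in the clear: a value v lies in (x, y) iff v > x and y > v, so each record costs two comparisons. In the sketch below `greater` is a plaintext stand-in for one run of the secure comparison protocol on encrypted values; the record values are assumptions.

```python
# Range query reduced to comparisons, in the clear. In the encrypted
# protocol each `greater` call is replaced by one run of the secure
# comparison protocol on ciphertexts such as E(v) and E(x), so the cloud
# evaluates the predicate without seeing the values themselves.

def greater(a, b):
    """Plaintext stand-in for the secure comparison protocol."""
    return 1 if a > b else 0

def range_query(records, x, y):
    """Return every value v with x < v < y, using only `greater` calls."""
    return [v for v in records if greater(v, x) and greater(y, v)]
```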
More coming...