
Design Considerations When Building Cross Platform Applications: Data Architecture

Since I started this blog series a couple of months back, the landscape in mobile applications has changed quite a bit, but the design considerations remain constant with regards to data.  In this post, my fourth in the series on Design Considerations When Building Cross Platform Applications, I will focus on the many faces of data transmission and storage for mobile applications.  At SuperConnect, our applications focus on enterprise customers, where large amounts of data are typically required to power even the simplest of applications. The story of how our data design evolved from its initial form over a year ago to where the application is today is a long and winding road, but it was a great learning experience worth sharing.

To start, here’s a little background about our first production application, Connections, an enterprise directory that focuses on getting people in touch with their colleagues.  The core features of the app come from the strength of our underlying back-end platform (e.g. server infrastructure, services, databases and security). That being said, much of what makes the system work with mobile applications requires some of the platform’s power to be transferred to the mobile devices. A key differentiating feature of our application is its ability to provide the user with nearly the same experience when disconnected as when connected to a Wi-Fi or cellular network.  This feature was a carry-over from a prior version of the application, called Cards, that was built as a sales tool and side project for our sister company Slalom Consulting.  In the Cards version of the application, the data set was fairly small when first developed: the contact list consisted of approximately 900 people with 25-30 data fields that needed to reside on the device.  With Connections, our first customer’s contact list comprised nearly 40,000 people with about 50 data fields that needed to be pulled from Active Directory on a nightly basis.  Due to this significant increase in the data required on the devices, our development team had to evaluate several methods to transfer data to, and store data on, the mobile devices.

Legacy (“Cards”) Solution

While writing the original mobile directory application Cards, the developers opted to use an HTTP handler to transmit data down to the devices.  The data was requested from a web server and the handler would send the required data down to the device as needed in XML format.  Upon initial launch and successful login, the full data set was requested to populate a database on the mobile device using the standard built-in data frameworks (e.g. Core Data for iOS and SQLite for Android).  Because this process was performed on a background thread, the application would load with no issue and the user would see the initial contact list populate as the data arrived.  For subsequent launches, the application would only request the data that had been updated since the last time the user had launched the application (and synced).
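To make the delta-sync idea concrete, here is a rough sketch of "only request what changed since the last sync" against a local SQLite store. It is written in Python purely for illustration; the real clients were native iOS and Android code talking to an ASP.NET HTTP handler that returned XML, and the endpoint, JSON payload shape, and field names below are assumptions, not our actual contract:

```python
import json
import sqlite3
import urllib.request
from urllib.parse import quote
from datetime import datetime, timezone

# Hypothetical endpoint -- the real app used an HTTP handler returning XML.
SYNC_URL = "https://example.com/contacts/changes?since={since}"

def sync_contacts(db_path="contacts.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS contacts (
                        id INTEGER PRIMARY KEY,
                        display_name TEXT,
                        email TEXT,
                        modified_at TEXT)""")
    conn.execute("""CREATE TABLE IF NOT EXISTS sync_state (
                        key TEXT PRIMARY KEY, value TEXT)""")

    # Only ask the server for rows changed since the last successful sync.
    row = conn.execute(
        "SELECT value FROM sync_state WHERE key = 'last_sync'").fetchone()
    since = row[0] if row else "1970-01-01T00:00:00Z"

    with urllib.request.urlopen(SYNC_URL.format(since=quote(since))) as resp:
        changes = json.load(resp)  # assumed: [{"id": ..., "display_name": ...}, ...]

    with conn:  # apply the whole batch in a single transaction
        conn.executemany(
            """INSERT INTO contacts (id, display_name, email, modified_at)
               VALUES (:id, :display_name, :email, :modified_at)
               ON CONFLICT(id) DO UPDATE SET
                   display_name = excluded.display_name,
                   email        = excluded.email,
                   modified_at  = excluded.modified_at""",
            changes)
        conn.execute(
            "INSERT OR REPLACE INTO sync_state (key, value) VALUES ('last_sync', ?)",
            (datetime.now(timezone.utc).isoformat(),))
    conn.close()
```

Run on a background thread at launch, a routine like this keeps the device database current without ever re-downloading contacts that have not changed.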

With Connections, we started development with a pattern similar to the one described for Cards, with a couple of exceptions: we used SQLite natively on all devices and stored the data in a tall table rather than a normalized schema.  The intent of using SQLite on all devices was to allow us to reuse some of the same database logic across all of the devices. (NOTE:  This becomes our saving grace later.)   Additionally, we ran several performance tests with small (250 contacts) to very large (1,000,000 contacts) data sets to confirm that tall tables were an efficient structure for our database.  Soon after we started to implement the solution, however, we realized that, given the amount of data we were storing on the devices, the handler / web service method of syncing data to the devices was not going to work.  The initial load of a 40,000-contact data set took nearly thirty minutes to download before the application was actually usable.
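For readers unfamiliar with the term, a "tall" table stores one row per contact per field rather than one wide column per Active Directory attribute, so the schema never has to change when different customers map different sets of directory fields. A minimal sketch of the idea, again in Python and with illustrative column names rather than our production schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Tall (entity-attribute-value) layout: one row per contact per field.
conn.executescript("""
    CREATE TABLE contact_values (
        contact_id  INTEGER NOT NULL,
        field_name  TEXT    NOT NULL,
        field_value TEXT,
        PRIMARY KEY (contact_id, field_name)
    );
    CREATE INDEX ix_field_lookup ON contact_values (field_name, field_value);
""")

conn.executemany(
    "INSERT INTO contact_values VALUES (?, ?, ?)",
    [(1, "FirstName", "Joe"), (1, "LastName", "Smith"),
     (1, "OfficeLocation", "Seattle"), (2, "FirstName", "Jane")])

# Pull a single contact back as a dict, regardless of which fields it has.
contact = dict(conn.execute(
    "SELECT field_name, field_value FROM contact_values WHERE contact_id = ?",
    (1,)).fetchall())
print(contact)  # {'FirstName': 'Joe', 'LastName': 'Smith', 'OfficeLocation': 'Seattle'}
```

The trade-off is that reads pivot rows back into objects at query time, which is why we benchmarked the layout from 250 up to 1,000,000 contacts before committing to it.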

Split Data Strategy

The next approach we decided to take was to split the data sync into several requests.  First, we would request the base data (e.g. First Name, Last Name, Display Name, Office Location, Mobile Phone Number and Email Address) required to get the application up and running quickly enough for the user to access the key information about the contacts. This initial request would occur whenever a user launched the application — before he/she could use the application — to ensure the contact list was in sync with the system of record (Active Directory).  Second, we would load the secondary information (Birthday, Hire Date, Client List, etc.) on a background thread while the user was using the application.   While this solution would let users start using the app more quickly than the original approach, it still required quite a few data requests to the server (depending on our paging logic) and a fairly large amount of data being transferred over the wire.  As a result, a new issue reared its head: disrupted syncs. We could track where the application was in the retrieval process, but slow or dropped connections still left a few scenarios where the data on the device could end up out of sync.
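Conceptually, the split looked something like the sketch below: block on the base fields, then kick off the secondary fields in the background. The field lists come straight from the description above, but the paged fetch is just a placeholder, and the real threading model on the devices (Grand Central Dispatch on iOS, background workers on Android) obviously differs from this Python illustration:

```python
import sqlite3
import threading

BASE_FIELDS = ["FirstName", "LastName", "DisplayName",
               "OfficeLocation", "MobilePhone", "Email"]
SECONDARY_FIELDS = ["Birthday", "HireDate", "ClientList"]

def fetch_fields(field_names, page_size=500):
    """Placeholder for the paged web-service calls; yields batches of
    (contact_id, field_name, field_value) rows."""
    yield []  # the real implementation pages through the service

def load_fields(db_path, field_names):
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS contact_values (
                        contact_id INTEGER, field_name TEXT, field_value TEXT,
                        PRIMARY KEY (contact_id, field_name))""")
    with conn:
        for batch in fetch_fields(field_names):
            conn.executemany(
                "INSERT OR REPLACE INTO contact_values VALUES (?, ?, ?)", batch)
    conn.close()

def launch_sync(db_path="contacts.db"):
    # Phase 1: block until the base fields are loaded so the list is usable.
    load_fields(db_path, BASE_FIELDS)

    # Phase 2: pull the secondary fields on a background thread while the
    # user is already browsing contacts.
    threading.Thread(target=load_fields,
                     args=(db_path, SECONDARY_FIELDS),
                     daemon=True).start()
```

The weakness is visible in the structure itself: if the connection drops partway through phase two, some contacts carry fresh secondary data and some do not, which is exactly the out-of-sync scenario described above.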

Server-Generated Databases

Due to the issues with multiple requests to pull all of the data to the devices, we decided to generate the SQLite databases on the server and push the entire database down to the device as part of a single request from Windows Azure Blob Storage. The database required to perform this task would contain all of the data related to the customer’s contact list as well as all of the data required to configure / customize the application on the user’s device.  In the case of the larger customer list, we were able to get a compressed database down to the device in about seven minutes, which was a significant improvement over the initial approach using service calls, but still not acceptable for a user blocked from using the application for that length of time.  Thereafter, we had to iterate through the solution to continue to cut time out of this initial launch of the app.  Some of the questions we asked ourselves were (a) do we really need all of this data on the device, and (b) is there a way to generate parts of the database and merge them when they get to the application?  As a result, we were able to cut our initial database down from 400 MB to under 100 MB (decompressed).  We also realized it would make sense to evaluate the structure of our data and the methods we were using to push the data to the devices.  We took several steps to make this process more efficient:

  1. Split our databases into two:  One DB provides the contact data that is synced regularly. The other DB, which would probably be updated infrequently, contains any configuration information used to set up the application on the user’s device.
  2. Change our compression / decompression methods:  Initially we used gzip for compression, as it is more of a standard in web data transfer, but we found that 7zip gave us better compression.  The difference for our first large customer’s contact DB was about 10x; the 100 MB database compressed down to 16 MB using 7zip.
  3. Include only what we need:  We removed excess data from the databases being sent to the devices.  In general, customers tend to send more data than the application needs so that they have options in how the user interface is configured within the app.  We used a combination of the data fields provided and the configuration set up in our administration console to trim out data that was not actually used within the app (a sketch of this server-side build and compression step follows below).
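Putting those pieces together, the server-side build step ends up looking roughly like the following: build a SQLite file containing only the fields the customer has actually mapped in the admin console, then compress it before it is uploaded to blob storage. This is an illustrative Python sketch with the standard-library LZMA compressor standing in for the 7zip archives we actually produce, and the field list and file layout are assumptions:

```python
import lzma
import os
import sqlite3
import tempfile

# Fields the customer's admin console actually maps into the UI; anything else
# pulled from Active Directory is dropped before the database is built.
CONFIGURED_FIELDS = {"FirstName", "LastName", "DisplayName",
                     "OfficeLocation", "MobilePhone", "Email"}

def build_device_db(contacts, out_path):
    """contacts: iterable of (contact_id, field_name, field_value) tuples."""
    if os.path.exists(out_path):
        os.remove(out_path)
    conn = sqlite3.connect(out_path)
    with conn:
        conn.execute("""CREATE TABLE contact_values (
                            contact_id INTEGER, field_name TEXT, field_value TEXT,
                            PRIMARY KEY (contact_id, field_name))""")
        conn.executemany(
            "INSERT INTO contact_values VALUES (?, ?, ?)",
            (row for row in contacts if row[1] in CONFIGURED_FIELDS))
    conn.execute("VACUUM")  # shrink the file before it ships
    conn.close()

def compress_for_download(db_path):
    # Production used 7-Zip archives; lzma (the same algorithm family) keeps
    # this sketch dependency-free.
    archive_path = db_path + ".xz"
    with open(db_path, "rb") as src, lzma.open(archive_path, "wb", preset=9) as dst:
        dst.write(src.read())
    return archive_path

if __name__ == "__main__":
    db = os.path.join(tempfile.gettempdir(), "contacts.db")
    rows = [(1, "FirstName", "Joe"), (1, "Birthday", "1980-01-01")]  # Birthday gets trimmed
    build_device_db(rows, db)
    print(compress_for_download(db))
```

Because the filtering happens before compression, every byte that survives the build is data the app will actually display, which is where most of the 400 MB to under 100 MB reduction came from.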

Back to the Drawing Board

At this point, we had cut our transfer times significantly and limited the footprint on the user’s device, but problems still remained.  First, we found that the performance of our search on the SQLite DB on the device was awful:  45 seconds to search for “Joe” in a system of 40,000 contacts where about forty results should be returned.  As a result, we had to find a way to make the search efficient enough for the application to be usable.  The clear solution was to implement a full-text search index on the DB, but with it came more problems.  We had to update the services that build the DB to generate the full-text search indices on the server before the DB was compressed, so we would not have to perform this indexing on each user’s device.  Initially, we attempted to do this indexing on the device, but found that the process drained the device’s battery and would not provide the ideal user experience.  In addition, to help improve query efficiency, we upgraded to the most recent version of the SQLite library and bundled it within our application.  While this does require more upkeep, we are able to control all aspects of our data access in the app.
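The key to the fix is that the full-text index is created as part of the server-side database build, so the device only ever runs queries against it. Here is a rough sketch using SQLite's FTS5 module (assuming the bundled SQLite build includes FTS5; our actual schema and tokenizer settings differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Minimal stand-in for the tall contact table from the earlier sketches.
conn.executescript("""
    CREATE TABLE contact_values (
        contact_id INTEGER, field_name TEXT, field_value TEXT,
        PRIMARY KEY (contact_id, field_name));
    INSERT INTO contact_values VALUES
        (1, 'DisplayName', 'Joe Smith'),  (1, 'Email', 'joe.smith@example.com'),
        (2, 'DisplayName', 'Joanne Lee'), (2, 'Email', 'joanne.lee@example.com');
""")

# Build the full-text index on the server, before the DB is compressed, so the
# device pays only the cheap query cost rather than the battery-hungry indexing.
conn.executescript("""
    CREATE VIRTUAL TABLE contact_search
        USING fts5(contact_id UNINDEXED, display_name, email);

    INSERT INTO contact_search (contact_id, display_name, email)
    SELECT contact_id,
           MAX(CASE field_name WHEN 'DisplayName' THEN field_value END),
           MAX(CASE field_name WHEN 'Email'       THEN field_value END)
    FROM contact_values
    GROUP BY contact_id;
""")

# On the device, searching "Joe" becomes a prefix MATCH against the index
# instead of a LIKE scan over 40,000 rows.
hits = conn.execute(
    "SELECT contact_id, display_name FROM contact_search "
    "WHERE contact_search MATCH ? ORDER BY rank LIMIT 50",
    ("joe*",)).fetchall()
print(hits)
```

Because the index ships inside the pre-built database, the device never rebuilds it; it simply issues MATCH queries, which is what took the 45-second search down to something usable.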

Overall, the process to get to the production version of Connections has been a great learning experience in how to design mobile applications that require large amounts of data. The transition from doing a part-by-part download of data to downloading an entire database was a big change in our original architecture, but it has led to a very solid application on Android and iOS with a Windows Phone 8 version coming soon.

Stay tuned for my next post in the series, where I’ll dive into the design considerations for security in mobile applications.
