Using Kotlin kscript for Preprocessing Data

6 min readApr 6, 2018

DroidCon Boston 2018’s backend this year was Firebase Realtime DB so it could be easily updated and shared with the iOS and Android conference apps. Unfortunately, not all the data was. As is the case with most projects, all the bits needed were in different formats. The main web site was done using SlidesUp (https://slidesup.com/). The volunteer list was done using a Google Slides spreadsheet. The FAQ from last year was stored in a Sqlite DB.

To those who haven’t used Firebase RTDB before, it’s basically a nested hashmap; there are no arrays, so they’re done by defining key/value collections. For added complexity, the client APIs for Firebase RTDB is fairly crude; you can’t do complex compound queries or even do inner joins easily (inner joins are done by using nested queries). Firebase Firestore DB addressed the complex compound query API shortcoming, but it was still in Beta when development had to be done for the conference, so we couldn’t risk using it.

The short of it is, we needed the data preprocessed into a Firebase RTDB schema that matched closely to what we needed in the app. The other data also needed to be imported into Firebase. I took this as an opportunity to try using Kotlin kscript (https://github.com/holgerbrandl/kscript) to see if it was up to the task.

For input data formats, we had:
- CSV data exported from Sqlite for the FAQ
- CSV data exported from Google Slides for the Volunteers list
- JSON from SlidesUp’s Firebase export, but it was normalized

To simplify the queries on the client, I wrote scripts to denormalize the data so it could be easily consumed by the apps.

First up was the schedule data exported from SlidesUp’s database. It looked something like (some fields removed to shorten snippet):

{
  "events" : {
    "-L4b_dz4i6Wh94qh1kkr": {
      "description" : "<HTML text>",
      "duration" : "PT45M",
      "name" : "<title>",
      "roomIds": {
        "-L5_PsZvWSAv4Lrf-Yo9" : true
      },
      "speakerIds" : {
        "-L4NL8eGr2K_XjDFHiTJ" : true
      },
      "startTime" : "2018-03-26T15:00:00.000Z",
      "trackId" : "-L5___NDDDtAjf0dEO7W"
  },
  "tracks" : {
    "-L4bbBGSqk7ZVv5lfKNV" : {
      "description" : "<p><br></p>",
      "name" : "Workshops",
      "sortOrder" : 2
    },
    ...
  },
  "speakers" : {
    "-L4CAY7WXpAXJm0PYTwH" : {
      "bio" : "<p>Lora Kulm is a designer turned developer who has been working with Android for a little over 5 years. She is mostly self-taught with a supplementary Comp Sci 101 class in college, and has worked everywhere from a children’s education app startup to city government. She currently works as an Android developer and is passionate about graphics, animations, and her two dogs.</p>",
      "name" : "Lora Kulm",
      "org" : "Phunware",
      "pictureUrl" : "https://firebasestorage.googleapis.com/v0/b/slidesup-8b9d6.appspot.com/o/confs%2Fdetail%2Fdroidcon-boston-2018%2Fspeakers%2Ff32b0416-59b3-451b-8bd2-879cd8cce082.png?alt=media&token=bb202012-8824-427e-a13f-967fa3f5dbd8",
      "socialProfiles" : {
        "github" : "senojl",
        "linkedIn" : "lora-kulm-a4614569",
        "twitter" : "loraj_k"
      },
      "title" : "Software Engineer"
    },
    ...
  },
  "rooms" : {
    "-L5_PsZvWSAv4Lrf-Yo9" : {
      "name" : "Virginia Wimberly Theatre",
      "updatedAt" : 1518898744018,
      "updatedBy" : "y556KVJSEqPQ4s1isGkLnEc9HlD3"
    },
    ...
  }
}

Comparing this to what we need for the main agenda screen:

As you can see, the first issue was the normalized events didn’t include the room names or speaker names. The ID is used instead. Ditto speaker names. The Firebase solution for this is to use a query inside a query (see https://firebase.googleblog.com/2013/10/queries-part-1-common-sql-queries.html if you’re curious, but it’s very tedious to do nested queries in Java or Kotlin which is why the examples are in Javascript :-).

Conceptually, it makes sense, but it makes the app code more complicated than it needed to be, and it compounds because it has to be done on multiple platforms. But we can just preprocess the JSON before importing into Firebase and denormalize everything to make it simpler for the apps to process!

First we need a data model for the SlidesUp export, which is as you’d expect from this nested HashMap structure:

data class ConferenceDataModel(
        val events: Map<String, EventModel>,
        val rooms: Map<String, RoomModel>,
        val speakers: Map<String, SpeakerModel>,
        val tracks: Map<String, TrackModel>
)
data class EventModel(
        val name: String,
        val description: String,
        val duration: Duration,
        val startTime: Date,
        var endTime: Date?,
        val roomIds: Map<String, Boolean>?,
        val speakerIds: Map<String, Boolean>?,
        val trackId: String?,
        var roomNames: Map<String, Boolean>?,
        var primarySpeakerName: String?,
        var speakerNames: Map<String, Boolean>?,
        var speakerNameToPhotoUrl: Map<String, String>?,
        var speakerNameToOrg: Map<String, String>?,
        var trackName: String?,
        var trackSortOrder: Int?
)
...

We just need to enhance some of Slideups’ Firebase data so we can import it into our own Firebase RTDB. The “val” fields are the fields that come directly from SlidesUp. The “var” fields are the ones we’ll be adding. The processing looks like this:

confData?.events?.forEach {
    // denormalize speakers
    val speakerNames = HashMap<String, Boolean>()
    val speakerNameToPhotoUrl = HashMap<String, String>()
    val speakerNameToOrg = HashMap<String, String>()
    it.value.speakerIds?.forEach {
        confData.speakers.get(it.key)?.let {
            speakerNames.put(it.name, true)
            if (it.pictureUrl != null) {
                speakerNameToPhotoUrl.put(it.name, it.pictureUrl)
            }
            if (it.org != null) {
                speakerNameToOrg.put(it.name, it.org)
            }
        }
    }
    if (speakerNames.size > 0) {
        it.value.speakerNames = speakerNames
        it.value.speakerNameToPhotoUrl = speakerNameToPhotoUrl
        it.value.speakerNameToOrg = speakerNameToOrg
...
    }
    // denormalize rooms
    val roomNames = HashMap<String, Boolean>()
    it.value.roomIds?.forEach {
        confData.rooms.get(it.key)?.let {
            roomNames.put(it.name, true)
        }
    }
    if (roomNames.size > 0) {
        it.value.roomNames = roomNames
    }
    // look up track name
    it.value.trackId?.apply {
        val trackInfo = confData.tracks.get(it.value.trackId!!)
        trackInfo?.let { track ->
            it.value.trackName = track.name
            it.value.trackSortOrder = track.sortOrder
        }
    }
    // calculate the end time
    it.value.endTime = Date.from(it.value.startTime.toInstant().plus(it.value.duration))
}

Besides denormalizing the ID fields, we also had to calculate the end time (Firebase’s API also doesn’t support deserializing ISO8601 datetime fields).

Next up is the CSV data. Luckily, there are CSV to JSON converters online like https://www.csvjson.com/csv2json. From this, we just get json like this:

[
  {
    "HS": "X",
    "First Name": "Stephen",
    "Last Name": "Vance",
    "Position": "CTO / Innovation Lead (advising)",
    "Twitter": "StephenRVance",
    "PhotoUrl": "http://www.droidcon-boston.com/wp-content/uploads/2018/02/stephen-vance-520x324.jpg"
  },
...

Unfortunately, if you remember, Firebase is only a nested Hashmap (aka a key/value store), so you can’t have arrays. Arrays have to be transposed into hashmaps. Because we’re converting between different data types instead of just enhancing a data model, we’ll need different data models:

data class CsvVolunteerModel(
        @Json(name = "First Name")
        val firstName: String,
        @Json(name = "Last Name")
        val lastName: String,
        @Json(name = "Position")
        val position: String,
        @Json(name = "Email")
        val email: String,
        @Json(name = "Twitter")
        val twitter: String,
        @Json(name = "PhotoUrl")
        val photoUrl: String?
)data class VolunteerModel(
        val firstName: String,
        val lastName: String,
        val position: String,
        val email: String,
        val twitter: String?,
        val pictureUrl: String?
) {
    constructor(csvModel: CsvVolunteerModel): this(
            csvModel.firstName,
            csvModel.lastName,
            csvModel.position,
            csvModel.email,
            if ("".equals(csvModel.twitter) || "---".equals(csvModel.twitter)) null else csvModel.twitter,
            csvModel.photoUrl
    )
}

Using a constructor lets us generate the model we want to put into our Firebase DB. Processing this is simple because we can just use a numerical index for the key:

fun processData(data: List<CsvVolunteerModel>?): Map<Integer, VolunteerModel> {
    val volunteerModels = HashMap<Integer, VolunteerModel>()
    var volunteerIndex = 0
    data?.forEach {
        val volunteerModel = VolunteerModel(it)
        volunteerModels.put(Integer(volunteerIndex++), volunteerModel)
    }    return volunteerModels
}

One other useful tip is to write the processing methods/classes in separate files and write unit tests for them in Android Studio. Then you can easily test the import/export code. Your final kscript will just be a bunch of includes and code to reading a file into your data models:

#!/usr/bin/env kscript
//DEPS com.squareup.moshi:moshi:1.5.0,com.squareup.moshi:moshi-adapters:1.5.0,com.squareup.moshi:moshi-kotlin:1.5.0
//INCLUDE VolunteerDataModels.kt
//INCLUDE VolunteerDataUtils.ktimport okio.Okio
import java.io.FileInputStream
import com.squareup.moshi.Typesif (args.size != 1) {
    System.err.println("Usage: processVolunteers <jsonfile>")
    kotlin.system.exitProcess(-1)
}
val inputStream = FileInputStream(args.get(0))val csvJsonAdapter = VolunteerDataUtils.getCsvVolunteerAdapter()
val csvData = csvJsonAdapter.fromJson(Okio.buffer(Okio.source(inputStream)))val data = VolunteerDataUtils.processData(csvData)val jsonAdapter = VolunteerDataUtils.getVolunteerAdapter()
val fixedJson = jsonAdapter.toJson(data)
println(fixedJson)

Overall, kscript did it’s job and made writing typesafe scripts easily and you can use Android Studio to help you debug your script instead of using print statements in a normal scripting language.

I ran into two bugs while using kscript because it’s still in its infancy. One was that the INCLUDE of your files had to include the full path but that was fixed. The other is that the “@file” notation doesn’t work yet, but I’ve reported it: https://github.com/holgerbrandl/kscript/issues/107

Code for preprocessing the Droidcon data is in this branch if you need to do anything similar or want to see how it works: https://github.com/Droidcon-Boston/conference-app-android/blob/feature/ken/denormalizeSlidesUp/Droidcon-Boston/app/src/test/java/com/mentalmachines/droidcon_boston/preprocess/processVolunteers

kscript does hold promise for being a useful scripting tool if you want to stick with doing everything in Kotlin :-)

Using Kotlin kscript for Preprocessing Data

Written by Ken Yee

No responses yet