Deduce column headings from rest Query

In Rest to Excel library there are plenty of examples of populating sheets from rest queries. Usually you simply name the columns to match the data you want to extract and you are good to go. However, you may want to automatically detect what data is there and create and appropriate sheet. To make this a little more comprehensive, we are going to allow multi depth column headings (for example parent.child), and also assume that each data item does not necessarily contain data for every column. 

The test data

This was a question on our forum, so we’ll use the rest query that was asked about. In this case its Racehorse data, that starts like this. 

{
    "ResultSet": {
        "dateMAJ": "2013-10-03T14:52:00+02:00",
        "partants": [
            {
                "oeilleres": "A",
                "idCheval": "2004557",
                "jockey": "B. Bourez",
                "nomCheval": "Remember Rose",
                "numParticipation": "1",
                "stats": {
                    "pourcentageReussite": "7%",
                    "nbVictoires": "8",
                    "nbCourses": "109",
                    "nbPlaces": "0",
                    "musique": "Ah7h5h2h2h4hAs4h",
                    "ecart": "10"
                },
                "idJockey": "1021548",
                "poids": "72",
                "sexeAge": "H10",
                "hStats": {
                    "rangLastCourse": "",
                    "valeur": "67",
                    "nomHippoLastCourse": "",
                    "distanceLastCourse": "",
                    "idLastCourse": "0",
                    "variation": "",
                    "dateLastCourse": "",
                    "incident": ""
                },
                "decharge": "-"
            },
            {
                "idCheval": "2077209",
                "jockey": "D. Cottin",
                "nomCheval": "Si Tu Viens",
                "numParticipation": "2",
                "stats": {
                    "pourcentageReussite": "18%",
                    "nbVictoires": "81",
                    "nbCourses": "436",
                    "nbPlaces": "0",
                    "musique": "1h</span>3h0s5s3h0hAs5s",
                    "ecart": "0"
                },

What’s interesting here is that each row doesn’t necessarily contain the same fields, and also that many items have sub-fields. This is easily addressed in the Rest to Excel library by appropriate column naming such as stats.ecart , but we are assuming here that a) we don’t know what fields there are and b) what depth of sub-fields there are. All we know is that we want a two dimensional table of all the data. 

The solution

This will be a 3 step process

  1. Do a query to get the data, but don’t populate the sheet.
  2. Create column headings from what we find, after discovering the full inventory of data in the dataset
  3. Do the query again, allowing Rest to Excel library to populate the data as normal

The results

After creating a library entry that looks like this

With .add("lescourses")
            .add "restType", erSingleQuery
            .add "url", "http://www.geny.com/stats-records-hand-flux-donnees?typeStats=jockey-pmu&amp;type=json&amp;id_course="
            .add "results", "ResultSet.partants"
            .add "treeSearch", True
            .add "ignore", vbNullString
        End With

we end up with this result

The code

Private Sub testCourses()
    Dim cr As cRest, id As String
    id = "556007"
    ' we'll do the query
    Set cr = restQuery("lescourses", "lescourses", id, , , , , False)
    ' now create the headers
    makeHeadingsFromData cr
    cr.teardown
    
    ' we can just do it again
    With restQuery("lescourses", "lescourses", id)
        .teardown
    End With

Creating the known headers

We can just go through each of the properties of each data row and add it to a cJobject that inventories all the headings we’ll need. One of the useful properties of the cJobject is that adding something already there will just replace it. This means we’ll easily capture all known required column headings uniquely. Since we don’t know the depth of the objects, we have to use recursion, but we’ll make that into a flat list of headings, with children separated by underscores (rather than dots – to flatten it out).

Public Sub makeHeadingsFromData(cr As cRest)
    Dim jobHead As cJobject, job As cJobject, joc As cJobject, jod As cJobject
    Set jobHead = New cJobject
    
    With cr
        ' first step, identify the headings
        ' this will also take care of situation when each item doesnt have the same children
        Set jobHead = jobHead.init(Nothing)
        For Each job In .datajObject.children
            Set jobHead = rescurseHeadersFromJob(job, jobHead)
        Next job
        
        ' now we have all possible titles, we can create column headings
        .dset.headingRow.where.ClearContents
        With firstCell(.dset.headingRow.where)
            For Each job In jobHead.children
                .Offset(, job.childIndex - 1).value = Replace(job.key, "___", ".")
            Next job
        End With

        .teardown
    End With
Private Function rescurseHeadersFromJob(job As cJobject, _
            jobHead As cJobject, Optional k As String = vbNullString) As cJobject
    Dim joc As cJobject, s As String

    ' the trick here is to collapse to a single depth- we'll replace the underscores with . later
    If job.hasChildren Then
        If k <> vbNullString Then k = k + "___"
        For Each joc In job.children
            rescurseHeadersFromJob joc, jobHead, k + joc.key
        Next joc
    Else
        If k = vbNullString Then k = job.key
        If (Not IsEmpty(job.value)) Then
            jobHead.add k
        End If
    End If
    
    Set rescurseHeadersFromJob = jobHead
End Function

Populating the data

Once the headings are prepared – we substituted the underscores with ‘.’ when naming the columns – this reinstates that parent/children hierarchy so that the natural process in the Rest to Excel library can assign the data at the appropriate hierarchy level. Then we simply run the query again, this time letting it populate the data with reference to the newly populated columns. 

Some previous snippets of the day

There are various snippets throughout this site. Some of them have been categorized and some not. I’ll use the page for my ‘snippet of the day’ and change them when I remember. You can find other assorted snippets here and here.