In this post I’ll cover how to use Microsoft Flow to execute a SharePoint search query and save the results as a CSV file. The background to this post was a request to provide an updated list of sites in our SharePoint environment for use as a data source in Power BI. Now I know I could have used PowerShell but not everyone is an admin. This approach can be used for a range of queries and cases e.g. a list of documents that share the same items of metadata, a list of lists that meet specific criteria and it can be adapted for other sources like Yammer. The post demonstrates several key concepts in Flow:

  • Making SharePoint HTTP requests
  • Processing JSON
  • Loops
  • Creating tables
  • Working with arrays, variables and compose
  • Outputting files

I’ll start with the structure of the Flow so you can see how it all comes together and then break it down into the key steps. It is worth noting that the Flow uses SharePoint actions that run using the context of the user running the Flow.

1
The Flow

Triggering the Flow

I have used a manual trigger to initiate the Flow. Manual triggers are great as they allow you to test a Flow without the added complication of a waiting for a trigger action like a Recurrence or forcing it to run a when a new file is added.

Preparation

The first ten actions are all about preparation for the Do Until loop. It is the Do Until loop that gathers the search results and processes them ready for saving. The first step is to perform a HTTP action that calls SharePoint Search.

Get total number of results

2
SharePoint HTTP action

The reason why I perform this action is to get an estimate for the number of results that will be returned in the Do Until loop. If you are familiar with using Search through the user interface then you will have seen two key principles of Search namely result estimation and paging:

3
SharePoint search results

You might have seen that when you perform a search in SharePoint the number of results can vary as you advance through the pages. Typically the estimate gets more accurate as you advance through the pages. The second principle is that you do not get all of the results at once. Each page contains a defined number of results – usually 30 if the person setting up the page accepts the default value. This is no different for the HTTP request. SharePoint will return an estimate and the first 10 results unless you refine the search query. The estimate is on the high side of the final number so we can use that to define the number of blocks we need to return and the number of loops we need to use.

The aim of my search is to return all of the sites in the /teams/ branch in my tenant and for each result the Title and URL for the site. If you are unfamiliar with forming queries then take a look at the Keyword Query Language (KQL) syntax reference and the  SharePoint Search REST API for the details of the GET request. A great tool for trying out queries is the SharePoint Search Query Tool.  The query URI is:

_api/search/query?querytext='(contentclass:sts_site+OR+contentclass:sts_web)+Path:"https:%2f%2ftenantname.sharepoint.com%2fteams%2f*"'&trimduplicates=false&rowlimit=500&selectproperties='Path%2cUrl%2cTitle%2cSPWebUrl'&sourceid='8413cd39-2156-4e00-b54d-11efd9abdb89'&clienttype='Custom'

The query text is:

(contentclass:sts_site+OR+contentclass:sts_web)+Path:"https:%2f%2ftenantname.sharepoint.com%2fteams%2f*"

which translates into “give me results which are Site Collections or Subsites and only those where the URL includes tenantname.sharepoint.com/teams/ “. The unencoded version is:

(contentclass:sts_site OR contentclass:sts_web) Path:"https://tenantname.sharepoint.com/teams/*"

You’ll notice that the URI is encoded. Encoding replaces URL-unsafe characters with escape characters. I’ve also added parameters to the URI to help focus the search and in turn increase the accuracy of the initial estimate:

trimduplicates=false&rowlimit=500

If the trimduplicates parameter is omitted then SharePoint will exclude duplicates by default i.e. apply trimduplicates=true. This will grossly underestimate the number of results for a search designed to return sites as results. It will also produce estimated counts that vary in number as you page through them. The Flow includes a Create HTML table action that you can use to see this effect in action (see later in the post for details). To see the difference set trimduplicates to true in the query contained in the Do Until loop and then inspect the table. As we are after the total number of results to define the number of loops we need perform we can forgo the need to define the number of results per page. However we need an accurate result and this is helped by setting the rowlimit to the maximum number of rows in a result set to 500.

&sourceid='8413cd39-2156-4e00-b54d-11efd9abdb89'&clienttype='Custom'

At the end of the URI is some information that tells SharePoint which result source I would like to use and where the search query is coming from. These items are optional for a simple site search.

I could make the Flow easier to reuse by passing the query URI as a variable into the HTTP action. If you do opt for that then remember to encode the URI either using a encodeURIComponent function or encode it outside of Flow and paste the result in.

Processing the results

The result of the HTTP action is passed back to Flow in the form of a JSON object. The Parse JSON action is used to locate items in the object that I need to reuse elsewhere in the Flow. At this stage I am only concerned with the total number of results which are detailed as TotalRows in the JSON object.

4
Parse JSON action

Producing the JSON schema is probably the hardest part of the Flow. The design process is iterative. The first step is to construct the Flow up to and including the HTTP action. You then run the Flow (hence the Manual trigger) and then inspect the results of the HTTP action by downloading the output.

5
Send HTTP action – obtaining JSON results

The output contains a lot of information that you do not need. However it is too big to simply paste into the schema generator provided with the Parse JSON action as the result is a confusing array of near identical key names. John Liu has produced an excellent blog that describes the trimming process and some of the pitfalls the arise when working with large JSON objects. My preferred approach is to use Visual Studio Code and Nick DeMayo’s JSON Editor extension to cut big chunks out of the JSON object. I then paste a copy of the reduced object into the sample schema generator provided with the Parse JSON action. The JSON Editor allows you to delete chunks without the risk of introducing formatting issues. I do not like the idea of pasting search data from within my tenant into an online schema tool.

6
Visual Studio Code with JSON Editor extension

The resulting schema is:

{

    "type": "object",

    "properties": {

        "odata.metadata": {

            "type": "string"

        },

        "ElapsedTime": {

            "type": "integer"

        },

        "PrimaryQueryResult": {

            "type": "object",

            "properties": {

                "RelevantResults": {

                    "type": "object",

                    "properties": {

                        "TotalRows": {

                            "type": "integer"

                        },

                        "TotalRowsIncludingDuplicates": {

                            "type": "integer"

                        }

                    }

                }

            }

        }

    }

}

Note that the schema excludes parsing the search results as they are not required at this point.

Establishing variables and the loop count

9
Variables used in search query and for calculating the number of loops

Processing the JSON object means that the item values can be easily selected by name and used in subsequent actions.

7
Using TotalRows JSON item in a variable

As mentioned earlier there is a limit of 500 rows for a HTTP search request. The total number of rows from the search result is used to calculate the number of 500 row loops that are required or put another way the number of pages that will be required. A Compose action is used for the calculation:

8
Calculating the number of loops

The expression is:

int(add(div(variables('TotalRows'),variables('RowsPerPage')),1))

I increase the number of pages by 1 to remove the risk from rounding down that can occur when the outcome of the division is finally turned into an integer. Note that mathematical functions do not use the usual symbols. Rather commas are used to separate values and functions used for the operation e.g. add(1,1) is the same as 1+1.

The outcome of the Compose action is passed into the Pages variable so it can be used later. Strictly speaking this might not be necessary but I find working with variables in a Flow is easier to relate to from a logic perspective than a mixture of compose items and variables. Another variable is used to count the number of loops.

10
Variables used to count progress through Do Until loop

The last two actions define arrays that will hold the results from the search queries (SearchResults) and the count of search results per loop (ResultGrid).

11
Arrays used to hold the outcome of each loop and produce a summary table of results

The ResultGrid array will be used by a Create HTML action that will summarise the number of loops and the number of search results discovered in each loop.

Looping through results

Paging

As mentioned earlier there is a limit on the number of search results returned with each query. The aim of the preceding actions was to establish the number of loops required to return all of the results. The Do Until loop will keep running until the number of loops (Page) equals the calculated number of pages (Pages). The original search query has been modified to introduce paging that gathers results in batches of 500.  This is achieved through the use of startrow and rowsperpage parameters in the query URI.

12
Defining the Do Until loop and using parameters to page through results

 

Processing pages of results

A Parse JSON action is then used to identify the results in the JSON object. This action also identifies the total number of results that SharePoint estimates will be returned.

13
Parsing JSON

The search results are returned in a table within the JSON object.

21
Search results table in JSON object

Each row contains 13 properties for a given result.

22
One row per result with the details stored in 13 cells per row
  • Cell [0] Rank  (ValueType: Double)
  • Cell [1] DocId (ValueType: Int64)
  • Cell [2] Path (ValueType: String)
  • Cell [3] Url (ValueType: String)
  • Cell [4] OriginalPath (ValueType: String)
  • Cell [5] Title (ValueType: String)
  • Cell [6] SPWebUrl (ValueType: String)
  • Cell [7] PartitionId (ValueType: GUID)
  • Cell [8] UrlZone (ValueType: Int32)
  • Cell [9] Culture (ValueType: String)
  • Cell [10] ResultTypeId (ValueType: Int32)
  • Cell [11] RenderTemplateId (ValueType: String)
  • Cell [12] piSearchResultId (ValueType: String)

The corresponding schema is:

{

    "type": "object",

    "properties": {

        "odata.metadata": {

            "type": "string"

        },

        "ElapsedTime": {

            "type": "integer"

        },

        "PrimaryQueryResult": {

            "type": "object",

            "properties": {

                "RelevantResults": {

                    "type": "object",

                    "properties": {

                        "RowCount": {

                            "type": "integer"

                        },

                        "Table": {

                            "type": "object",

                            "properties": {

                                "Rows": {

                                    "type": "array",

                                    "items": {

                                        "type": "object",

                                        "properties": {

                                            "Cells": {

                                                "type": "array",

                                                "items": {

                                                    "type": "object",

                                                    "properties": {

                                                        "Key": {

                                                            "type": "string"

                                                        }

                                                    },

                                                    "required": [

                                                        "Key"

                                                    ]

                                                }

                                            }

                                        },

                                        "required": [

                                            "Cells"

                                        ]

                                    }

                                }

                            }

                        },

                        "TotalRows": {

                            "type": "integer"

                        },

                        "TotalRowsIncludingDuplicates": {

                            "type": "integer"

                        }

                    }

                }

            }

        },

        "Properties": {

            "type": "array",

            "items": {

                "type": "object",

                "properties": {

                    "Key": {

                        "type": "string"

                    }

                },

                "required": [

                    "Key"

                ]

            }

        }

    }

}

You’ll notice that whilst items like Rank are reported as floating point numbers in the results I have defined each cell as a string (rather than having 13 separate entries). I find it easier to use a minimalist schema and convert items as needed to the correct type after they have been gathered.

14
Using a Select action to obtain the Path and Title for each row item

A Select action is used to extract the Path and Title values from each result row using an Item function and the corresponding cell reference i.e. [2] for the Path and [5] for the Title.

 

item()['Cells'][2]['Value']

The Path and Title labels are used as column headings and I have chosen to manually define them i.e. I typed them in. A function can be used to read these values from the first row of results e.g.

body('Parse_JSON')?['PrimaryQueryResult']?['RelevantResults']?['Table']?['Rows'][0]['Cells'][2]['Key']

This function would return the key name of ‘Path’ from the first row (rows start from zero). The function also demonstrates an alternative method for obtaining single values and navigation through JSON hierarchy.

15
Combining the results with previous values using Union

Each block of results is then appended to the results held in the SearchResults array from previous loops using a Union function:

 

union(variables('SearchResults'),body('Select'))

The output of the union is then set as the new value of the SearchResults array. For this I have used a Set variable action as an Append to array variable action only seems to work when a single item rather than an array is added.

Building a summary table of results

16
Appending the loop number and total rows to the ResultGrid array

The value of the particular loop iteration and the number of results returned by the query are single items and so I can append them to the ResultGrid array using a Append to array variable action. When working with arrays it is important to include the space immediately after the colon as part of the {Key: Value} definition.

{
"Loop": "@{variables('Page')}",
"Count": "@{body('Parse_JSON')?['PrimaryQueryResult']?['RelevantResults']?['TotalRows']}"
}

Incrementing the counters

17
Incrementing  counter variables

The final actions in the loop increment the StartRow variable by 500 and the Page variable by 1. The Do Until loop will exit when the criteria of Page equals Pages is met.

Output

Summary results table

18
Creating an HTML table of results

A Create HTML table action is used to present a summary of the results returned per loop. The table can be inspected once the Flow has completed running. The headings from the table are those as defined in the array.

results 1
Example results table – can you spot where the estimate varies…

As mentioned near the beginning of the post, the table can be used to view the contribution of parameters in a search query. For example, using the default value of  trimduplicates=true creates a subset of results which is significantly smaller than the total of 23,478. Note how in both instances the results settle after a number of loops.

results 2
Impact of setting trimduplicates=true

Search results as a CSV

19
Outputting the results as a CSV file

A Create CSV table action is used to parse the SearchResults array into a comma separated format. The resulting output is saved to my OneDrive using a Create file action.

20
Sample output

In terms of speed, as the Flow is using SharePoint Search, the process is quick as it is not trying to iterate through sites one-by-one. Typically it takes a little over 4 minutes to work through 23,000 items and 47 loops.

 

There you have it. A Flow that performs a SharePoint Search and saves the results as a CSV file.

 

 

 

 

 

 

 

 

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s