Update 21/05/19

This bug has been squashed in:
* Power BI Desktop Version: 2.69.5467.1751 64-bit (May 2019)
* Excel for Office 365 MSO (16.0.11601.20174)

The focus of this post is an optimization tip for Excel sheets that are used in Power BI queries. The tip, which is to avoid using formatting beyond the range you actually need, is a result of a potential bug in the way Power BI processes empty but filled cells. I discovered the bug whilst trying to solve a problem as to why a 512 row by 19 column sheet would never load in Power BI. There was something about the sheet that caused Power BI to consume every available resource when loading the sheet. It took a while to discover the root cause. I tried several things including only accessing the specific columns I needed in the query, trying a different machine and saving the file locally. Nothing would stop Power BI consuming every available resource when loading the sheet. Then I noticed that several columns and rows had a background fill which had been copied down the full height and across the width of the sheet. Removing the fill allowed the sheet to load in seconds. I suspect that what I have discovered is a bug but it also serves to illustrate a way by which you can optimize sheets for processing.

I’ll demonstrate the phenomenon in the reminder of this post.

The sheet

In order to illustrate my point I’ve created a simple spreadsheet. In Cell G2 I have added a drop down selection using Data Validation.

A single column sheet. Column G contains an in-cell drop down with 4 options.
Simple sheet – In-cell drop down in Column G

The source for the drop down is in cells K2 to K5. The first item in the list is blank so ‘ignore blanks’ is not selected in Data Validation.

Shows Excel option to create an in-cell drop down using Data Validation
Using Data Validation to create a drop down choice

The query

For the demonstration I am using a typical simple Power BI query to access a sheet saved locally. The phenomenon occurs regardless where the file is saved.

The image shows an out of the box query to get data from Excel. It is generated using Get data > Excel
The query – Get data > Excel

Without background fill

First up I use autofill to fill down to the end of Column G. I’ve chosen to fill to the end of the sheet as this mimics common user behavior and the situation in the problematic sheet. The fill starts at cell G2 and the cell contains the drop down but no background fill.

The images shows to the steps to autofill a column in Excel. Select the entire column > Select autofill down from the ribbon
Using autofill to fill the entire column

The Fill Down option fills the column down to row 1048576 which is the last row in the sheet. Loading the sheet into Power BI yields 11 columns with headings and 4 rows of data.

The query returns 44 items of data.
Power BI Query Editor showing region returned by the query

You might be wondering why the region is this size. If you review the sheet again you’ll notice that there is data in cells A1 and K5. As I am using a simple query it imports a rectangular region which is defined by cells that contain data i.e. top left corner, the lowest row and furthest column to the right. It does not return data to row 1048576 as whilst it contains data validation the cells appear empty. The data refreshes in a blink of an eye.

The image is of the refresh dialogue. It loads the 15kb or so of data in under a second
Blink and you’ll miss the refresh

With background fill

Now repeating the exercise but this time with a background fill to the cells yields a completely different result:

Query editor preview of 11 columns and 999+ rows
Power BI Query Editor showing a region of 11 columns and 999+ rows

Power BI reports that the cells below row 4 contain null values but yet the region extends to row 999+. The data refresh takes much longer:

The image is of the refresh dialogue. It loads the 4.56MB or so of data in around 7 seconds.
The refresh returns 4.56MB of data!

To my mind this is a bug. The Excel file is only 9kb in size and, whilst the cells contain a validation formula and have a defined fill, Excel considers them as empty. Clearly this can have a significant impact on the ability of Power BI to import and refresh data as it changes the definition of the area to be imported to 11 columns with headings and 1,048,575 rows of data i.e. 11,534,336‬ items to be processed!

Returning to the problem I was trying to solve. The reason why the original 512 row by 19 column sheet never loaded in Power BI was that several columns had been filled with colour to the bottom of the sheet (row 1,048,576) and 2 rows had been filled to the extreme right (column XFD) (I suspect by selecting the row and then filling it which is another common user action). This defined a region of 16,384 columns and 1,048,576 rows i.e. ‭17,179,869,184‬ items to be processed. That yielded in excess of 2GB of data to process with the result that the refresh failed to conclude.

I know the bug can be mitigated by defining the region as a list or with additional steps in the query but that should not be necessary if the cells are null. I suspect that it also serves to highlight that Power BI considers any formatting when processing sheets and so query performance can be optimized by removing formatting.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s