2 ways to reduce your Power BI dataset size and speed up refresh
Science and technology
Adam shows you two things to reduce your Power BI dataset size. These are both things he commonly sees with Power BI reports and could potentially save you a lot of space and improve refresh times along with report performance.
Data reduction techniques for import modeling: docs.microsoft.com/power-bi/g...
**Gear**
Check out my Tools page - guyinacube.com/tools/
#PowerBI #DataModeling #PowerBIDataset
Comments: 209
By unchecking auto date/time, I brought my data model size down by 22MB! I am so happy I tried this. You do a fab job, keep it coming!
I have recently started expanding my knowledge with PBI and your channel has amazing information, examples and tips. I appreciate your work very much! Thank you for your efforts!
Just used this as a step by step to analyze a model that I am working on. I knew I had to get rid of columns (and I had) but now I am pruning even more ruthlessly. And one thing I would add--don't be afraid to remove columns--you can always add what you need back. So what I do is select JUST the columns I know I want, and then use REMOVE OTHER COLUMNS. Then, if I find that there is a column I DO need, I go back to that REMOVE OTHER COLUMNS step, and modify the command by adding the name of the column I need back in. Super easy, super quick.
@GuyInACube
4 years ago
Love it! It is definitely something folks should be looking at.
Brilliant! As someone used to working with tabular data, I inherently knew removing unwanted columns takes a huge load off schemas. I am a newbie to Power BI and was looking for ways to reduce the model size on my projects, and your video just proves how simple it is to cut down the size if you are really *clear* about your data. Thanks for highlighting that part so well, Adam!
Loads of love for this optimization technique. It felt like the PBIX file was suffocated with unrelated columns.
I am a recent subscriber to your channel and must say I love it! Thank you for putting in effort and time and sharing your knowledge.
Because of this video, I was able to reduce the size of a Power BI report that includes a customized calendar dimension from 500 MB to 2 MB, just by turning off the Time Intelligence feature. This is so unreal that I got my coworker to reproduce the size reduction. In hindsight, it makes so much sense based on how Time Intelligence works. Thank you so much!
Adam, this is an incredible video, thank you. It makes so much sense now that you have explained.
@GuyInACube
4 years ago
Thanks! 👊
Never really thought columns had an effect, thank you so much for this!
Thanks so much; As always, it has been super helpful! We greatly appreciate you guys giving back to the community this way. Keep up the good work!!
@GuyInACube
4 years ago
That means a lot Shabnam! Thank you so much 👊
@ShabnamKhan-vk7fj
4 years ago
@@GuyInACube 👊 anytime!
I completely agree with removing columns that you don't need. But, I think we need to be careful that we don't remove so many columns that we can no longer guarantee uniqueness in the table. When a context transition occurs, the table is iterated and if we have duplicate rows, they will get double (or triple etc.) processed causing very hard to catch (and resolve) bugs.
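The uniqueness concern above can be checked before a column is dropped. A minimal Python sketch, assuming the table is available as a list of dictionaries; the column names and sample rows are hypothetical and do not come from the video:

```python
from collections import Counter

def duplicate_rows(rows, keep_columns):
    """Return each projected row that appears more than once, with its count."""
    projected = [tuple(row[col] for col in keep_columns) for row in rows]
    counts = Counter(projected)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical sales rows; OrderLineID is the only column that makes each row unique.
sales = [
    {"OrderLineID": 1, "ProductKey": 10, "Quantity": 2},
    {"OrderLineID": 2, "ProductKey": 10, "Quantity": 2},
    {"OrderLineID": 3, "ProductKey": 11, "Quantity": 1},
]

# Project the table with and without the ID column and compare.
with_id = duplicate_rows(sales, ["OrderLineID", "ProductKey", "Quantity"])
without_id = duplicate_rows(sales, ["ProductKey", "Quantity"])
print(with_id)
print(without_id)
```

If the second result is not empty, the remaining columns no longer identify a row, which is the situation the comment warns about.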
Great tips. I always use Remove Other Columns to make sure I only keep the columns I need. Always get rid of the columns as a first step, not after you have done a bunch of transformations. Plus always, always reduce date-time to date only if you don't need the time. Time adds a lot of bulk to the size (I guess because of high cardinality). Hopefully the Power BI team will add a VertiPaq Analyzer-like tool to Performance Analyzer.
@GuyInACube
5 years ago
Totally agree! Date-time to date is definitely something we recommend. If you don't need time, get rid of it. If you do need it, split it out.
Really amazing. I have reduced one of my PBIX files from 256 MB to 182 MB. I also discovered a lot of options to optimize my dataset. Thank you
@GuyInACube
4 years ago
That's awesome! 👊
You guys are the bomb! Thanks for the tips. That VertiPaq Analyzer thing? Holy crap! That's a gold mine! It shows all my measures! I've been looking for something like this for forever!
Nice video. I like that it's not just repetitive basics. It's based on very real-life optimization scenarios.
Aaaahhhhh where would I be without Guy in a Cube? As always, fantastic info.
My file size was relatively small (c.4 MB) but visual fails to load in PowerBI service. This has helped me to optimise how the table loads and it is now working! This did not reduce my file size significantly (now c.3.5 MB) but that's not the point anyway. Thanks Adam
Excellent video! Just reduced my PBIX file from 136MB to 34MB. Goes to show how little I know about how data is stored. I had several tables with a couple of unique key columns.
@GuyInACube
5 years ago
WOW! That's amazing. So happy that this helped you out. That's pretty incredible. 👊
Very helpful Adam - thanks for doing the video
Nice video. I like that it's not just repetitive basic skills, but a real-life scenario for optimization.
Good techniques. What I normally do is take out as many columns as possible with select columns; you can always bring one back if at some stage you need a previously removed column. Just disable data type detection and select data types as your last step.
@GuyInACube
5 years ago
Yup. not a bad approach to pull things in later when you need it. Can you explain more on the data type point?
Very helpful, Adam. Thanks a lot. LOL on 'just like on a cooking show' liked that too!
My file size has reduced a lot. Thank you so much
Great video as always, you guys are great!
@GuyInACube
5 years ago
Appreciate that 🙏 Thanks for watching 👊
Thanks for this information. Also, I think that even if the organization says "we may need this column later on", it is easy to get the column back again. Not a big deal.
If only I saw this video on time, like for example last year....awesome video guys
@GuyInACube
4 years ago
Thanks for watching! 👊
Great video. Thanks a lot. I just learned some new tips.
Amazing! Turning off the date/time configuration you mention in the video reduced my report size by 8MB!
Great optimization tips, thank you
@GuyInACube
4 years ago
Most welcome! Thanks for watching 👊
Excellent video! I will use countrows from now on and ditch unique ids
@GuyInACube
4 years ago
Awesome! If you have the time, always be sure to test things as well. Things may work different with your data. Always good to validate.
Appreciate this kind of video.
Someone else already mentioned the datetime fields to watch out for and another one is calculated columns. Great job as always 👍
@GuyInACube
5 years ago
Yup. sooo many things. We have some other videos coming on data model optimizations. Great call outs though 👊
4 years ago
Hi Dan, why watch out for calculated columns? Could you clarify?
Hi Adam & Patrick, thank you guys so much for posting awesome contents, as always :) One thing I would like to point out is the shout out for DAX Studio. I have to admit I was a little bit surprised that Darren Gosbell wasn't mentioned as he's the creator and main contributor of DAX Studio. Yes, no doubt that Marco and Alberto (I have huge respect for them) have contributed in some of the coding; Marco has also mentioned a few times that people have mistaken him as its creator and had to clarify that he contributed approx. 5 - 10% of it. So I'm not sure whether that's the case here. Once again, thanks for the awesome contents and keep being awesome!
@ynwtint
5 years ago
Thanks for bringing this up. I had the same impression that the two DAX gurus from SQLBI were the creators of DAX Studio. Now I know the big man behind this very useful tool is Darren Gosbell. (mvp.microsoft.com/en-us/PublicProfile/35889?fullName=Darren%20Gosbell)
@likhui
5 years ago
@@ynwtint You're welcome. Cheers.
@GuyInACube
5 years ago
We have a lot of love for Darren! It is a SQLBI tool though and that was the intent. Apologies for giving the impression on actual development time. That wasn't what we were going for.
@likhui
5 years ago
@@GuyInACube Don't be sorry and totally understood :) I'm looking forward to your next video already. Cheers!
Great video. I just saw the other one from Aug 2020 :) about disabling Auto Date/Time
Awesome, thank you!
Thanks for what you guys do. Seriously it's so practical and easy to absorb, your channel is very undersubscribed
@GuyInACube
4 years ago
Much appreciated Alfred! 👊
One thing I always do is ensure I get rid of datetime fields, especially if the time is very precise. Set it as a date, or if you need the time, extract it into another column. A datetime has high cardinality, but a date and a time in separate fields have low cardinality. If I don't need to be so precise, I could just keep the minute of the day or the hour. I also add any custom columns in M/Power Query rather than DAX, as you can get better compression, or I use a measure if I can.
@GuyInACube
5 years ago
Totally agree! There are so many things that could be listed here. The video was really long though :| we have more data model optimization videos coming. 👊
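The cardinality argument in this thread can be illustrated with a few lines of Python. This is only a sketch on made-up data (one event every 7 minutes for a year, an assumption chosen for illustration); it counts distinct values and does not measure actual VertiPaq compression:

```python
from datetime import datetime, timedelta

def cardinality_report(timestamps):
    """Count distinct values for the combined column and for the split date and time columns."""
    return {
        "datetime": len(set(timestamps)),
        "date": len({ts.date() for ts in timestamps}),
        "time": len({ts.time() for ts in timestamps}),
    }

# Hypothetical event log: one row every 7 minutes throughout 2021.
start = datetime(2021, 1, 1)
minutes_in_year = 365 * 24 * 60
events = [start + timedelta(minutes=7 * i) for i in range(minutes_in_year // 7)]

report = cardinality_report(events)
print(report)
```

In this sample every combined datetime value is distinct, while the split columns hold at most one value per calendar day and one per minute of the day. Fewer distinct values per column is the reason the commenters expect the split version to compress better.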
Guys you are amazing, keep it up
@GuyInACube
4 years ago
Thank you so much! Really appreciate that. 👊
Amazing... only removing the Auto date/time reduced a pbix file from 20mb to 2mb. loved it!!!
Yeah, I have a model that takes over an hour to refresh every time. I'm using a dataflow as the source and it still takes a long time. So, my next step will be to check if I need all the columns :) Thanks for the video.
Thanks Patrick! Amazing impact that losing a few redundant columns has! 🐱👤🐱👤
Seeing that file size go down from 600MB to 74MB just made my jaw drop! Thanks for this!
Great video!
First, great video. Second, I love how you say to "Jump over to Premium to give you some breathing room". Power BI Premium sits at a price point that only large corporations can afford. I would love to jump to it for the use of computed tables inside dataflows, but can't get it into the budget till next year.
You present very well!!
@GuyInACube
4 years ago
Appreciate that! 👊
Million thanks
Thanks for the info in this video, impressed with how much this decreases the size of the Powerbi File! Would a similar approach work if you are suffering with the "visual has exceeded the available resources" error in the service when linking to a powerbi dataset?
3:15 Yup, I approve that! 😅 I worked with a table of around 12 million records, and not only did it take 2+ hours to fetch on the desktop, it also ate up all the RAM, making the PC almost unusable!!!
@GuyInACube
5 years ago
Very easy to get into that spot. crazy stuff. 👊
This is great!
Perfect
Hello fellow Devs *Please Note* : The Process has changed on how to Load your Model into VertiPaq Analyzer. ✅ Now Export a VPAX file first from Dax Studio, ✅ Then load THAT into the Excel Analyzer. Instruction on first page of new Vertipaq Analyzer ✅
@AndresLopez-yl5qx
2 years ago
Thank you! - This comment was a life saver!
Good examples. I’ve got a team with an S1 AAS, with ginormous composite transaction key that needs to die. Would save money to get it down to an S0.
@GuyInACube
5 years ago
Yeah it is amazing what exists in a model.
Must needed techniques!
@GuyInACube
5 years ago
Agreed :) Thanks for watching Abinash! 👊
VERY Nice video. Love you guys ! s2
@GuyInACube
4 years ago
Thanks so much!
Oh man... I really would love to show you what we're working on. I'm in healthcare data analysis. Healthcare data is legit big and we're doing everything we can think of to reduce our data size. Our latest project PBI file saves at 6GB!
Extremely helpful:)
@GuyInACube
3 years ago
Appreciate that Togeir! 👊
Thanks Adam. I unchecked Auto date/time and my PBI file dropped from 80MB to 2.4MB !!!!!
This is some goood stuff.!!!
Great ideas presented to reduce the dataset.
@GuyInACube
5 years ago
Appreciate that! We have some more videos coming on data model optimizations as well. 👊
@chamilam
5 years ago
@@GuyInACube Super !!! looking forward to those videos.
this actually works
Excel is fun has an outstanding video on the same topic
Wow awesome tips
@GuyInACube
5 years ago
Glad you liked it! 👊
OMG! Is that a Lone Star State on the Millennium Falcon? LEGIT!
@GuyInACube
4 years ago
Yes it is! Thanks! 👊
Thank you man! Only the time intelligence reduced my file size from 52MB to 22MB :D
Thanks! I use VertiPaq Analyzer regularly, but I forgot how important it is to delete unused columns (I got down to close to 100MB from 170MB). Adam, do you know a way to speed up the evaluation of a table? That always takes so much time. I optimized my SQL for query folding so it is fast, but some tables take ages to evaluate during the refresh.
I initially add just the fields I can filter on (market, customer type etc), together with one fact (e.g. order quantity), then I filter, then I add all the other columns required. The only annoying thing is that once you change a column data type, then you can't add any more from the data tables (at least on import).
Thank you so much for your tutorial. I am using an SSAS multidimensional live connection. I am trying to create a stacked column chart that shows month on the x axis and value on the y axis, and also shows the top selling store for each month. But when I apply a top N filter by store, the filter shows the highest sales per year and does not filter sales by month. Please help me, how can I solve this?
Hi Adam i am a very big fan of your power bi videos.......... i have a small doubt about how to validate the reports that are developed in Power BI Desktop .....Thanks in advance
Thanks for the video. This message appears when I edit the connection and refresh its data: "We couldn't refresh the connection. Please go to existing connections and verify they connect to the file or server". What should I do?
Great stuff. Got rid of 500MB worth of LocalDate_tables o/. Also found that in one report we have 22 million rows where one column contains numbers but is stored as a String. Wrong on so many levels :) It's not even used in the report! 700MB saved in a few seconds. This will come handy setting up guidelines for building Power BI-reports in our organization. Thanks!
This is really great! Thanks Adam! Would there be any negative impact if we disable time intelligence for an existing report that has datetime columns?
@GuyInACube
4 years ago
Absolutely not, unless you are using them in the report. We actually recommend disabling time intelligence if you have your own date table.
Hi Adam, very nice video! Thanks a lot. I'm just wondering which video recording application you use to record what you do in Power BI. I'd really appreciate it if you could reply!
You guys should cover the inforiver visual 🙏
Adam, you're awesome!
@GuyInACube
4 years ago
Appreciate that Gulherme! Thanks for watching.
Like a cookie show 🙂🙂 Thanks Adam for this great video and tips
@GuyInACube
5 years ago
Most welcome! 👊
@mikekostuch4891
4 years ago
That's Cooking show, not cookie show!
@alt-enter237
4 years ago
@@mikekostuch4891 But cookie shows sounds so much better! :)
Hello Patrick - Thanks. This is a great tutorial on the usage of DAX Studio and VertiPaq analyzer. I have tried using it for my Power BI report which is built based on SAP BW Application server connector. However, I do not see SSAS connection to update the local host and analyze. Could you please help me understand how I can create it? Thanks, PS
Very good video. This needs an update though, as I cannot follow along: the options are not the same any more.
Ohhhhhh... Thanks you!!
@GuyInACube
4 years ago
You are very welcome. Thanks for watching. 👊
Hi Adam, great talk. However, what are your thoughts about the usual practice of creating huge / heavy / slow multipurpose "golden" datasets, which are intended to solve the "several sources of truth" problem by putting everything and the kitchen sink into a single dataset file serving dozens of reports?
Do you normally create a backup of the PBI data before you remove columns? Is that just as easy as just creating a PBI file? Just in case we removed columns we should not have done. If so, how do you backup the proper way before we start removing columns, to retain an original the client provided.
Hello. From a performance perspective, is there any difference between removing columns and not loading all the columns from the file in the first place?
Love this. I was wondering, If you have a Surrogate Key and a Business ID which would be high cardinality and you join the tables by Key. Could you actually remove the business keys from the model or should you always leave those in. for example Product Key 1 Product ID 35335 ? I'm thinking in terms of the Fact table AND dimension if you have gone for a STAR schema
@GuyInACube
4 years ago
Debbie, you will need the Surrogate Keys for the relationships, but if you are not using the Business ID I would definitely remove it from the model. The only time we suggest keeping any type of ID is if it is needed for reporting. Great point!
Godsend. 👌
Hi Adam love your videos. What do you guys use to zoom in and out on the screen? I also saw it at the MS biz app summit. Thanks!
@denglishbi
5 years ago
ZoomIt docs.microsoft.com/en-us/sysinternals/downloads/zoomit
@GuyInACube
5 years ago
I broke my rule in this video. I actually was surprised when i saw it in the editing. Was on auto pilot. I used ZoomIt in the video at one point, but that's honestly the first time - in a long time - I've done that in the videos. Normally all of the zoom and highlight stuff I do in post. But when presenting in person I absolutely use ZoomIt. Every presenter should have it! Or something similar like it.
Hey Adam, when it comes to the Performance Analyzer, what number would be too high for a DAX query, Visual display, or Other? I have DAX queries that range from 100 to 300. I know some optimization could be used but it would be nice to tell my team what to look for as a guide.
I unchecked auto date/time and my whole database went crazy. Luckily I had a backup, otherwise I would not have been happy. It did save me 12 MB on a 67 MB database, but I think it's not worth the trouble. Maybe something in our database is not set up the way it should be, but it was worth trying.
Good tips haha!
@GuyInACube
4 years ago
Appreciate that! Thanks for watching 👊
A quick question about data flows and pbi service, would it make sense to load large dimensions to data flows and then only reference it (dataflow) in reports or that approach could cause issues in the long run?
@claytillman2227
5 years ago
I wonder the same thing. If the dataflow is being refreshed, how can I access that and not refresh in my model. Maybe this is similar to a Direct Query for the dataflow. I don't desire to refresh data, I just want whatever is stored in the dataflow.
after choosing the columns to keep, will the data refresh the same or will it throw error?
Just stumbled across this video - some good tips. Couple of questions though: 1. Unchecking the auto-date/time setting stops me from being able to show a nice hierarchical date slicer (Year->Qtr->Month->Day) - How could I still have one or more of those with the setting disabled? 2. For reducing the number of columns in the dataset, wouldn't it be better to edit the initial source query to only get the columns you need from source? Otherwise, you are telling Power BI to pull in all the columns (and have to handle them all), just to then say "now forget about half the columns I just told you to import"
Does in make any sense to group and summarise the remaining columns after deleting unnecessary IDs? Would it increase performance given that Power BI has very intelligent "packing" abilities?
Most Power BI devs pull in all the columns from the source and then don't understand why the project runs slowly. The best solution is to PREPARE your data source first. In SQL Server, create queries with only the columns that you need in Power BI. This is much faster than the alternatives, and you still have all the columns you will use in PBI.
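A small sketch of the "prepare the source first" idea, written in Python with an in-memory SQLite database standing in for SQL Server (an assumption made so the example runs anywhere). The table, view, and column names are hypothetical:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real source system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FactSales (OrderID TEXT, SalesKey INTEGER, DateKey TEXT, "
    "ProductKey INTEGER, Quantity INTEGER, Notes TEXT)"
)
conn.executemany(
    "INSERT INTO FactSales VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("SO-0001", 1, "2020-01-01", 10, 2, "free text"),
        ("SO-0002", 2, "2020-01-01", 11, 1, "free text"),
    ],
)

# The view exposes only the columns the report needs; the high-cardinality
# OrderID and the free-text Notes column never reach the reporting tool.
conn.execute(
    "CREATE VIEW vw_FactSales_Report AS "
    "SELECT DateKey, ProductKey, Quantity FROM FactSales"
)

cursor = conn.execute("SELECT * FROM vw_FactSales_Report ORDER BY ProductKey")
view_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(view_columns)
print(rows)
conn.close()
```

Pointing the import at a view like this keeps the column list in one place at the source, so bringing a column back later means editing the view and not the report.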
One little thing though - if you disable Auto Date many of your Quick measures won't work any longer as only power bi provided date hierarchies are supported.
Hello, may I know why there is SSAS? Is it the data source of pbix?
i dont see the ssas connection option when using vertipaq analyzer, im using a connection to an sql db with azure active directory for power bi
Hi I need to analyze multiple csv files of each 1mb size. Then how many files can I connect
Hello, here from Chile and I am a fan of your KZread channel. I would like to get the files that you show in the video to be able to practice and follow your steps. I will be grateful if you could share them. Regards!!!
@GuyInACube
5 years ago
Unfortunately, they are pretty big. It is just the ContosoRetailDW database, modified with extra rows. I also added a custom column for that OrderID column to simulate :) I did that at the SQL level using a view, which also flattens out the data.
It was so great. Could I have the source file that is shown in the video?
@GuyInACube
4 years ago
Unfortunately no. I based it on the ContosoRetailDW sample database, but we increased the number of rows. I had 25 million rows in that file. It is pretty big.
hi! Thank you very much for sharing such interesting videos! Could you share the excel file you use in the video and could you explain a little more about the use of DAX Studio? Thanks!
@Gustavo-Santana
5 years ago
I agree, it would be great if we could have some more details about how to use Dax Studio. Thanks Adam for your great explanation as ever!
@GuyInACube
5 years ago
The excel spreadsheet is VertiPaq Analyzer - which you can get from sqlbi.com - www.sqlbi.com/tools/vertipaq-analyzer/. Also Marco, from sqlbi.com, has a longer recording on what to do with a slow report. This goes into details on DAX Studio. www.sqlbi.com/tv/my-power-bi-report-is-slow-what-should-i-do-2/. Also, we will be looking at doing more videos on these topics as well.
I also always check the column data type, 'any' should be banned. When possible relationship should be based on text fields. These are my usual tricks.