PostgresTV 💙💛

Welcome to Postgres TV, a place where we discuss all things PostgreSQL.

This channel is organized and maintained by Ilya Kosmodemyansky (DataEgret.com) and Nikolay Samokhvalov (Postgres.ai).

For inquiries: [email protected]

Comments

  • @davidcarvalho2985 (22 hours ago)

    Okay, you guys convinced me. I will try pgBadger. Thanks for this interview, by the way. Really nice.

  • @kirkwolak6735 (a day ago)

    So, I was wondering... Wouldn't it be nice if there were 2-3 types of plans based on some of the parameter values, so you get the optimal plan, and maybe the optimizer does parameter peeking to determine which of the X plans to choose... And then I realized: wow, the application could do this. Create 3 prepared statements for the same query and execute against the one TUNED for the parameter values, forcing the best plan to be used by design... Hmmm... We have this situation: a complicated search where the value we are searching for is either small (lots of hits) or large (few hits). The planner wants to choose the wrong plan after a few queries and then switch. Unfortunately, this is inside a procedure, where the statement is prepared around us. We would basically have to duplicate the complex query just to add the condition so that it executes the right way. But I might still try that.
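
    A minimal sketch of that "N prepared statements for one query" idea (hypothetical table and value names, not from the episode):

        -- Same query prepared twice; the application routes by expected selectivity,
        -- so each statement's cached plan is only ever built from one parameter "shape".
        PREPARE search_selective(text) AS   -- few hits expected
          SELECT * FROM items WHERE category = $1;
        PREPARE search_broad(text) AS       -- many hits expected
          SELECT * FROM items WHERE category = $1;

        EXECUTE search_selective('rare-category');
        EXECUTE search_broad('common-category');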

  • @kirkwolak6735 (8 days ago)

    Yes, you should test with your extensions. You should have a few general procedures you run that exercise all of the extensions. And you should monitor log sizes, in case something is going wrong and it's only visible in the log files. I like using htop on Linux and watching how much memory the various processes are using, and the total, in case memory consumption has changed... this can lead to issues. Also, read the documentation for the release. YES, it is good documentation, but it can feel a bit overwhelming because they document so much...
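
    For the "exercise all of the extensions" inventory step, a quick catalog query (safe on any Postgres):

        -- List installed extensions and versions (compare before/after an upgrade):
        SELECT extname, extversion FROM pg_extension ORDER BY extname;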

  • @Marekobi (13 days ago)

    This is gold!! :)

  • @pdougall1 (15 days ago)

    Can y'all talk about the best way to think about adding indexes? What is the problem with adding too many to a table, for instance? Or when to reach for one when a query is slow? Confounding factors when other queries use the same column (not sure that's relevant). I'm sure there is a lot to consider that is just unknown unknowns for me.

  • @NikolaySamokhvalov (11 days ago)

    hey Patrick - have you listened to episode "068 Over-indexing"?

  • @pdougall1 (11 days ago)

    @NikolaySamokhvalov I have not, but definitely will. Also, it looks like there's one on under-indexing as well! Might be exactly what I'm looking for, thanks!

  • @kirkwolak6735 (19 days ago)

    Michael, thank you for sticking to your guns to get your explanation out there. There is a subtle difference in the AUDIENCE you two seem to be addressing. Nikolay seems not to worry about launching a long-running query... because when he sits down, he likely either knows he has a problem already, OR he has such deep experience in PG that he knows to check a few things before he starts pounding out a query. I believe he implies this when he talks about how he adds the LIMIT based on what he is expecting (e.g., when he might be wrong, he will do a LIMIT 2 and let the error guide him). Whereas you were (IMO) coming from the perspective of a novice (like me) who *thought* that just adding a LIMIT was *always* a decent safety approach. My understanding was previously limited to (LIMIT + ORDER BY = red flag). Your point goes deeper than that. So now I realize the correct formula is: (LIMIT + (ORDER BY | index range scan) = red flag). Meaning the optimizer might be doing what looks like a simple range scan on some column, but that column is orthogonal to the data being searched for, and the scan can quickly become a semi-seq_scan (find the first row with the index, then seq-scan in reverse until the number of records hits the limit... which may never happen, making it scan to the beginning/end!). That's two wildly different target audiences. And I could be completely wrong; it's my guess. Of course I look up to both of you, so I apologize if I misstated your positions!
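
    A minimal sketch of the red flag described above (hypothetical table, column, and index names):

        -- Index on created_at; filter on an unrelated column.
        CREATE INDEX ON events (created_at);

        EXPLAIN (ANALYZE, BUFFERS)
        SELECT *
        FROM events
        WHERE customer_id = 42            -- orthogonal to the index order
        ORDER BY created_at DESC
        LIMIT 10;
        -- If customer 42's rows are rare, or clustered at the "old" end, this
        -- backward index scan may read a huge share of the table before it
        -- finds 10 matches: the "semi-seq_scan" described above.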

  • @michristofides (18 days ago)

    Thank you Kirk, for the kind words and the wonderful summary! I think you're spot on, and am glad to hear it was helpful

  • @pdougall1 (22 days ago)

    Y'all are great! It's really important to hear professional DB people talking about how all of this works in practice, beyond the basic explanations that can be found in books (books are also really important, btw).

  • @hamzaaitboutou8563 (26 days ago)

    more of this please <3

  • @iury0x58 (26 days ago)

    Great content, guys! Binging the channel

  • @iury0x58 (26 days ago)

    Thank you for this content. Very nice

  • @kaal970 (28 days ago)

    I want to understand it... man, why no slides?

  • @davidfetter (29 days ago)

    I just love the way this episode captured the processes that actually go into doing the thing! BTW, the repository for the web site is, as far as I know, also a git repository, and I suspect that rebase requests--NEVER use merge--would be easier to get into it than patches sent to the -hackers mailing list for the core code would be.

  • @keenmate9719 (29 days ago)

    Looking forward to this one... paging and limits - it's like naming and cache invalidation :-)

  • @nishic1 (a month ago)

    Wow. Excellent video. Very informative.

  • @nitish5924 (a month ago)

    What about massive updates? We recently had a use case with a Postgres database that has 250 million rows, and we introduced a new date column; we are facing so many issues backfilling this column today. It would be great if you could share your insights on how to handle such massive updates.

  • @NikolaySamokhvalov (a month ago)

    it's very similar - batching is very much needed. An additional complexity is index write amplification: all indexes have to be updated (unlike for DELETEs), unless it's a HOT UPDATE
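
    A minimal batching sketch for such a backfill (hypothetical table and column names), keeping each transaction short so autovacuum and replication can keep up:

        -- Run repeatedly (from the application or a script), committing between
        -- batches, until it reports 0 rows updated:
        UPDATE orders
        SET    created_date = created_at::date
        WHERE  id IN (
                 SELECT id
                 FROM   orders
                 WHERE  created_date IS NULL
                 ORDER  BY id
                 LIMIT  10000
               );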

  • @kirkwolak6735 (19 days ago)

    @NikolaySamokhvalov Excellent point about indexes adding writes. I would certainly add the column, batch the updates, and only when the updates are finished would I consider adding the index on that column. Otherwise it feels like a footgun!

  • @awksedgreep (a month ago)

    What you need is UUIDs across maybe 25 tables with FKs between each, no ON DELETE CASCADE, and a need to keep the data from all 25 tables elsewhere (archive schema). Getting <200 deletes a second after moving the data off to the archive schema. Had one set of deletes on a large table running for 20 days. Not speaking from experience or anything, haha.

  • @TimUckun (a month ago)

    Why doesn’t Postgres have unsigned integers? Also, why doesn’t timestamp with time zone actually store the time zone of the timestamp? If my app crosses time zones, I really want to know the time zone of the writer.

  • @wstrzalka (a month ago)

    Love your podcast. It's fun to listen to, for both advanced and basic topics, as something new always pops up. And the attached article list makes waiting for the next episode more bearable 🐘

  • @chralexNET (a month ago)

    One thing about timestamps: in my current project I only use timestamp without time zone, because I always save UTC for all my dates and times. If I do that, I guess there are no problems with using timestamp without time zone then? Edit: Okay, reading more of the wiki article, they suggest not doing that, because they think it will give you issues if you do calculations with other timestamps that have time zones - but I am never going to do that, because all my dates and times will be in UTC.

  • @kirkwolak6735 (a month ago)

    I switched away from that, and one of the reasons was DST.

  • @chralexNET (a month ago)

    @kirkwolak6735 I don't know why that would cause any issues. UTC is not affected by DST. A client application works with the timestamp in UTC in its logic, and for display it explicitly converts it to the client's local time by applying the time zone offset.
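
    A small sketch of the conversion being discussed (store UTC in timestamp without time zone, convert only for display):

        SELECT (TIMESTAMP '2024-03-10 01:30:00' AT TIME ZONE 'UTC')  -- interpret stored value as UTC
                 AT TIME ZONE 'America/New_York' AS local_time;      -- render in the client's zone
        -- DST is handled by the zone conversion, not by the stored value:
        -- the same UTC instant maps to EST or EDT depending on the date.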

  • @ilyaportnov181 (a month ago)

    Comparison of timestamps is a nontrivial topic due to time zones :)

  • @andherium (a month ago)

    Would love it if you guys could make a video on schemas and roles

  • @dartneer (2 months ago)

    a lot of gold bits here! Thank you guys..👏💪

  • @PostgresTV (2 months ago)

    thanks! keep watching - and let us know if there are ideas (there is a doc: docs.google.com/document/d/1PNGSn_d0A7gTR4C0p6geQyHOTgdpitKCRvafCNnT-44/edit)

  • @sanity_equals_nil (2 months ago)

    It would be interesting to hear how caching is implemented in Postgres.

  • @PostgresTV (2 months ago)

    good topic - added to the list of ideas: docs.google.com/document/d/1PNGSn_d0A7gTR4C0p6geQyHOTgdpitKCRvafCNnT-44/edit#

  • @maudrid (2 months ago)

    You mentioned some pitfalls with trigram. Do you have links to any media that goes into more detail?

  • @NikolaySamokhvalov (2 months ago)

    Well, the bottom of the pg_trgm docs already gives an idea that things are not easy and require significant effort: a) it's too low-level (it cannot be used, for example, with full-text search to get fast typo correction without creating an additional table and a dance around it); b) under heavy loads and for large volumes, GIN fastupdate and pending lists become harder and harder to tune to avoid performance issues (the same as for any other GIN indexes in general). So these days, I would perhaps consider some fast LLM for proper typo correction.
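
    For context, a minimal pg_trgm setup of the kind being discussed (hypothetical table and index names):

        CREATE EXTENSION IF NOT EXISTS pg_trgm;

        -- GIN trigram index for similarity search; under heavy writes, this is
        -- where the fastupdate / pending-list tuning mentioned above comes in.
        CREATE INDEX CONCURRENTLY idx_users_name_trgm
            ON users USING gin (name gin_trgm_ops);

        SELECT name, similarity(name, 'jonh') AS sml
        FROM   users
        WHERE  name % 'jonh'              -- trigram similarity operator
        ORDER  BY sml DESC
        LIMIT  10;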

  • @PostgresTV (2 months ago)

    also, what the bot says about it: postgres.ai/chats/018e9250-abff-73fd-af40-1b06ad17919d // Nik

  • @obacht7 (2 months ago)

    Quite philosophical this time. Reminds me of "free as in freedom" vs. "free as in free beer". I will add "you only own what you can destroy" to my list of quotes worth remembering 👍 For me, the podcast is perfect for listening without video. I enjoy every episode!

  • @obacht7 (2 months ago)

    I like the analogy of going to the doctor. It makes me think about the health of some people vs. the health of the systems they build 😁

  • @DavidTuron1 (2 months ago)

    Thanks for the podcast. pgBadger has been my favorite tool for many years. Very good tip about setting log_min_duration_statement to zero for some short period. I read about the sampling settings but haven't tried them yet. And many thanks for the songs ❤ They are better than the originals 🙂

  • @anithag6214 (5 days ago)

    How to read pgBadger?

  • @DavidTuron1 (2 months ago)

    Hi, thanks for the podcast and the many useful links. I usually use the depesz tool, but I will try yours as well :) I have one maybe-strange use of EXPLAIN VERBOSE - sometimes I want an INSERT with column definitions, so I run EXPLAIN VERBOSE SELECT * FROM table and just copy the column names from the output :) I know there are other ways to do that - like pg_dump with --column-inserts, using some GUI tool for generating inserts, just getting the columns from some CREATE TABLE definition, or \gdesc with some painful editing :) Thanks again, have a nice day and fast queries.
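
    The trick in question, sketched (hypothetical table name):

        -- EXPLAIN VERBOSE lists the output columns without running the query:
        EXPLAIN (VERBOSE) SELECT * FROM orders;
        --   Output: orders.id, orders.customer_id, orders.created_at, ...

        -- Alternatives mentioned above: \gdesc in psql, or
        -- pg_dump --table=orders --data-only --column-inserts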

  • @maudrid (2 months ago)

    I think there's a flaw in how PostgreSQL treats superuser. Some things only a superuser can do. Why not allow me to grant permission to a user whom I want to allow to refresh a subscription, for example? Now this user has to be a superuser. I've had to write functions that are just wrappers doing specific things with the execution rights of the superuser.
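
    A minimal sketch of such a wrapper (hypothetical subscription, function, and role names), using SECURITY DEFINER so a non-superuser can call it:

        -- Created by a superuser; EXECUTE runs with the owner's rights.
        CREATE FUNCTION refresh_my_subscription()
        RETURNS void
        LANGUAGE plpgsql
        SECURITY DEFINER
        SET search_path = pg_catalog
        AS $$
        BEGIN
          -- copy_data = false keeps this legal inside a transaction block
          ALTER SUBSCRIPTION my_sub REFRESH PUBLICATION WITH (copy_data = false);
        END;
        $$;

        REVOKE ALL ON FUNCTION refresh_my_subscription() FROM PUBLIC;
        GRANT EXECUTE ON FUNCTION refresh_my_subscription() TO app_user;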

  • @jianhe5119 (3 months ago)

    🎉

  • @patrick99e99 (3 months ago)

    Just bought his book!

  • @andatki (3 months ago)

    Thanks so much!

  • @Drgnslyr123 (3 months ago)

    It's easier to make changes as a dev when you don't have to walk the migrations backwards to find the latest source of truth. What I do is store all functions, tables, triggers, etc. in separate files; then I have another file that determines the order and structure. This gets run into something called a shadow DB, and then we diff it against the live dev DB to generate migrations.

  • @andatki (2 months ago)

    Is this available as open source code? I am curious about the idea of a shadow DB and whether that makes the DX of schema evolution faster or easier. One practical limitation I've experienced is that making any schema changes requires having the full application environment installed and configured, and this tight coupling feels unnecessary in some scenarios.

  • @pycontiki (3 months ago)

    Does Andy know Obie?

  • @pycontiki (3 months ago)

    Rails? Nobody uses that 2007 technology anymore. Django & PostgreSQL is heaven ❤

  • @NikolaySamokhvalov (3 months ago)

    Well, every time you commit code to GitHub or GitLab, guess what is being used ;)

  • @michristofides (3 months ago)

    Haha who should we invite on to discuss Django + PostgreSQL?

  • @pycontiki (3 months ago)

    @NikolaySamokhvalov haven’t both of those sites suffered large hacks and database problems? 🤣🙄 I know many people at GH, all good.

  • @andatki (3 months ago)

    @michristofides Haki Benita could be a great guest to discuss Django and Rails!

  • @andatki (2 months ago)

    Interestingly, the Python language is even older than Ruby (1991 vs. 1995) and the Rails framework came out a bit before Django, around 2004 or 2005. Do you know of some things in Django for PostgreSQL that you prefer to what Rails offers? Always curious to see what things look like in other frameworks.

  • @ivanstanevich (3 months ago)

    Thank you for running this podcast, lots of valuable knowledge in every episode!

  • @mbanck (3 months ago)

    Current versions of hypopg can also mask an index, for the "would it pick my index if the other one did not exist?" question.
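
    For reference, a sketch of both hypopg features (index hiding needs hypopg 1.4+; table and index names hypothetical):

        CREATE EXTENSION IF NOT EXISTS hypopg;

        -- "Would the planner use my index if it existed?"
        SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');
        EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

        -- "Would it pick my index if the other one did not exist?"
        SELECT hypopg_hide_index('orders_existing_idx'::regclass);
        EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
        SELECT hypopg_unhide_index('orders_existing_idx'::regclass);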

  • @GabrieGliukaz (3 months ago)

    Thank you for the episode. I was wondering if the AWS RDS default configuration is documented somewhere? I failed to find it. In particular, I'm interested in the autovacuum settings.

  • @kirkwolak6735 (3 months ago)

    We noticed some of our most important queries in PG16 had improved performance!

  • @ilyaportnov181 (3 months ago)

    A couple of reasons I think deserve mentioning in this context:

    * There are different operator families, and they support different operators. For example, a default btree index on a text field does not support the `like` operator; you have to use either `collate "C"` or text_pattern_ops. On the other hand, text_pattern_ops does not support inequality comparisons (</>) or sorting.

    * There are different collations; if your index is, for example, `collate "C"` but you compare by equality under the default collation, the index will not be used - you have to specify the collation explicitly in the query, or rebuild the index with the other collation.

    * And, talking about selectivity/cardinality, there is a harder class of problems, where PG cannot correctly estimate cardinality across several joins: it can estimate the cardinality of joining two tables, but when you join that result with a third table, it will probably not estimate the cardinality correctly. I don't know a simple way to fix this, apart from rewriting one query into several, or using materialized views or something like that. Because of cardinality misestimation, PG can pick a totally wrong join order, and because of that it will not use an index... You can try to force the join order by using CTEs with the `materialized` keyword as an optimization barrier (see the sketch below), or even switch to Max Boguk's hardcore techniques with recursive CTEs :)
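
    Two of those points, sketched (hypothetical tables):

        -- Default btree on text: supports </>, sorting; not LIKE 'abc%' under a non-C collation.
        CREATE INDEX ON users (name);
        -- text_pattern_ops: supports LIKE 'abc%'; not </> comparisons or ORDER BY.
        CREATE INDEX ON users (name text_pattern_ops);

        -- Forcing join order with a MATERIALIZED CTE as an optimization barrier:
        WITH first_join AS MATERIALIZED (
          SELECT o.id, o.product_id, c.region
          FROM   orders o JOIN customers c ON c.id = o.customer_id
        )
        SELECT f.*, p.name
        FROM   first_join f JOIN products p ON p.id = f.product_id;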

  • @NikolaySamokhvalov (3 months ago)

    Boguk is a beast :)

  • @agarbanzo360 (3 months ago)

    Why doesn’t Postgres do some detection of the disk type and do basic, deterministic self tuning?

  • @NikolaySamokhvalov (3 months ago)

    Good question. It doesn't even know how many CPU cores or how much RAM is available. I think there is potential for a tuning module to be developed - TimescaleDB has one, for example (and many of its adjustments can be applied to non-Timescale setups)

  • @agarbanzo360 (3 months ago)

    Didn’t even think of that. Something super simple like work_mem = parameter * available memory, etc., would be a huge improvement
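
    In that spirit, the kind of formula-based settings external tuning tools apply (illustrative numbers for a dedicated 16 GiB box; not official defaults):

        ALTER SYSTEM SET shared_buffers = '4GB';          -- ~25% of RAM (needs restart)
        ALTER SYSTEM SET effective_cache_size = '12GB';   -- ~75% of RAM
        ALTER SYSTEM SET work_mem = '64MB';               -- per sort/hash node, not per query!
        SELECT pg_reload_conf();                          -- applies the reloadable ones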

  • @marcinbadtke (3 months ago)

    Thank you for the conversation. It is hard for me to imagine that disk performance is the main reason the database engine chooses to use an index or not. As far as I know, during a sequential scan many database blocks are read in one I/O operation, while a random read fetches only one block. In my opinion, using an index or not is decided primarily based on statistics: the index is not used when the cost calculation based on statistics shows that skipping it is optimal, e.g., the statistics show that the amount of data the query retrieves is so big that a sequential scan is cheaper.

  • @PostgresTV (3 months ago)

    Thanks. Good question. I know it might be counter-intuitive, but it is what it is - I see it quite often (of course, not for trivial single-row PK lookups). Detailed answer: twitter.com/samokhvalov/status/1761082969001972050 // Nikolay

  • @marcinbadtke (3 months ago)

    @PostgresTV thank you

  • @wstrzalka (3 months ago)

    It's still 4 on RDS. And when I raised it with their support, the answer was that it's not related to hardware and I should set it myself to whatever I want :)
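
    Assuming the "4" here is random_page_cost (whose long-standing default, 4.0, dates from spinning disks), the self-service fix looks like this (database name hypothetical):

        -- On SSD/NVMe, a value near seq_page_cost (1.0) is a common recommendation:
        ALTER DATABASE mydb SET random_page_cost = 1.1;
        -- On self-managed Postgres: ALTER SYSTEM SET random_page_cost = 1.1;
        -- on RDS it can also be set via the DB parameter group.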

  • @PostgresTV (3 months ago)

    🤷

  • @Sam-cp6so (3 months ago)

    Because there’s a better index, or it’s not done building, or you need to run VACUUM ANALYZE... so many reasons it won’t use my friggin’ index

  • @rosendo3219 (3 months ago)

    oh yeah, that fat sip of nice and cold beer before you start :)

  • @NikolaySamokhvalov (3 months ago)

    It was Red Bull. Not into beer really, but I like the idea - maybe we should do some late-night online Postgres drinking event (if Michael drinks at all, idk)

  • @burningproblem (3 months ago)

    Great interview and great guest! Thank you both! ❤❤

  • @pycontiki (3 months ago)

    🤔 Where has Nikolay been? Michael has been carrying the show for quite a while. Hopefully Nikolay wasn’t in jail or anything bad 🤷🏻‍♂️🙄

  • @NikolaySamokhvalov (3 months ago)

    Jailbreak successfully applied.

  • @obacht7 (3 months ago)

    Great insight about JSON as a data representation toward applications vs. classical normalized storage! Now have fun everybody with the craziest OVER (PARTITION BY) 😊
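
    For anyone taking up that challenge, a tame OVER (PARTITION BY) starter (hypothetical table):

        -- Rank each customer's orders, newest first, without collapsing rows:
        SELECT id, customer_id, amount,
               row_number() OVER (PARTITION BY customer_id
                                  ORDER BY created_at DESC) AS rn
        FROM   orders;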

  • @varshathkumarterli3694 (3 months ago)

    I don't understand why this channel doesn't get more attention; it's the most underrated channel ever. This is the content I'm looking for: database internals in depth. Thanks for your efforts and keep going 🎉

  • @PostgresTV (3 months ago)

    thanks; please keep leaving comments and likes, and share on social media and in groups - this will help us grow!

  • @bigcountry503 (3 months ago)

    I paused the video and learned about window functions

  • @mmmikram (4 months ago)

    Thanks for explaining it so well. I experimented, and while it does save space, for me performance was almost the same as a B-tree index.