Martin Kleppmann | Kafka Summit London 2019 Keynote | Is Kafka a Database?

Science & Technology

Martin Kleppmann is a distributed systems researcher at the University of Cambridge, and author of the acclaimed O’Reilly book “Designing Data-Intensive Applications” (dataintensive.net/). Previously he was a software engineer and entrepreneur, co-founding and selling two startups, and working on large-scale data infrastructure at LinkedIn.
ABOUT CONFLUENT
Confluent, founded by the creators of Apache Kafka®, enables organizations to harness the business value of live data. The Confluent Platform manages the barrage of stream data and makes it available throughout an organization. It provides industries from retail, logistics and manufacturing to financial services and online social networking with a scalable, unified, real-time data pipeline that enables applications ranging from large-volume data integration to big-data analysis with Hadoop to real-time stream processing. To learn more, please visit confluent.io
#kafkasummit #apachekafka #database

Comments: 29

  • @demokraken · 4 years ago

    Martin's book is a marvel, highly recommend for people interested in design of distributed applications.

  • @harshitsinghai1395 · 3 years ago

    His book is my first tech book ever. I'm proud to have chosen his book as my first. Totally worth it.

  • @kevinhock1041 · 5 years ago

    Really awesome talk, his book is great too

  • @el_chivo99 · 2 months ago

    ok i’ve actually asked myself this very question

  • @xinyuanliu1959 · 3 years ago

    Trying to make some notes here... By relying on the ordering of messages in a Kafka topic partition, we achieve serializable execution of this transaction, because the stream processor for each individual partition is just a single-threaded, linear, sequential process. We get scalability by running many partitions in parallel, partitioned by a partition key. A database transaction is broken down into a multi-stage streaming pipeline in Kafka. We can get better consistency than many real databases.
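    The per-partition serializability described in this comment can be sketched in a few lines of Python. The event shapes and the `process_partition` helper are illustrative assumptions, not Kafka's API; a real processor would consume from a Kafka partition instead of a list.

    ```python
    # Minimal sketch of a single-threaded, per-partition stream processor.
    from collections import defaultdict

    def process_partition(events):
        """Apply events strictly in log order. One thread per partition means
        no interleaving, so execution is trivially serializable."""
        balances = defaultdict(int)
        for event in events:  # offsets define a total order within a partition
            if event["type"] == "credit":
                balances[event["account"]] += event["amount"]
            elif event["type"] == "debit":
                balances[event["account"]] -= event["amount"]
        return balances

    # Partitioning by account key sends all of one account's events to the
    # same partition, so each balance is only ever touched by one thread.
    partition = [
        {"type": "credit", "account": "12345", "amount": 100},
        {"type": "debit", "account": "12345", "amount": 30},
    ]
    print(process_partition(partition)["12345"])  # 70
    ```

    Scaling out then means adding partitions (and processors), not adding locks.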

  • @hl5768 · 2 years ago

    It's like using Lua in Redis.

  • @fb-gu2er · 1 month ago

    Durability is loosely defined here. A durable record doesn't just disappear at some point after you've read it.

  • @applerr22 · 2 years ago

    For achieving the positive-account-balance consistency, the suggested model is not enough, as there is nothing stopping the credit event from being processed even if the debit fails. This can be achieved with an additional check for the same event ID before performing the credit, but that will require a DB. Another way would be to generate the credit event only after the debit succeeds, but that will have its own trade-offs.
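    The second option the comment mentions (emit the credit only after the debit succeeds) might be sketched as follows; the function name, event fields, and the list standing in for a credits topic are all hypothetical.

    ```python
    # Hypothetical first stage of a two-step pipeline: the credit event is
    # emitted only when the debit actually succeeds, so an insufficient
    # balance can never produce a dangling credit.
    def handle_debit_request(event, balances, credits_out):
        source = event["from"]
        if balances.get(source, 0) >= event["amount"]:
            balances[source] -= event["amount"]
            credits_out.append({"id": event["id"], "to": event["to"],
                                "amount": event["amount"]})
            return True
        return False  # debit refused: no credit event flows downstream

    balances = {"12345": 50}
    credits_out = []  # stands in for producing to a credits topic
    handle_debit_request({"id": "tx1", "from": "12345", "to": "54321",
                          "amount": 30}, balances, credits_out)
    print(balances["12345"], len(credits_out))  # 20 1
    ```

    As the comment notes, the trade-off is that the credit stage is now coupled to the debit stage's outcome.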

  • @sandeepkumarverma8754 · 4 years ago

    In the case of isolation, what if one consumer picks up the message to create user 'Jane', Kafka rebalances, and the same message is delivered to another consumer? Now both consumers try to create user 'Jane' in some database, and we again have the problem of two 'Jane' users being created.
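    The usual answer to this redelivery problem is idempotent processing keyed on a message ID. A toy sketch follows; the in-memory set stands in for state that would really live in the target database, ideally updated in the same transaction as the insert.

    ```python
    # Toy idempotent consumer: a redelivered "create user" message is
    # detected by its ID and skipped, so a rebalance can't create two Janes.
    def create_user(message, users, processed_ids):
        if message["id"] in processed_ids:
            return False  # duplicate delivery after a rebalance: ignore
        users[message["name"]] = message
        processed_ids.add(message["id"])
        return True

    users, processed_ids = {}, set()
    msg = {"id": "offset-42", "name": "Jane"}
    create_user(msg, users, processed_ids)   # first delivery: applied
    create_user(msg, users, processed_ids)   # redelivery: skipped
    print(len(users))  # 1
    ```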

  • @anurag870 · 5 years ago

    deja vu :)

  • @iavasilev · 4 years ago

    Link to the article from presentation: queue.acm.org/detail.cfm?id=3321612

  • @rajsaraogi · 4 years ago

    How about using change data capture: listen to the changes on our primary database and then apply them to update the others, like search indexes or caching DBs?
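    A minimal sketch of that CDC fan-out, assuming a change feed already exists; the change-record shape and the two derived stores here are invented for illustration.

    ```python
    # Hypothetical CDC consumer: replay the primary database's change log,
    # in commit order, into derived stores (a cache and a search index).
    def apply_change_log(changes, cache, search_index):
        for change in changes:
            key = change["key"]
            if change["op"] in ("insert", "update"):
                cache[key] = change["row"]
                search_index[key] = change["row"]["name"].lower()
            elif change["op"] == "delete":
                cache.pop(key, None)
                search_index.pop(key, None)

    cache, search_index = {}, {}
    apply_change_log([
        {"op": "insert", "key": 1, "row": {"name": "Jane"}},
        {"op": "update", "key": 1, "row": {"name": "Janet"}},
    ], cache, search_index)
    print(search_index[1])  # janet
    ```

    Because the derived stores are driven by the committed log, they converge on whatever the primary database decided, rather than racing it.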

  • @rajsaraogi · 4 years ago

    @@thebeckettgroup Yes, then which way to take: the log-based architecture or change data capture?

  • @HassanDibani · 4 years ago

    @@rajsaraogi CDC is essentially reading the database's log.

  • @sumitstir · 3 years ago

    How do the scalability gains from a partitioned message bus compare with directly partitioning a transactional database like MySQL? Given that we need to support the required write throughput regardless of whether Kafka is in between, what exact advantage is Kafka providing here?

  • @rishabhgpt3 · 3 years ago

    Distributed transactions!

  • @metaocloudstudio2221 · 2 years ago

    The whole talk makes sense, but a while ago I heard the opposite: that "Kafka is not a database". So I am confused: why not use Kafka as a source of truth (SoT)?

  • @Rbcksqheclfy · 2 years ago

    Dear Confluent, what do you want to achieve here compared to the previous naive example? How can this be compared to a proper distributed transaction? kzread.info/dash/bejne/dKl5mKyvgajFc7w.html In this example, imagine that appending an event to Kafka succeeded, and the index and cache updates were applied, but the event never got applied to the database: the data integrity between the index/cache and the database is now corrupted. The advantage of this approach is having an event log; I don't see anything about proper distributed transactions and atomicity for non-eventually-consistent systems. Please explain.

  • @MechanicalEI · 5 years ago

    So... Kafka is a database?

  • @Ayoub-adventures · 1 month ago

    Actually, he didn't project the concept of durability onto Kafka, which for me is what Kafka is missing to be a database. The conclusion of the talk is that the ACID guarantees that are hard to implement in traditional databases become easy with Kafka. But that's not a new idea, since most NoSQL databases use a commit log to achieve the same thing.

  • @luzyoz143 · 4 years ago

    More ACID = Better Databases?

  • @Kingslyt · 4 years ago

    Great talk. I like the idea illustrated here and I'm not a fan of XA, but I wanted to point out a factual inaccuracy in this talk. It is not true that the read-committed isolation level allows the scenario described at 16:28, which is a dirty read (not read uncommitted or phantom reads), i.e. reading what has not yet been committed. Even taking the stretched definition of atomicity in this talk together with the read-committed isolation level, there won't be a scenario in a relational database where you would see account 1 debited and account 2 not credited.

  • @asn90436 · 3 years ago

    I think what he described is write skew, not dirty reads.

  • @gstraylz · 3 years ago

    It's not about dirty reads. Suppose you are selecting both accounts, and you selected one before the commit and the second after. Read committed does allow that, although both Oracle and Postgres have somewhat stronger guarantees at their default level (snapshot isolation), so for the provided example you won't be able to see an inconsistent sum over the accounts in those databases.
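    The anomaly this comment describes can be mimicked without any database at all: the two reads below straddle a committed transfer, which read committed permits.

    ```python
    # Toy replay of the read-committed anomaly: a reader SELECTs account A
    # before a transfer commits and account B after it, so the sum it
    # observes differs from the invariant total.
    committed = {"A": 100, "B": 0}

    read_a = committed["A"]   # reader: SELECT balance of A  -> 100
    committed["A"] -= 50      # writer: transfer 50 from A to B...
    committed["B"] += 50      # ...and commit
    read_b = committed["B"]   # reader: SELECT balance of B  -> 50

    print(read_a + read_b)    # 150, though the true total was always 100
    ```

    A single-statement SELECT over both rows under a per-statement snapshot would not see this, which is the point being made about the Postgres and Oracle defaults.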

  • @Rusebor · 3 years ago

    It is a HUGE mistake from Martin, which makes the whole talk not great at all. His example should have proved that Kafka and a relational database are the same thing, but it proved the opposite. Unfortunately he did not show what would have happened had account 12345 had a zero balance. I assume that in that case we would have had to avoid emitting the event to credit account 54321, but we couldn't: we separated the original message (transaction) into two independent events. In his example we should have emitted the credit event for 54321 _only after_ we debited 12345 successfully. But even in that case it is not possible to do it in one step: we can't write to the database and to Kafka in the same transaction. Kafka is needless here.

  • @sumitstir · 3 years ago

    @@Rusebor Yeah, what we really want in this situation is a transactional database with a CDC-based approach to update the cache and search index.

  • @sumitstir · 3 years ago

    @@gstraylz It's not the same: in that case, if the user refreshes the balance for the first account, they are guaranteed to see the updated value, while the same is not true with the Kafka approach, as the two events might be published to different partitions, and there is no guarantee of when events in different partitions get processed, due to lag.
