Distributed Scheduling with Spring Boot: the challenges & pitfalls of implementing a background job
Science & Technology
Spring I/O 2024 - 30-31 May, Barcelona
Speaker: Rafael Ponte
Slides: speakerdeck.com/rponte/distri...
Sooner or later a developer will implement his/her first background job using Java and Spring Boot, and what usually is a simple task for the majority of systems might become a nightmare in scenarios that need to deal with high performance, parallelism, distributed systems and a large volume of data. Scenarios like those hide several issues which many developers are not used to, such as large volumes of data, network failures, data inconsistency, out-of-memory errors and even taking the whole system down.
Although it may seem controversial, dealing with many of these problems does not require hyped technologies or services, but solid distributed systems fundamentals. This talk will present how an experienced developer implements a background job with Java and Spring Boot, taking into consideration the main challenges and pitfalls it brings along, and how he/she designs a solution for high performance, resilience and horizontal scalability while taking advantage of many modules of Spring Boot, Hibernate and the relational database.
If you still believe that a background job is a simple task, then this talk is for you!
Comments: 98
Congratulations on your presentation! You absolutely nailed it. Your thorough research and confident delivery captivated everyone in the room. Your ability to explain complex ideas so clearly is truly impressive. Keep up the fantastic work!
@RafaelPonte
9 days ago
Thanks for the kind words, Eduardo! ❤
Great talk!! So many learnings, and it addressed real-life problems I faced while writing background scheduled jobs... By the way, we used the ShedLock library, but this is really good insight.
@RafaelPonte
6 days ago
Thanks! Nice you liked it!! 😊 By the way, ShedLock is a very cool library! 👊🏻
Congratulations, Rafael! It was a pleasure to watch your presentation in person!
@RafaelPonte
23 days ago
Thank you so much, Rapha! ❤ You're the best!
I really like the way you explained short-running transactions. Nice addition to the jobs! Congratulations on the excellent presentation! It's very useful!
@RafaelPonte
20 days ago
Thanks so much! I am glad you liked it 🥰
Thanks, Rafael! Especially for the SKIP LOCKED feature, which was new to me.
@RafaelPonte
20 days ago
Thank you so much! I am glad the talk was helpful for you! 🥰 And yeah, SKIP LOCKED is fantastic!! 💪🏻
Excellent topic! I have some background jobs running here and there, and I'm definitely going to check them again.
@RafaelPonte
20 days ago
Nice! I am glad this talk was helpful to you! 👊🏻
Congratulations, really great!
Congratulations, my brother, you put on a show with that presentation, flawless! Absolutely top-notch!
@RafaelPonte
20 days ago
Thank you, my brother!
Beautiful presentation, thank you
@RafaelPonte
20 days ago
Thank you so much! That's very nice you liked it! 🥰
Congraaats, bro! It turned out great! Wishing you success
@RafaelPonte
17 days ago
Thanks! Happy you enjoyed it ❤
You are an amazing presenter, thank you so much. I learned a lot.
@RafaelPonte
14 days ago
Thank you so much!!! I am happy this talk was helpful for you 🥳
Congrats for your amazing presentation, Rafa!
@RafaelPonte
16 days ago
Thanks, Jess! ❤
Amazing! Congrats Rafa!
@RafaelPonte
20 days ago
Thanks, I'm glad you liked it ☺
Congrats, Rafael! Congratulations, Rafa!
@RafaelPonte
23 days ago
Thanks so much!!! 🥰
Rafael is awesome!! Great presentation
@RafaelPonte
20 days ago
Thanks a lot!! ☺
Great talk! There are a few Java libraries that already solve these challenges (db-scheduler, JobRunr or Quartz). At JobRunr we'd love to share your talk as it explains JobRunr's architecture well and can help our users understand the challenges of distributed scheduling even better!
@RafaelPonte
20 days ago
Thanks for your comment! I'm glad you liked it! ☺ Please, I would appreciate it if you shared it! By the way, I received great feedback from Ronald, the creator of JobRunr, who watched my talk! He is a fantastic guy! ❤
@RonaldDehuysser
19 days ago
@@RafaelPonte You're too kind 🤩!
@marshall143
19 days ago
What is your opinion on the nFlow Java library? Thank you for the video.
@RafaelPonte
18 days ago
@@marshall143 Thanks for the comment! 😊 I didn't know nFlow, but I understand that if your context allows your team or project to adopt a task scheduler or workflow engine, you should go with it. Usually, those libs and frameworks make the developer's life easier because they address very well all the issues discussed in the talk.
Great job Rafael!
@RafaelPonte
20 days ago
Thank you ☺
Well done, congratulations!
@RafaelPonte
19 days ago
Thanks, Diego ❤️
excellent lecture 💚
@RafaelPonte
20 days ago
Thanks, my friend!
thank you
@RafaelPonte
20 days ago
you're welcome! ☺
Great talk! Did not catch all the red flags in this :)
@RafaelPonte
20 days ago
Thanks! I am glad you liked it!! ❤
Great talk and a lot of cool new (for me) information about Spring/JPA semantics! But not much of this is specific to background jobs, and there is not much in the talk about generic background job processing. So I'd say the title is a bit misleading.
@RafaelPonte
14 days ago
Thanks for the comment 😊 I am glad the content was helpful for you! Out of curiosity, what do you understand as background jobs and job processing, and what do you expect from a talk about these subjects?
Very good!
@RafaelPonte
10 days ago
So cool that you liked it 😊
Nice!!
@RafaelPonte
20 days ago
Thanks! ❤
Congratulations, Rafael, you beat the game of Java.
@RafaelPonte
20 days ago
Hahaha, thanks, Bruno!!!
What a prince 💛🔥
@RafaelPonte
20 days ago
Thanks, Luis ❤
Nice explanation! But it did not cover a very important case: an app with more than one job marked with the @Scheduled annotation, which can be crucial for performance. Maybe it will be covered in a future talk.
@RafaelPonte
10 days ago
Thanks for the comment 😊 Nice you liked it! I am not sure if I understood what you mean. Usually, a single application has multiple @Scheduled jobs running concurrently doing different things (sometimes at different times). Could you give more details?
@YuliSlabko
10 days ago
@@RafaelPonte If you do not explicitly specify the scheduler's thread pool size in application.yml, all jobs will be run by a single thread.
@RafaelPonte
9 days ago
@@YuliSlabko Thanks for the explanation. Now I got your point! ☺ You're right. If your application runs multiple jobs close together or jobs that take too long to finish, tuning the Scheduler's thread pool size is essential. 👊🏻
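The single-thread effect discussed in this thread can be reproduced without Spring. The following is a minimal sketch that uses plain `java.util.concurrent` as a stand-in for the scheduler; the class `SchedulerPoolSketch` and its method are hypothetical names for illustration, not Spring's implementation. With one thread, a slow job keeps the second job from starting; with two threads, both run side by side. In Spring Boot, the corresponding setting is the `spring.task.scheduling.pool.size` property.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of Spring.
class SchedulerPoolSketch {

    /** Returns true if a second job managed to start while the first was still running. */
    static boolean secondJobRunsDuringFirst(int poolSize) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(poolSize);
        CountDownLatch secondJobStarted = new CountDownLatch(1);
        try {
            // Job 1: a "slow" job that occupies its thread for up to 500 ms and
            // finishes early only if job 2 signals that it has started.
            Callable<Boolean> slowJob = () -> secondJobStarted.await(500, TimeUnit.MILLISECONDS);
            Future<Boolean> result = scheduler.schedule(slowJob, 0, TimeUnit.MILLISECONDS);

            // Job 2: due 50 ms later; it can only start if a thread is free.
            scheduler.schedule(secondJobStarted::countDown, 50, TimeUnit.MILLISECONDS);

            return result.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool size 1: " + secondJobRunsDuringFirst(1));
        System.out.println("pool size 2: " + secondJobRunsDuringFirst(2));
    }
}
```

With a pool size of 1, the second job has to wait until the slow job releases the only thread, which is the behaviour to watch for when several @Scheduled jobs share the default scheduler.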
Congratulations, maharaja! 😉
@RafaelPonte
14 days ago
Thanks a lot ☺️☺️
Too good. Congratulations, prince of the ocean, haha 👏👏👏
@RafaelPonte
20 days ago
Thanks a lot, Junior! 👊🏻
@benicioavila
20 days ago
@@RafaelPonte Congratulations, Rafael! Sharing it with everyone on my team! Cheers.
@RafaelPonte
20 days ago
@@benicioavila Thank you ☺️ And thanks for sharing!! ❤️
He's the man! Let's gooo!
@RafaelPonte
18 days ago
Thanks, Mustafa 👊🏻
What's the difference between reading and writing with RabbitMQ or Kafka and reading and writing with a database? I usually use Redis to solve the same problem, because it is much faster than a typical relational DB.
@RafaelPonte
9 days ago
Thanks for the comment. I will ignore the trade-offs of having a new component in the infrastructure now and focus only on the developer's perspective. There are differences, but how they can impact your solution depends on your context. I mean, using Kafka or RabbitMQ in the talk's job perspective may have little difference on the job's code, but in the application perspective, which produces events in the queue, we may have to deal with a dual write issue. The same is true for Redis: it depends on how you're using it, such as a distributed lock provider or a message queue.
Nice one, Ponte!!!!!
@RafaelPonte
18 days ago
Thanks, Flávio 😊
In my understanding, `select ... limit 50 for update` would directly lock these 50 rows, instead of locking one row and processing one row at a time. But in the video, it seems to be the latter approach. Why is that?
@wukash999
23 days ago
He just presents it like that for presentation purposes. Of course it will lock all 50 rows (as long as they meet the select criteria and are not locked already). Overall this is a very basic presentation, not sure what the point of that was.
@RaphaelSousa-or1dl
21 days ago
@@wukash999 I think the point is to introduce less experienced people to the possible problems one might encounter, so you can study them further (at least for me it worked, since I had never thought or known about these problems), not to make a thorough implementation guide
@RafaelPonte
20 days ago
Thanks for your comment ☺ As @wukash999 commented, the idea was to make it as didactic and accessible as possible so that junior and inexperienced developers could understand it. Do you think it was confusing?
@RafaelPonte
20 days ago
@@wukash999 Thanks for your comment and helping them to understand my intention ☺ Do you think this was an introductory and basic talk? I'm afraid I have to disagree. The talk was designed to simplify the subject and make it accessible for everyone, but it's still a complex, tricky, and detailed theme.
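To make the locking behaviour discussed in this thread concrete, here is a minimal JDBC sketch, assuming PostgreSQL. The `job_item` table, its `id` and `status` columns, and the class `BatchClaimSketch` are hypothetical and not taken from the talk. A single `SELECT ... LIMIT ? FOR UPDATE SKIP LOCKED` locks every row it returns, skips rows already locked by another transaction, and keeps the locks until the surrounding transaction commits or rolls back.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; table and column names are assumptions.
class BatchClaimSketch {

    // FOR UPDATE SKIP LOCKED is PostgreSQL syntax: rows locked by other
    // transactions are skipped instead of making this query wait.
    static final String CLAIM_SQL =
            "SELECT id FROM job_item WHERE status = 'PENDING' "
          + "ORDER BY id LIMIT ? FOR UPDATE SKIP LOCKED";

    /**
     * Locks and returns up to batchSize pending row ids in one statement.
     * The caller must run this with auto-commit disabled; the row locks are
     * released only when the caller commits or rolls back.
     */
    static List<Long> claimBatch(Connection conn, int batchSize) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(CLAIM_SQL)) {
            ps.setInt(1, batchSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
            }
        }
        return ids;
    }
}
```

Two application instances calling `claimBatch` at the same time would each receive a different set of rows, which is what lets several job instances work through the same table in parallel.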
@Transactional: will this work if you have to call a MongoRepository and a KafkaTemplate? Is it all or nothing? If the Kafka call fails (KO), is the Mongo call rolled back too?
@RafaelPonte
20 days ago
Thanks for your comment ☺ Although MongoDB and Kafka support some level of transactions, I don't know how @Transactional annotation would work with MongoRepositories or KafkaTemplates. It's worth reading the Spring Data docs. But it's important to be aware that you do NOT have an atomic operation (all or nothing) when your code mixes different external service calls, like PostgreSQL, Mongo, and Kafka. When you do that, you hit a common issue in distributed systems called "dual write".
@asterixcode
20 days ago
@@RafaelPonte I have the same use case, where I need to write to Mongo, Kafka and also to a Google Cloud Storage bucket within the same transaction. Do you by any chance know how to solve this problem so I get all or nothing? Or, if that's not possible, how would we solve this problem then….
@rabah4306
20 days ago
@@RafaelPonte thank you :)
@MrKar18
12 days ago
For Mongo, you can also start a new session with a transaction manually. For Kafka, however, if the produced records are idempotent, you can use the Mongo transaction support above to achieve the same.
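The "dual write" issue mentioned in this thread can be shown with a small in-memory sketch. Everything here is hypothetical: `DualWriteSketch`, the two lists standing in for a database and a message broker, and the `placeOrder` method are illustration only and do not use any real MongoDB or Kafka API. The point is only that two independent writes are not atomic: if the second one fails, the first has already happened.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; the lists are stand-ins for external systems.
class DualWriteSketch {

    final List<String> database = new ArrayList<>(); // stand-in for a database
    final List<String> broker = new ArrayList<>();   // stand-in for a message broker

    // Simulates publishing an event; fails when the broker is unavailable.
    void publish(String event, boolean brokerUp) {
        if (!brokerUp) {
            throw new IllegalStateException("broker unavailable");
        }
        broker.add(event);
    }

    // Two separate writes: nothing undoes the first if the second fails.
    void placeOrder(String orderId, boolean brokerUp) {
        database.add(orderId);      // write 1: already persisted
        publish(orderId, brokerUp); // write 2: may throw, leaving the systems out of sync
    }

    public static void main(String[] args) {
        DualWriteSketch sketch = new DualWriteSketch();
        try {
            sketch.placeOrder("order-1", false);
        } catch (IllegalStateException e) {
            System.out.println("publish failed: " + e.getMessage());
        }
        System.out.println("in database: " + sketch.database.contains("order-1"));
        System.out.println("in broker:   " + sketch.broker.contains("order-1"));
    }
}
```

After the failed publish, the order exists in the stand-in database but not in the stand-in broker, which is the inconsistency a single @Transactional boundary cannot rule out across different systems.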
Ummm... The distribution topic starts after 27 min. Using DB locks is tricky and works differently for different databases, e.g. lock escalation. Better to use app-level locking. None of that really had much to do with jobs, just long-running tasks in a distributed system.
@RaphaelSousa-or1dl
21 days ago
Do you have a resource recommendation on app level locking? I'm studying the topic and it would be awesome to see it more detailed. Thanks
@RafaelPonte
20 days ago
Thanks for your comment ☺ Distributed systems are tricky, and database locks have worked well for over 30 years. Although some databases might differ, an exclusive row-level lock works similarly. By the way, a few RDBMS suffer from lock escalation, but not PostgreSQL (which was used in the talk's context); in addition to that, we used many approaches in the talk that mitigate the chances of lock escalation 💪🏻 Regarding application-level locking, PostgreSQL offers Advisory Locks as an excellent alternative to row-level locks. They're very light and are handled by the application side.
👏👏👏
@RafaelPonte
20 days ago
thanks!!!
I loved the talk, but I'm not sure whether you wanted to talk about Spring Boot or run for office, hahaha.. just kidding!
@RafaelPonte
20 days ago
Hahaha, thanks! 😊
Rafa is humble, a freak, and beautiful
@RafaelPonte
20 days ago
Hehe, you're very kind, my friend! ❤
Is he describing Spark 😆?
@RafaelPonte
17 days ago
Thanks for the comment 😊 Do you mean Apache Spark? hehe
Almost made me want to work with boring techs again ;)
@andreas_bergstrom
22 days ago
I’m moving back to Java/JVM after 15 years in Node/JS/Python
@RafaelPonte
20 days ago
Boring techs are amazing! 🙌🏻
Congrats, nice job!
@RafaelPonte
20 days ago
Thanks, Barroso! 👊🏻