
PyPDF2 Crash Course - Working with PDFs in Python [2023]

In this tutorial we will explore how to use PyPDF2 to read PDFs, extract text from PDFs, split PDFs, merge PDFs, and more.
⚡ PyPDF2 Crash Course ⚡ : Working with PDFs in Python
💻 Code: github.com/Jch...
📝 Blog: blog.jchariste...
📺 Become a Patron: / jcharistech
🎓=== Check out these Awesome Data Science Courses!===🎓
🧑🏻‍🔧 Building ML Web Apps:www.udemy.com/...
🧑🏻‍🎓 Learn Streamlit: www.udemy.com/...
🧑‍🎓BioInformatics in Python:www.udemy.com/...
🤵🏻 Go4DataScience & Go For NLP(Udemy): www.udemy.com/...
🧑🏻‍🔧 Machine Learning in Python:www.udemy.com/...
🧑🏻‍🎓 DVC and Git For Data Science:www.udemy.com/...
If you liked the video don't forget to leave a like 👍 or subscribe ❤️.
⚡ If you need any help just message me in the comments, you never know it might help someone else too. ⚡
⏲️===TimeStamps===⏲️
0:01 Introduction & Demo
01:30 Setup and Installing Packages
02:00 PdfReader vs PdfFileReader
02:50 Workflow of PyPDF2
03:40 How to Read a PDF File In Python
04:40 Metadata of PDF
06:20 How to get number of pages
07:10 How to Extract Text From PDF
08:50 Get PDF Metadata
10:15 Extract Text From PDF
14:46 How to Split PDFs
15:40 Split PDF Function
22:50 PdfWriter Position
24:10 How to Split PDF up to A Specific Page
33:20 Get Last Page of PDF
38:16 Merging Multiple PDFs
39:00 How to Fetch All PDFs in A Directory
41:35 How to Merge PDFs
45:20 Rotating a PDF Page
51:30 Recap
JCharisTech
Support the Channel: Become a Patreon
◾◾◾Get The Data Science Prime App◾◾◾
@ Playstore : bit.ly/2LArYQu
◾◾◾ Need your dataset cleaned? Check out this gig ◾◾◾
www.fiverr.com...
Follow
💻 / jcharistech
🌎 Website: jcharistech.com
📂 GitHub: github.com/Jch...
📱 Twitter: / jcharistech
📝 Blog: blog.jchariste...
📺 Patreon: / jcharistech
🌐 WP: jcharistech.wo...
🏫 Course: jcharistech-ins...

Comments: 45

  • @ushasingh7752
    @ushasingh7752 1 year ago

    Best tutorial I have ever taken, with lots of exercises and detailed explanation. Thank you so much💗💗💗♥

  • @JCharisTech

    @JCharisTech

    1 year ago

    Glad it was helpful! Singh

  • @asheeshmathur
    @asheeshmathur 7 months ago

    Very good tutorial. How can I read a Bularian PDF and read a specific section to extract data? Any pointers will be helpful.

  • @pariabr4027
    @pariabr4027 7 months ago

    This topic was tough for me, but you explained it really well!

  • @JCharisTech

    @JCharisTech

    7 months ago

    Glad it was helpful

  • @imthebearimthebear3316
    @imthebearimthebear3316 1 year ago

    Excellent variable names and a clean display really help with the example.

  • @JCharisTech

    @JCharisTech

    1 year ago

    Glad it was helpful

  • @MateFast_Oficial
    @MateFast_Oficial 7 months ago

    Friend, do you know how to extract comments that contain an image or audio from a PDF? Thanks in advance.

  • @gamerk88
    @gamerk88 1 month ago

    Finally I have found a good tutorial

  • @rehanadgrt
    @rehanadgrt 1 month ago

    Could you please explain how to compare two PDFs and, if they are not identical, extract the extra parts?

  • @IThinkItsMe
    @IThinkItsMe 1 month ago

    This is a high quality tutorial 👌

  • @harshit_singh19
    @harshit_singh19 1 year ago

    @JCharis Tech How do I read the number of sections in PDF files?

  • @HRTG1234
    @HRTG1234 1 year ago

    Great tutorial! Thank you so much!

  • @JCharisTech

    @JCharisTech

    1 year ago

    Glad it was helpful!

  • @mangalwedeshrinivas7249
    @mangalwedeshrinivas7249 1 year ago

    At 29:10, I think the two statements, filename = os.path.splitext(...) and output_filename = f"... can be taken out of the loop.

  • @hnahler

    @hnahler

    4 months ago

    Definitely! They should be outside. The way it is written, the file will be saved for each page but then will be overwritten at the next iteration. Also, the range function in the for loop should add 1 so that the function uses 0 and 1 when the user wants to output the first two pages.

  • @Jon-bk2bw
    @Jon-bk2bw 1 year ago

    Really thorough and updated methods, thank you!

  • @JCharisTech

    @JCharisTech

    1 year ago

    Glad it was useful!

  • @francescovecchio3931
    @francescovecchio3931 1 year ago

    I have a simple problem: I have to read a PDF, change some words of the text, and then save it to a new PDF that keeps the same layout as the original PDF. I don't need anything else, but I can't find working examples on the web! Can you help me? Thanks, Francesco

  • @shreenaths6598
    @shreenaths6598 1 year ago

    Hi, awesome video. I have one question: I want to create a replica of a PDF through an automated Python script and save it to the cloud. Can you suggest how to do that? Thanks in advance.

  • @niv8880
    @niv8880 1 year ago

    I really really need to know how to flatten a pdf. Ghostscript doesn't work, Magick merges all my pages onto the first. I can't use pdftk on a Silicon Mac, roads all lead to nowhere!

  • @adan8657
    @adan8657 10 months ago

    How can I get the underlined text?

  • @user-sv3bk3jf5j
    @user-sv3bk3jf5j 1 year ago

    Hi, your video was very helpful!! Please, please create a video with code to put an image and/or user-defined text as a watermark into a PDF! If not a video, then please share the code. The other videos use the old libraries and not the new ones...

  • @wasima4463
    @wasima4463 1 year ago

    You did not cover the real scenarios, where text extracted from research paper PDFs contains weird fonts, non-homogeneous spacing, newlines, and sometimes letters that overlap each other. Can PyPDF2 deal with that?

  • @mrinkahok1522
    @mrinkahok1522 1 year ago

    Hello, I almost succeeded in this. I have one more problem. I want to add more pages to the pdf file. But it overwrites the previous page. I want to add it but I can't.

  • @drunkpy1590
    @drunkpy1590 1 year ago

    How do you extract data from a PDF to text and then systematically show the extracted text in Excel?

  • @mrinkahok1522
    @mrinkahok1522 1 year ago

    I have a file with 15 pages and I want to write page 2 through page 15 to another file because I no longer need page 1. So I want to throw away page 1.

  • @NazeerAhmad-bk1ul
    @NazeerAhmad-bk1ul 1 year ago

    Thanks

  • @LeandroAuzier
    @LeandroAuzier 1 year ago

    Do you know how to convert XML to PDF? I was looking at pyxml2pdf but I don't really get it; I don't know if it's an abandoned project or if I'm getting it wrong.

  • @bc4198
    @bc4198 1 year ago

    Awesome!

  • @JCharisTech

    @JCharisTech

    1 year ago

    Glad you think so!

  • @andrewmarty6001
    @andrewmarty6001 1 year ago

    Fantastic Tutorial

  • @JCharisTech

    @JCharisTech

    1 year ago

    Glad it was helpful

  • @cheonglily3992
    @cheonglily3992 1 year ago

    Very good video. I have a question: I have a PDF file and only want three things from it: the airway bill number, the total amount, and whether the goods are coming in (in which case it is an import). How can I extract them?

  • @JCharisTech

    @JCharisTech

    1 year ago

    Hello Lily you can use the `extract_text` function and an `if condition` to achieve that. I hope this helps
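
    One way that suggestion might look in practice is sketched below. The text would come from `page.extract_text()`; here a made-up `sample_text` stands in for it so the sketch runs without a PDF. The labels being searched for ("airway bill", "total amount", "import"), the number formats and the `pick_fields` helper are all assumptions about the document, not something shown in the video.

```python
import re


def pick_fields(page_text):
    """Pull three values out of text returned by `page.extract_text()`."""
    fields = {"airway_bill_no": None, "total_amount": None, "is_import": False}
    for line in page_text.splitlines():
        lowered = line.lower()
        if "airway bill" in lowered:
            # First run of digits, dashes and spaces that starts and ends with a digit.
            match = re.search(r"(\d[\d\- ]*\d)", line)
            if match:
                fields["airway_bill_no"] = match.group(1)
        elif "total amount" in lowered:
            # A number with optional thousands separators and decimals.
            match = re.search(r"(\d[\d,]*(?:\.\d+)?)", line)
            if match:
                fields["total_amount"] = match.group(1)
        elif "import" in lowered:
            fields["is_import"] = True
    return fields


# Made-up text standing in for the output of page.extract_text().
sample_text = (
    "Airway Bill No: 123-45678901\n"
    "Total Amount: 1,250.00 USD\n"
    "Shipment type: Import"
)
print(pick_fields(sample_text))
```

    Real extracted text is often messier than this sample, so the patterns would likely need adjusting to the actual layout.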

  • @aphadke77
    @aphadke77 1 year ago

    How to extract embedded files information from a pdf file?

  • @drunkpy1590
    @drunkpy1590 1 year ago

    Great video!

  • @JCharisTech

    @JCharisTech

    1 year ago

    Glad you enjoyed it

  • @lucasmonta1
    @lucasmonta1 1 year ago

    What IDE are you using?

  • @Munichandra_Reddy
    @Munichandra_Reddy 1 year ago

    I want to build a donation website. Please help me: how do I build it, how do I add a UPI option to it, how do I store the data in Excel or a SQL Server database, and how do I share the website with my friends? Please explain and don't skip it. Please help me using Streamlit and teach the code.

  • @JCharisTech

    @JCharisTech

    1 year ago

    Thanks for the suggestion

  • @glass7933
    @glass7933 1 year ago

    28:42 О_о. Do you speak Russian? The phrase "Это очень важно" ("This is very important") in the middle of the video was really surprising.

  • @mrinkahok1522
    @mrinkahok1522 1 year ago

    Can you help me with this?

  • @KwameBrakoAsante
    @KwameBrakoAsante 3 months ago

    Hi. Are you Ghanaian? I just had to ask.

  • @mrinkahok1522
    @mrinkahok1522 1 year ago

    for root, dirs, files in os.walk(main_data_path):
        for dir1 in dirs:
            huidige_file = path + "\\" + dir1 + "\\" + FILE
            if os.path.exists(huidige_file):
                pdf = PdfReader(huidige_file)
                with open(huidige_file, "rb") as pdf_reader:
                    file_new = path + "\\" + dir1 + "\\" + outpdffilename
                    pdfreader = PdfReader(pdf_reader)
                    for i in range(1, len(pdfreader.pages)):
                        selected_page = pdfreader.pages[i]
                        # page = pdfreader.pages(i, num_of_pages)
                        pdf_writer = PdfWriter()
                        pdf_writer.add_page(selected_page)
                        with open(file_new, "wb") as outPdf:
                            pdf_writer.write(outPdf)
                            print("Created a pdf: '{}'".format(file_new)), print(i)
                            # pdf_writer.write(outPdf)