Study notes: Convert a PDF to an audiobook using Python (Google Text-to-Speech) and export to MP3

Cheryl
Oct 28, 2020

1. Reading in the PDF and getting the number of pages

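import PyPDF2
from gtts import gTTS
import re

# Uses the legacy PyPDF2 API (releases before 3.0); newer releases renamed these
# calls to PdfReader, len(reader.pages) and page.extract_text().
book = open('sample.pdf', 'rb')  # load pdf - sample.pdf
pdfReader = PyPDF2.PdfFileReader(book)
pages = pdfReader.numPages  # total number of pages in the pdf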

2. Reading the text from each page and concatenating it into a single string using a for loop

full_book = "My Audiobook"  # assign a title to audiobook
for i in range(0, pages):
    page = pdfReader.getPage(i)
    text = page.extractText()
    page_num = str(i+1)  # to add page number info into string
    full_book = f"{full_book} Page {page_num} {text}"
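
The loop above rebuilds the whole string on every page. As an alternative sketch, the same text can be assembled by collecting the pieces in a list and joining them once. The helper name build_book and its parameters are hypothetical and not part of the original notes; it accepts any iterable of page strings so it can be tried without a PDF.

```python
def build_book(page_texts, title="My Audiobook"):
    """Join page texts into one string, prefixing each with its page number."""
    parts = [title]
    for page_num, text in enumerate(page_texts, start=1):
        parts.append(f"Page {page_num} {text}")
    return " ".join(parts)


# Example with made-up page contents
sample = build_book(["First page text.", "Second page text."])
print(sample)  # My Audiobook Page 1 First page text. Page 2 Second page text.
```

With the reader from step 1, it could be called as build_book(pdfReader.getPage(i).extractText() for i in range(pages)).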

3. Removing redundant and repetitive phrases from the text

# the dots are escaped so they match a literal full stop rather than any character
full_book = re.sub(r'(This page intentionally left blank\.)|(Click here for Terms of Use\.)|(Copyright 2020\.)', ' ', full_book)
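
The same clean-up can also be written with the phrases kept in a list, which makes it easier to add more for a particular book. This is a sketch rather than the original method: BOILERPLATE and clean_text are illustrative names, re.escape is used so the full stops match literally, and the final step that collapses whitespace is an added assumption about what reads well aloud.

```python
import re

# Phrases to strip; extend this list for a particular book (illustrative values)
BOILERPLATE = [
    "This page intentionally left blank.",
    "Click here for Terms of Use.",
    "Copyright 2020.",
]

# re.escape makes characters such as "." match literally
BOILERPLATE_RE = re.compile("|".join(re.escape(p) for p in BOILERPLATE))


def clean_text(text):
    """Replace boilerplate phrases with a space, then collapse whitespace."""
    text = BOILERPLATE_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("Page 1 This page intentionally left blank. Page 2 Hello\nworld."))
# Page 1 Page 2 Hello world.
```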

4. Converting to an audio file using Google Text-to-Speech and exporting to MP3

obj = gTTS(text=full_book, lang='en', slow=False)
obj.save('sample.mp3')  # write the audio to sample.mp3
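
For a long book, one option is to split the text into smaller pieces and write them to the same MP3 file one after another. The sketch below is a variation under stated assumptions, not the original method: split_text, save_audiobook and the 3000-character limit are hypothetical, while write_to_fp is the gTTS method for writing audio to an open binary file. It also assumes the player copes with MP3 segments written back to back. gTTS needs an internet connection, so it is imported only inside the function that uses it.

```python
import re


def split_text(text, max_chars=3000):
    """Split text into chunks of at most max_chars, breaking after sentences where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def save_audiobook(text, filename="sample.mp3", lang="en"):
    """Convert each chunk with gTTS and append the audio to one MP3 file."""
    from gtts import gTTS  # imported here because it requires network access

    with open(filename, "wb") as fp:
        for chunk in split_text(text):
            gTTS(text=chunk, lang=lang, slow=False).write_to_fp(fp)
```

With the text from step 3, this would be called as save_audiobook(full_book).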
