Study notes: Convert a PDF to an audiobook using Python (Google Text-to-Speech) and export it to MP3
1. Read in the PDF and get the page count
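import re
import PyPDF2
from gtts import gTTS

book = open('sample.pdf', 'rb')  # open sample.pdf in binary mode
pdfReader = PyPDF2.PdfReader(book)  # PdfFileReader in old PyPDF2 (< 2.0); removed in 3.0
pages = len(pdfReader.pages)  # number of pages (numPages in old PyPDF2)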
2. Read the text from each page and concatenate it into a single string using a for loop
full_book = "My Audiobook"  # assign a title to the audiobook
for i in range(pages):
    page = pdfReader.pages[i]  # getPage(i) in old PyPDF2
    text = page.extract_text()  # extractText() in old PyPDF2
    page_num = str(i + 1)  # add the page number to the spoken text
    full_book = f"{full_book} Page {page_num} {text}"
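The loop above rebuilds the whole string on every page, which copies more and more text as the book grows. Below is a minimal sketch of the same layout built with a list and `" ".join`. `build_book_text` is a hypothetical helper, and it takes the page texts as a plain list, so you can try it without a PDF.

```python
def build_book_text(title: str, page_texts: list) -> str:
    """Return "<title> Page 1 <text> Page 2 <text> ...", same layout as the loop above."""
    parts = [title]
    for page_num, text in enumerate(page_texts, start=1):
        # treat a missing page text (None) as an empty string
        parts.append(f"Page {page_num} {text or ''}")
    return " ".join(parts)


print(build_book_text("My Audiobook", ["Hello", "World"]))
```

With PyPDF2 the input list could come from `[page.extract_text() for page in pdfReader.pages]`.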
3. Remove redundant, repeated phrases (e.g. page boilerplate) from the text
full_book = re.sub(r'This page intentionally left blank\.|Click here for Terms of Use\.|Copyright 2020\.', ' ', full_book)  # escape the dots so they match a literal "." instead of any character
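As the list of phrases grows, escaping each dot by hand is easy to get wrong. Below is a minimal sketch that builds the pattern from a plain list with `re.escape` and then collapses the leftover runs of whitespace. `remove_phrases` is a hypothetical helper, and the whitespace cleanup is an extra step not in the original notes.

```python
import re


def remove_phrases(text: str, phrases: list) -> str:
    """Replace every exact occurrence of each phrase with a space, then tidy whitespace."""
    # re.escape makes ".", "(", "?" etc. match literally
    pattern = "|".join(re.escape(p) for p in phrases)
    cleaned = re.sub(pattern, " ", text)
    # collapse the runs of spaces/newlines left behind and trim the ends
    return re.sub(r"\s+", " ", cleaned).strip()


boilerplate = [
    "This page intentionally left blank.",
    "Click here for Terms of Use.",
    "Copyright 2020.",
]
print(remove_phrases("Page 1 Intro. This page intentionally left blank. Page 2 Body Copyright 2020.", boilerplate))
```

An unescaped `Copyright 2020.` would also match text such as `Copyright 20201`, because `.` matches any character. The escaped version leaves it alone.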
4. Convert the text to an audio file using Google Text-to-Speech and export it to MP3
obj = gTTS(text=full_book, lang='en', slow=False)  # gTTS sends the text to Google's online TTS service, so it needs an internet connection
obj.save('sample.mp3')
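For a long book, a single `save` call can take a long time, and a network error partway through loses the whole file. One option is to split the text into several MP3 parts. This is a sketch under those assumptions. `chunk_text`, `save_audiobook_parts`, the 5000-character default and the `_partNNN.mp3` naming are all hypothetical choices. Only `gTTS(text=..., lang=..., slow=...)` and `.save(path)` come from the notes above.

```python
from __future__ import annotations


def chunk_text(text: str, max_chars: int = 5000) -> list[str]:
    """Split text into pieces of at most max_chars, breaking only at whitespace.
    A single word longer than max_chars becomes its own (oversized) piece."""
    chunks: list[str] = []
    current: list[str] = []
    length = 0  # length of " ".join(current)
    for word in text.split():
        # +1 for the space that joins this word to the previous one
        extra = len(word) if not current else len(word) + 1
        if current and length + extra > max_chars:
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def save_audiobook_parts(text: str, prefix: str = "sample", max_chars: int = 5000) -> list[str]:
    """Save each chunk as <prefix>_part001.mp3, <prefix>_part002.mp3, ... and return the paths."""
    from gtts import gTTS  # imported here so chunk_text works without gTTS installed

    paths = []
    for n, piece in enumerate(chunk_text(text, max_chars), start=1):
        path = f"{prefix}_part{n:03d}.mp3"
        gTTS(text=piece, lang="en", slow=False).save(path)
        paths.append(path)
    return paths
```

Usage: `save_audiobook_parts(full_book, prefix="sample")`. If one part fails, only that part needs to be regenerated.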