Study notes: Convert a PDF to an audiobook using Python (Google Text-to-Speech) and export to mp3

Oct 28, 2020

1. Reading in the PDF and extracting the page count

import re

import PyPDF2
from gtts import gTTS

book = open('sample.pdf', 'rb')  # load pdf - sample.pdf
pdfReader = PyPDF2.PdfFileReader(book)  # PyPDF2 1.x/2.x name; 3.x renames it to PdfReader
pages = pdfReader.numPages  # total number of pages in the PDF

2. Reading in the text from each page and concatenating it into a single string using a for loop

full_book = "My Audiobook"  # assign a title to the audiobook
for i in range(0, pages):
    page = pdfReader.getPage(i)
    text = page.extractText()
    page_num = str(i+1) # to add page number info into string
    full_book = f"{full_book} Page {page_num} {text}"
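
The loop above can also be written as a small helper that works on page texts that have already been extracted, which makes the page labelling easy to check without a PDF at hand. This is only a minimal sketch, not part of the original notes: `build_book_text` and its parameters are hypothetical names, and it assumes the page texts have been collected into a list of strings.

```python
def build_book_text(title, page_texts):
    """Join page texts into one string, labelling each page with its 1-based number.

    Hypothetical helper: `page_texts` is assumed to be a list of strings,
    one per page, e.g. collected from page.extractText() in the loop above.
    """
    parts = [title]
    for page_num, text in enumerate(page_texts, start=1):
        parts.append(f"Page {page_num} {text}")
    return " ".join(parts)


sample = build_book_text("My Audiobook", ["Hello.", "World."])
print(sample)  # My Audiobook Page 1 Hello. Page 2 World.
```

The result has the same shape as the string built by the for loop: the title first, then each page's text preceded by "Page N".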

3. Removing redundant and repetitive phrases within the text

# escape the full stops so they match a literal '.' rather than any character
full_book = re.sub(r'(This page intentionally left blank\.)|(Click here for Terms of Use\.)|(Copyright 2020\.)', ' ', full_book)
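
Since `.` matches any character in a regular expression, one way to keep the phrases literal is to build the pattern from a list with `re.escape`. The sketch below is an assumption-laden variant rather than the method from these notes: `strip_phrases` and `BOILERPLATE` are hypothetical names, and the helper also collapses the leftover runs of whitespace, which the step above does not do.

```python
import re

# Phrases taken from the step above; adjust the list for your own book.
BOILERPLATE = [
    "This page intentionally left blank.",
    "Click here for Terms of Use.",
    "Copyright 2020.",
]


def strip_phrases(text, phrases=BOILERPLATE):
    """Remove each phrase literally, then collapse repeated whitespace."""
    # re.escape makes '.' and other special characters match literally.
    pattern = "|".join(re.escape(p) for p in phrases)
    cleaned = re.sub(pattern, " ", text)
    return re.sub(r"\s+", " ", cleaned).strip()


print(strip_phrases("Chapter 1 Copyright 2020. It begins. This page intentionally left blank."))
# Chapter 1 It begins.
```

Keeping the phrases in a list makes it easy to add more as you notice them in the extracted text.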

4. Converting to an audio file using Google Text-to-Speech and exporting to mp3

obj = gTTS(text=full_book, lang='en', slow=False)
obj.save('sample.mp3')  # write the audio to sample.mp3
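
The file names above are hard-coded. If the script is reused for other books, the mp3 name could be derived from the PDF name instead. A minimal sketch using the standard library's `pathlib` (the helper name `mp3_name_for` is hypothetical and not part of these notes):

```python
from pathlib import Path


def mp3_name_for(pdf_path):
    """Return the PDF's path with its extension swapped for .mp3."""
    return str(Path(pdf_path).with_suffix(".mp3"))


print(mp3_name_for("sample.pdf"))  # sample.mp3
```

The returned string could then be passed to `obj.save(...)` in place of the literal file name.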

Written by Cheryl
