How I Made a Web App That Converts E-books into Audiobooks with Under 20 Lines of Code

Mohd Saif Ali
3 min read · May 28, 2021


Source: slate.com

I'm no web dev, but trust me when I tell you that this is an end-to-end, deployable, responsive web app that actually converts any PDF file into text and then into audio in just a click, or maybe two. The tech behind it is, obviously, Python, and it comes in under 20 lines of code. The project can basically be divided into three steps: the front end to get the PDF file, the extraction of text from the PDF, and the conversion of the extracted text into a playable MP3 file.

Let's start with the GUI, which in my case is a web interface built with a library that has been hot in the market lately: Streamlit. It's no replacement for existing Python web frameworks like Flask or Django; it is basically designed to help data scientists and ML engineers deploy their projects online. However, I had a weird idea to use it as the GUI for this project, since it is very easy to deploy Streamlit applications on hosting platforms such as Heroku. I used Streamlit to let users upload their PDF to the web app and to play the audio after conversion.

You can install the Streamlit library by running the following command in your terminal:

pip install streamlit

For the extraction of text, I used a library called pdfplumber. It is not the only library for the job, but I preferred it because I had no issues running it and it worked as documented.

Install the library by running the following command.

pip install pdfplumber

For the conversion of text to audio, I used the OG text-to-speech library, gTTS (Google Text-to-Speech). Why? I mean, is there anyone better than Google at language processing?

Install the library by running the following command.

pip install gtts

Once all the libraries are installed, we import the required modules from each of them.

import streamlit as st
from gtts import gTTS
import pdfplumber

Now we set up the widgets on the web app so that users can upload their files.

#Just the title for the web app
st.title("Convert all your E-books to Audio Books just like that!")
# Store the uploaded PDF file in the variable book for further processing
book = st.file_uploader("Please upload your pdf")

Now we let pdfplumber extract the text, page by page, from the PDF stored in the book variable.

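all_text = ''
if book is not None:
    with pdfplumber.open(book) as pdf:
        # Walk through the PDF one page at a time
        for page in pdf.pages:
            # extract_text() can return nothing for a page without text
            single_page_text = page.extract_text() or ''
            all_text = all_text + '\n' + single_page_text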
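One thing worth watching in this step: a page with no extractable text (a scanned image, say) may come back as None or as an empty string, depending on the pdfplumber version, and calling str() on None would put the literal word "None" into the audiobook. Here is a minimal sketch of a helper that joins the page texts while skipping such pages. The name join_page_texts and the sample values are hypothetical and not part of the original app.

```python
from typing import Iterable, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text with newlines, skipping pages that produced no text."""
    kept = []
    for text in page_texts:
        # A page without extractable text may arrive as None or as blank text.
        if text and text.strip():
            kept.append(text.strip())
    return "\n".join(kept)


# Stand-in values for what page.extract_text() might return on a three-page PDF.
sample_pages = ["Chapter 1", None, "  It was a dark night.  "]
print(join_page_texts(sample_pages))
```

In the app, this could be fed with the extract_text() result of every page in pdf.pages instead of the sample list.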

Now that we have all the text content from the PDF in the all_text variable, we pass it to gTTS to convert it into an audio file.

    tts = gTTS(all_text)
    tts.save('all_text.mp3')
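Text pulled out of a PDF usually keeps the line breaks of the printed page, including words hyphenated across lines, and that may make the speech sound choppy. As a rough sketch (not something the original app does), the text could be tidied before it is handed to gTTS. The function name clean_for_speech is made up for illustration, and the hyphen rule is a simplification: it would also glue together a genuinely hyphenated word that happens to break at the end of a line.

```python
import re


def clean_for_speech(raw_text: str) -> str:
    """Tidy PDF-extracted text so it reads as continuous sentences."""
    # Rejoin words split across a line break: "conver-\nsion" -> "conversion".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw_text)
    # Collapse every run of spaces, tabs and newlines into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


sample = "The conver-\nsion of text\ninto   audio.\n"
print(clean_for_speech(sample))
```

The cleaned string would then be passed to gTTS in place of all_text.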

Now that we have our MP3 audio file ready, we use Streamlit's audio component to display and play it.

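    # Indented to stay in the same block as tts.save(), so it only runs once the MP3 exists
    with open('all_text.mp3', 'rb') as audio_file:
        audio_bytes = audio_file.read()
    # The file is an MP3, so the format is audio/mpeg rather than audio/wav
    st.audio(audio_bytes, format='audio/mpeg', start_time=0)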
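Since the app is meant to be deployed, several people could upload different books at the same time, and a fixed name like all_text.mp3 would then be overwritten. One possible way around that, sketched here as an assumption rather than as part of the original code, is to derive the file name from the uploaded bytes. The helper audio_filename_for is hypothetical.

```python
import hashlib


def audio_filename_for(pdf_bytes: bytes) -> str:
    """Build an MP3 file name from a hash of the PDF contents."""
    # Identical uploads map to the same name; different PDFs get different names.
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    return f"{digest[:16]}.mp3"


# Placeholder bytes standing in for two different uploaded PDFs.
print(audio_filename_for(b"first book"))
print(audio_filename_for(b"second book"))
```

In the app, the bytes would come from the uploaded file (Streamlit's uploaded file object behaves like an in-memory bytes buffer, so book.getvalue() should work), and the resulting name would replace 'all_text.mp3' in both tts.save() and open().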

Finally comes the fun part: making it run in the browser.

streamlit run appname.py

Which turns out to look something like this

So, to prove that the entire code is under 20 lines, here's the link to the complete source code. Give it a star, maybe?

Connect with me on LinkedIn


Mohd Saif Ali

Data Scientist Seeking Cure for lack of melatonin owing to the love of DATA