Transcript
This transcript was autogenerated. To make changes, submit a PR.
My name is Michael Wehar, and I'm going to tell you about using Python
to build applications for language learning.
So let's jump right in. I developed this
multiplatform tool called Word of the Hour,
or wath for short, and it helps you to learn words
in multiple languages. Every hour a vocabulary word
is posted, along with English definitions and translations
into over a dozen languages. Language learning is tough,
so our main goal with this multiplatform tool is to provide
some simple content that you can digest at regular
time intervals that will help support you
and motivate you as a language learner.
So as I said, it's supported on many different platforms,
and we currently support web, Android, iOS,
Slack, Roku, Fire TV, Electron,
and many more platforms. You can see some screenshots of
what wath looks like on these different platforms,
and Roku is one of our most popular platforms where there's an
active screen saver.
So how do we use Python? Python isn't the only
language we use, but it is a very important language to
us. So there are three core areas where we use
Python. The first is word selection. We
need to select which words to post. So we actually analyze
a data set of over 200,000 words, and we
run various statistical analyses to select
which words should be featured by wath.
So the next area where Python is really important
to us is crowdsourcing. We actually have
this whole system set up where users can
enter crowdsourcing data into Google
Sheets, and we'll scrape from those Google Sheets and
combine all the data together, and that'll help
us to provide better content in the future. So we've actually
crowdsourced over 35,000 translations.
And the next area is social media posts.
So posting this language content regularly to relevant
social media platforms is an important part of
this tool. So those are the three
areas where Python is really important to us.
And I want to go into a little bit more detail about that.
So with word selection, we start with
over 200,000 words, and we have to
generate relevant features about those words.
And then we have to do ranking and filtering. We have various processes
for doing that. And some of these features we
generate are based on frequency, dependency between
words, and also the context in which a word appears in
different situations. So in order to do
this word selection in Python, we have to do some file I/O.
We have to actually read in these data files that contain a lot of
text and language data. We have to build up these dictionaries
where we basically map words to information
about them, and then we have to do various sorting
procedures to basically rank
or filter the words. And we also
use regex. And regex allows us to
basically parse or detect certain patterns
within the data associated with the words.
Okay. And in Python, all of these things are readily available.
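The word-selection steps described above (reading text, building a dictionary of words, filtering with regex, and sorting) could be sketched roughly as follows. This is only an illustration, not the actual wath pipeline: the function names, the regex pattern, the `min_count` threshold, and the sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical filter: keep purely alphabetic words of three or more letters.
WORD_PATTERN = re.compile(r"^[a-z]{3,}$")


def build_word_counts(lines):
    """Map each accepted word to the number of times it appears."""
    counts = Counter()
    for line in lines:
        for token in line.lower().split():
            token = token.strip(".,;:!?\"'()")
            if WORD_PATTERN.match(token):
                counts[token] += 1
    return counts


def rank_words(counts, min_count=2):
    """Drop rare words, then sort by descending count, then alphabetically."""
    kept = [(word, n) for word, n in counts.items() if n >= min_count]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def rank_words_from_file(path):
    """File I/O variant: a file object is iterated line by line."""
    with open(path, encoding="utf-8") as f:
        return rank_words(build_word_counts(f))


# Made-up sample text standing in for a real data file.
sample = [
    "The grant was approved.",
    "A grant can fund research.",
    "Research takes time.",
]
ranked = rank_words(build_word_counts(sample))
print(ranked)
```

A frequency count is only one possible feature; the talk also mentions dependency and context features, which this sketch does not attempt.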
So for crowdsourcing, again, we need to do some file
I/O. But let's talk a little bit about the crowdsourcing.
So we actually do crowdsourcing for 40 different languages.
And as I said before, we've received over 35,000 crowdsourced
translation submissions from users. And many
of these 35,000 submissions have been edited,
modified, and keep being updated.
So if you include all those updates and edits,
it's many more than 35,000.
And there are two languages where we've
had a really enthusiastic group of users
supporting the crowdsourcing, and that's Portuguese and Cornish.
But we've had many other languages with users
who are really enthusiastic as well, but didn't quite submit
as much as was submitted for Portuguese and
Cornish. So, again, we use
file I/O, where basically we're
reading in all kinds of past data submissions
that act as a sort of basis for some of our translations.
Then we make requests to our Google Sheets and various
sources where the crowdsourcing data has been submitted.
And then we actually do some filtering to kind
of verify that the crowdsourced data meets some
basic quality standards. And then we actually
use Git to record what changes
have been made and to have a sort of checkpoint we can come back to
to see how the data changed at that point
in time.
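As a rough sketch of the filtering and merging step described above: the example below parses a CSV export (standing in for data pulled from a sheet), applies a few basic quality checks, and merges the valid rows into past submissions. The column names, the specific checks, and the sample rows are assumptions for illustration; the talk does not say what the real quality standards are, and the network request and the Git commit are left out.

```python
import csv
import io


def is_valid_submission(row):
    """Hypothetical quality checks: all fields present, short, and not a link."""
    word = (row.get("word") or "").strip()
    language = (row.get("language") or "").strip()
    translation = (row.get("translation") or "").strip()
    if not (word and language and translation):
        return False
    if len(translation) > 100 or "http" in translation.lower():
        return False
    return True


def merge_submissions(csv_text, existing):
    """Merge valid rows into a {(word, language): translation} dict.

    Later rows overwrite earlier ones, so edits replace older submissions.
    """
    merged = dict(existing)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if is_valid_submission(row):
            key = (row["word"].strip().lower(), row["language"].strip())
            merged[key] = row["translation"].strip()
    return merged


# Made-up sheet export: one good row, one empty translation, one spam link.
sheet_export = """word,language,translation
grant,Portuguese,conceder
grant,Cornish,
grant,German,http://spam.example
"""
past = {("hour", "Portuguese"): "hora"}
merged = merge_submissions(sheet_export, past)
print(merged)
```

Writing the merged result to a file in a Git working tree and committing it would then give the kind of checkpoint mentioned above.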
All right, so for the social media posts, we post to about
30 different social media pages every hour across
various different social media platforms. And for
just about all of these platforms, we use some kind of API
that allows us to interact with the social media platform.
We also use the Wath API, which I'll talk about in a
bit. Or we use some direct endpoints associated with wath
to get the current word and the data associated with the current word.
So, actually, right now, we're not
actively using many Python-based
APIs to post to social media,
but we have used some in the past. We did have one
platform where we'd actually post images using
Python every hour or every
few hours. And on
Discord, we use a Python bot
to post content hourly.
But some of our other postings
to social media don't actually happen in Python, but they
could. So how would we do this
if there was a new social media page you wanted to post
content to? Well, first,
in the past we've used input arguments to customize
how the post will occur or where the post should be
made. In Python, it's really easy for us to
use input arguments. Also, we need to do some simple
text operations to format our post.
And various social media platforms have restrictions
on what kinds of characters and what
kinds of patterns are allowed to be contained in your posts.
And then we use a social media API
which will allow us to actually submit the post
to that platform. And, as I'll show you in a bit, we have
the Wath API in Python, and
we can use that to easily get the current word and its data.
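The input-argument and text-formatting steps described above might look something like the sketch below. It uses the standard `argparse` and `re` modules; the flag names, the default length limit, the post template, and the sample definition are all hypothetical, and the actual call to a social media API is omitted because it differs per platform.

```python
import argparse
import re


def parse_args(argv=None):
    """Input arguments that customize where and how the post is made."""
    parser = argparse.ArgumentParser(description="Format a word-of-the-hour post.")
    parser.add_argument("--platform", default="generic")
    parser.add_argument("--max-length", type=int, default=280)
    return parser.parse_args(argv)


def format_post(word, definition, max_length):
    """Collapse repeated whitespace and truncate to the platform's limit."""
    text = f"Word of the hour: {word} - {definition}"
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) > max_length:
        text = text[: max_length - 3].rstrip() + "..."
    return text


# Passing a list here stands in for real command-line arguments.
args = parse_args(["--platform", "discord", "--max-length", "40"])
post = format_post("grant", "to   give or bestow something formally", args.max_length)
print(args.platform, "->", post)
```

Keeping the formatting in one small function makes it easy to add per-platform rules, such as stripping characters a given platform does not allow.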
So let me tell you about the Wath API. This is what I'm really excited
to present here at this conference.
The Wath API enables Python developers to include
the current word of the hour along with its English definitions and
translations within their apps.
You can clone our repo and import the Wath API,
and then you can simply call fetch, and fetch will return
you a dictionary object with the keys
word, definitions, and translations, so you can really simply
get the current data.
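To give a feel for working with the returned dictionary, here is a small sketch. It does not import the real module, since the exact import name is not given in this transcript. Instead it defines a stand-in `fetch` that returns made-up data with the three keys mentioned above. The shape of the values (a list of definitions, a dictionary of translations keyed by language name) is an assumption.

```python
def fetch():
    """Stand-in for the real fetch call; returns made-up sample data."""
    return {
        "word": "grant",
        "definitions": ["to give or bestow something formally"],
        "translations": {"Portuguese": "conceder"},
    }


def summarize(data, language):
    """Describe the current word and its translation into one language, if any."""
    word = data["word"]
    translation = data["translations"].get(language)
    if translation is None:
        return f"{word}: no {language} translation yet"
    return f"{word} in {language}: {translation}"


data = fetch()
print(summarize(data, "German"))
print(summarize(data, "Portuguese"))
for definition in data["definitions"]:
    print("-", definition)
```

With the real module, the stand-in function would be replaced by the import and the call to fetch from the cloned repo; the lookup code would need adjusting if the actual value shapes differ.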
So I'm going to show a demo to you, and I hope that you may
follow along and try out this demo yourself so
that maybe you can incorporate word of the hour into some of
your apps. So here
is our public git repo on GitHub.com
and you can take a look at this and whenever you're ready you
can clone this repo. So I
cloned the repo and I opened up the test.py file.
That's the code you see on the right here.
And within this test.py file you'll see that I first
import the Wath API and
then I call fetch. And then I
have four tests here. The first
test is to get the current word. The second test
is to check if there's a translation into German.
And the third test is to get all of those translations.
And then the fourth test is to get the definitions.
So let's run the code and see what happens.
Okay, you can see that it actually returned that
the current word is grant. Unfortunately, we don't have a German translation
for this word. So hopefully crowdsourcing
might help us to fill in that German translation.
But you can see we have translations into many other languages
and we have some definitions below as well.
So using the Wath API is as simple as that.
You just have to call fetch and then look up the
data points that you want and
you could incorporate this into your web apps or any
type of application you have that's Python-based,
and you'd be able to share the word of the hour.
Whether you're sharing it just to show what's the word of the hour,
or you have some other kind of language tool,
it can be a great supplement that can help learners.
So I encourage you to try this out for yourself,
and I really appreciate you taking the time to listen to this
talk. Thank you so much, and I hope you
have a great Conf42 conference.
Okay, bye.