Tackling High-Value Business Problems Using AutoML on Structured Data (Cloud Next ’19)

[MUSIC PLAYING] TIN-YUN HO: All right. My name is Tin-yun. And I’m a product manager
on the Cloud AI Team. And today, I’m here
to talk about how you can tackle some of your
high-value business problems using AutoML on one of the
most common types of data in the enterprise,
structured data. But before we dig in,
I wanted to first make sure everyone knows
what I’m talking about when I say machine
learning on structured data, right? So these use cases typically
start with a table of input, kind of like this, where each
row is an independent training example. And usually, one of these
columns is the target column. And the remaining columns
are input features. And the goal is to create
a machine learning model to predict the value of the
target column accurately using the available input feature
values for that row. So for example– and this is a toy example– you might have a table
of historic offers from an online marketplace. And you want to create a model
that takes the available data on that offer to predict the
price at which that offer is sold. So for example, if you
want to provide pricing guidance to your sellers. And one common misconception
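To make that setup concrete, here is a minimal sketch of such a table and a trivial baseline predictor. The data and the per-category-mean baseline are invented for illustration; a real model would learn from all of the input columns:

```python
# Hypothetical toy data mirroring the marketplace example:
# each row is one historic offer; "price" is the target column.
offers = [
    {"category": "shoes",  "condition": "new",  "brand": "A", "price": 60.0},
    {"category": "shoes",  "condition": "used", "brand": "A", "price": 30.0},
    {"category": "phones", "condition": "new",  "brand": "B", "price": 300.0},
    {"category": "phones", "condition": "used", "brand": "B", "price": 180.0},
]

def fit_mean_per_category(rows, target="price"):
    """A trivial baseline: predict the mean target value per category."""
    totals = {}
    for r in rows:
        s, n = totals.get(r["category"], (0.0, 0))
        totals[r["category"]] = (s + r[target], n + 1)
    return {cat: s / n for cat, (s, n) in totals.items()}

model = fit_mean_per_category(offers)
print(model["shoes"])   # 45.0
print(model["phones"])  # 240.0
```

Anything beyond this baseline, using the remaining columns, is where the real modeling work begins.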
I wanted to point out is that people sometimes
think that you can only have simple values, like numbers
and classes, inside a table. But actually, especially
with modern data warehouse systems,
like BigQuery, you can put a rich set of
things in there– things like timestamps, long pieces
of text, lists of features, potentially repeated
nested fields. And that would be
only getting started. So according to the
McKinsey Global Institute, data in this basic
form, Structured Data, is likely to drive
most of AI’s impact, with Time Series
being a close second. And this really
comes as no surprise. Because virtually every industry
has mission critical use cases that can be boiled down
into this tabular form. So for example, in retail,
predicting the likelihood of stock outs or
predicting price elasticity so you can optimize
your product inventory, in finance, predicting the risk
of large claims or defaults or predicting the
likelihood of fraud in this transaction in
order to manage your risk, in marketing, predicting customer lifetime value, like how much a customer will spend on your site in the next three weeks, or predicting whether they’ll churn, so that you can better
understand your customer. The list goes on. And these are use
cases that are so core to the respective industries
that even small improvements in model quality can
have significant business implications. But the challenge
is that especially as the number and complexity of
your input columns increases, you hit a combinatorial
explosion in things you need to worry about. So starting from
the left and going to the right, on
Data Preparation, for every one of your
individual feature columns, you need to think about
things like missing values and outliers and so forth. And then under
Feature Engineering, for each of those
columns, you need to think of the
right pre-processing to prepare it for
the following model that you select in the
Architecture Selection step. And there could be multiple
options per feature column. And then under
Architecture Selection, there are dozens of models
you could choose from, including more that come out
from the research community, basically on a monthly basis. And then for each
of those models you select, you have to select
the right hyperparameters. And there could potentially
be a dozen values to set. And then you have to think about
tuning, think about ensembling, if you’re trying to create
an especially good model, as well as model evaluation. And oh, by the way,
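A back-of-the-envelope count makes that combinatorial explosion concrete. This sketch uses invented, deliberately small option lists; real search spaces are far larger:

```python
from itertools import product

# Hypothetical, simplified search space for one tabular problem.
preprocessing = {
    "numeric":  ["standardize", "bucketize", "log-transform"],
    "text":     ["bag-of-words", "embeddings"],
    "category": ["one-hot", "target-encode"],
}
architectures = ["linear", "gradient-boosted-trees", "deep-net"]
hyperparams = {"learning_rate": [0.01, 0.1, 0.3], "depth": [4, 8, 16]}

# Every per-column preprocessing choice multiplies with every model and setting.
preproc_combos = 1
for options in preprocessing.values():
    preproc_combos *= len(options)

hp_combos = len(list(product(*hyperparams.values())))
total = preproc_combos * len(architectures) * hp_combos
print(total)  # 12 * 3 * 9 = 324 configurations, before tuning or ensembling
```

And that is with only three feature columns and two hyperparameters.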
if you screw up in any one of these
individual steps, frequently you
have to start over. And this process of iteration
can go through tens of iterations, especially if you
have a hard data set. I think data
scientists in the room have probably all
experienced that before. So this entire process
can take months, potentially dooming machine
learning projects altogether, as executive sponsors
lose interest. So in order to help you
overcome these challenges, we decided to build
AutoML Tables, a tool for enabling
your entire team, whether you’re a data scientist,
an analyst, or a developer, to automatically build and
deploy state-of-the-art machine learning models on
structured data. And I’m excited to announce
that AutoML Tables is entering into public
beta as of this morning. [APPLAUSE] Thank you. And the way the
product works is we provide a graphical, codeless
interface for guiding users through the entire end-to-end
machine learning life cycle with
significant automation as well as guardrails built in
at each individual step, which I will show you
shortly inside a demo. So we start with helping
you ingest your data and then easily define
your data schema and target, analyze your
input features in a feature statistics dashboard,
automatically train your model, including automated
feature engineering, model selection, and
hyperparameter tuning, as well as evaluate your model
behavior before deploying into production, and then
deploying in a single click. Through this, we can
help ensure that what used to take months of time
only takes weeks or even days. Digging deeper into some
of these different steps– so on the data ingestion
side, when able, we seek to handle data
as found in the wild. So we provide automated feature
engineering for, essentially, the major data
primitives that you can find in BigQuery, things
like numbers, timestamps, classes, lists, strings,
and nested fields. We do know there are more
other data primitives out there that we could cover. But this is our starting point. And we also worked hard to
ensure that we’re resilient to and provide guardrails for
imbalanced data, missing values, highly correlated
features, high cardinality features, like if every
row had a different ID, and outliers as well. After that, we
automatically search through Google’s
entire model zoo to find the best model
for you, including linear logistic regression for
the smaller, simpler data sets, as well as things like deep
neural nets, ensembles, as well as architecture search methods,
if you have a larger, more complicated data set. And one of the benefits
of being at Google is we sit next to some of the
best research and consumer product teams in the world. So we can basically watch
whatever great research results come out and take the best and
put it into AutoML Tables for you, kind of cherry-picking
the best for you, sometimes even
before results are published on them in research
papers, so hot off the press. An example of this
is our collaboration with the Google Brain team. So they delivered what they call
Neural and Tree Architecture Search. Essentially, they took their architecture search
capability, similar to the one that they use for image
classification and translation problems. And they added
tree-based architectures to the existing neural depth
architecture search space, and then also added
automated feature engineering so that it can work for a wide
variety of structured data. Note that this isn’t published
in research papers yet, so I can’t give more details. But expect more to
be announced soon. And based on the
benchmarks we’ve done, the results speak
for themselves. So there are a large number
of vendors in this space. And we chose to benchmark
against a subset of them with similar functionality. And we benchmarked on
Kaggle competitions, which I love as a benchmark
because they involve real data from real companies
that are putting tens to hundreds of thousands
of dollars of prize money on the line to find
a good solution. And they’re willing to wait
months to get a good answer. That’s typically how long
these competitions take. And thousands of data
scientists across the world compete in them. So if you do well on a
Kaggle competition in terms of the ranking, you’re
pretty much guaranteed that you’re doing state-of-the-art
work on a problem that matters for the world. So that’s why we love
this class of benchmarks. And here you have,
on the x-axis, different Kaggle challenges
that we benchmarked on. And on the y-axis is the
percent ranking on, essentially, the final Kaggle leaderboard,
if we had actually participated. So these challenges
had already ended before we ran our models. We were just benchmarking
on the final leaderboard after the fact. And what you’ll see is that
most of the time AutoML Tables gets into the top
25%, which tends to be better than the
existing vendors we tested. But caveats apply, of course. For some of these data sets, we
were in the middle of the pack. And we’ll be the first
to admit that we have a lot of additional work to do. And we’re constantly tuning
and improving our systems. But when it comes down to it,
in general, we do pretty well. So benchmark results are here. And I wanted to dig into
one of these benchmarks to give you a better
sense of what it means to get into the top 25%. So here’s an example called
the Mercari Price Suggestion Challenge. Mercari is Japan’s biggest
community-powered shopping app and marketplace. And they created this
challenge for predicting the price of a product
offered on their marketplace so that they could give pricing
suggestions to their sellers. So this is a real version
of the example problem that I brought up at the
beginning of this talk. So some logistics. This challenge went on
for about three months. There was a $100,000 prize. 2,000 data scientists competed. And actually, the
winning data scientist made about 99 entries in
order to win this competition. So it was highly contested. It’s not just an easy, pass-through challenge, right? And the data looked
kind of like this. So about 1.5 million
rows of offer examples like this with some rich
input features, right? So you’ve got the name. There are categorical
values, lists of categories, item descriptions. And some of these are really– this is dirty data, right? You’ve got no description yet. What is the model
supposed to do with that? You’ve got things like redacted
values over here, right? And there were missing
values as well. This is a good
real-life data set. And the goal is to predict
the price on this column. Now, here what
you see is a curve where we laid out the
performance of each participant in terms of the
final error achieved. So the further you
are to the right, the better that participant
did on the final leaderboard, right? And the higher you
are on the y-axis, the higher the error
on the final test set. So as expected, there is a
line that slopes downwards to the bottom right. But importantly, it
isn’t a consistent curve. So what you’ll see here
is that, in the beginning, there’s a steep decline,
right, where better feature engineering, better model
selection and all that, it really matters. There’s still more signal to
be squeezed out of the data. But then there’s a
long plateau where, basically, most
of the competitors have kind of done whatever they
can to squeeze the signal out of the data already. So getting to the
top 25% actually gets you to that plateau, right? And here is how AutoML Tables
does after different numbers of hours of training. So the way Tables works is you
select the number of hours. And here we have Tables
after one hour of training, 12 hours of training,
and 24 hours of training. So caveats, right? The Architecture Search, the
hyperparameter tuning process, these are all random. So if you run this yourself,
it might be a little bit more, a little bit less. And we did do some
limited data cleaning. So for example,
previously, the categories were split by slashes. So our system just treated
that as one big token. So in order to treat
it as individual words, you actually have to break out
the slashes by replacing them with white space. But this is very
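That cleaning step is literally a one-liner; the example category string here is invented:

```python
# The cleaning step described: treat slash-separated categories as separate
# words instead of one big token.
raw_category = "Men/Tops/T-shirts"
cleaned = raw_category.replace("/", " ")
print(cleaned)  # "Men Tops T-shirts"
```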
basic stuff, right? And of course, this is
just the Mercari Challenge. Other challenges might
have different results. But still, for a million
row data set– a million plus row data set– with
significant complexity, one hour of training by
AutoML Tables gets you here. One hour of training. And oh, by the way, as
I mentioned earlier, the competition prize
for this was $100,000. And in comparison, one hour
of training on AutoML Tables is going to be $19. And by the way, you
click one more time. So the winning model– and
this is well known for Kaggle competitions– typically takes months of
time to actually productionize and deploy. But for AutoML Tables,
with one more click, you can deploy the
model into production. So how does that sound? Is it a good deal? [APPLAUSE] This brings me to my
final point, which is, by using AutoML Tables,
you’ll save money. Not only does it increase
your team’s efficiency, but also there is no large
annual licensing fee. You basically pay
for what you use. And how we calculate
what you use is that it’s just a small
margin over the cost of the underlying compute and
memory infrastructure used. So for example,
when I said earlier that training
cost $19 per hour, that basically represents the
cost of 92 four-CPU machines on Google Compute Engine. We’re using the equivalent
of that, essentially, under the hood. And prediction, once the model is deployed, would be even cheaper than that. And regarding
reference customers, we’re thankful that we’ve had a
significant group of customers that agreed to join
our alpha pilot, with a number of them agreeing
to be public references listed here. We’re proud that we’ve
been able to prove our relevance across
geographies, industries, and company sizes. And you can see there are a wide
variety of company types here and profiles here
with wide varieties of also sophistication
in terms of data science. So you’ll hear specific use
cases from Fox Sports as well as GNP Seguros shortly. But before we get
too carried away, I’ll be the first to
admit that we are not perfect for everyone. So AutoML Tables is
usually the best fit if you meet the
following criteria. So first of all,
you have to know how to create good training data. And this is probably the
most important point, right? Somebody in the
room needs to know how to take a business
problem and translate it into an ML setup, a
classification or regression problem of some
sort, right– binary or multi-class classification. And that person needs to be
able to take training data, especially defining the label
column– the target column– so that it represents
the way you want the model to
perform in production. And somebody also
needs to understand how to make sure that the input
features are the same, in terms of the distribution, as what
is available at serving time so that you can prevent
training serving skew. And a trivial example of this. If the training data was
all in English, but then at serving time you are
feeding Japanese to the model, obviously it would fail. Second, you need to be willing
to wait at least an hour. Our minimum unit of training is one hour. And if you were looking to
do minute-by-minute iteration on your model, potentially
so you can move quickly, maybe it’s very early in
your development phase, then you’re probably best
off using something else, like BigQuery ML
or Cloud Machine Learning Engine. And finally, for now– I want to call out,
specifically, for now– if you have a larger and
more complicated data set, that would be ideal. So right now, if your data
set is less than 100,000 rows, you won’t be fully taking
advantage of the advanced algorithms that we’re using. And we’re actively
working on this problem. And expect more
announcements soon. But for now, the bigger
your data set, the better. And the flip side of
that is our current limit is set at 100 million rows,
just to be conservative. But if you have a data
set that goes beyond that, actually, we’re using Google’s
machine learning infrastructure under the hood. So we’re built for
much bigger than that. And we would love to
talk to you if you have a data set like that. So please reach out to your
Google account manager. And now I will go over a demo. One sec. Let me load this
up on my computer. Ready. Can we do the demo? Great. So now I’ll walk you through the
workflow using AutoML Tables. You start by ingesting
that table of data that represents your problem, right? So it can be from BigQuery
or from Google Cloud Storage. So you just select it
from Google Cloud Storage. You select your table. Say you want to use this one. And then you ingest. We’ve already ingested it
to save everybody time. So the next step is,
after you ingest, you will see a
schema page like this where you can fix the
variable type if it’s set to be the wrong one. We do automatic
schema inference. But you can update it if it
needs to be updated, right? You can also mark columns as nullable, in case they could potentially be missing. So maybe at prediction time, one of these columns might be missing– you can flag that it’s allowed to be null. And then you select
the target column. So for example, in this
problem, actually, the target is supposed to be Deposit. So this specific data set is
about binary classification, predicting, after
marketing outreach, whether somebody will make a
deposit at this bank or not. So it’s a
classification problem. And the target is Deposit. And if you’re an
advanced user, there are additional
parameters you can set. So for example, you could
set your own data split between training evaluation
and holdout sets. If certain rows are more
important than others, you can set a weight column. And if this data has an ordering
over time that matters, then you can set a Time column. And this will make sure that
the latest data shows up in the holdout set instead
of the Training split. Next, after you’ve
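The Time column behavior can be sketched as a simple time-ordered split. This is a rough illustration of the idea, not the product’s actual split logic:

```python
# Hypothetical timestamped rows. With a Time column set, the newest rows
# should land in the holdout split rather than being shuffled into training.
rows = [
    {"ts": "2019-01-05", "x": 1},
    {"ts": "2019-03-20", "x": 2},
    {"ts": "2019-02-11", "x": 3},
    {"ts": "2019-04-02", "x": 4},
    {"ts": "2019-01-28", "x": 5},
]

def time_split(rows, time_col="ts", holdout_frac=0.2):
    ordered = sorted(rows, key=lambda r: r[time_col])
    cut = int(len(ordered) * (1 - holdout_frac))
    return ordered[:cut], ordered[cut:]  # train on the past, hold out the future

train, holdout = time_split(rows)
print([r["x"] for r in holdout])  # [4], the most recent row
```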
set your schema, we provide a dashboard of
basic feature statistics so that you can detect
common data issues. So for example, we allow
filtering by the feature type. And there are more
options available. But this particular
data set just has numeric and categorical
values, to keep it simple. And you can potentially see,
OK, what percent of them were missing? Maybe that could
indicate system problems. You can see how many
distinct values there are. So for Categorical
Values, especially, if you have a cardinality– a number of distinct values– that is similar to the number of rows, then that probably
means that that column adds more noise than signal. And you should remove it. And then finally, you
can detect basic issues, like Correlation with
Target for target leakage, as well as for
numeric feature types, seeing the basic distribution. And also, if you deep dive on
any of these particular feature columns, you can see things like
the distribution of the values, the top correlated features
to it, and so forth. We’ll keep on adding more
as you ask for them– more feature statistic displays. Next, after you’re
comfortable with your data, you can train a model. So you set your model name. You set how many hours
you want to train for. And usually, as you saw
from the Mercari example, one hour is a good
place to start. You select which feature
columns you want to use. We automatically set an
optimization objective for you. But if you want to set
an objective yourself, because you know that
one of them lines up better with your business
goal, you can do that, right? And then you train a model. I’ve already trained
a couple here. So let’s deep dive
into one of them. Next is the Evaluation tab. So we provide both the overall
performance of that model here on the top, as well as
some useful metadata, like what features
were used and what it was set to optimize
for, as well as allowing you to export the
predictions on the test set to BigQuery if you want to do
additional analysis, as well as evaluation metrics on
individual label slices. So here, for this data
set, the value one means that they did not
make a deposit at the bank. And that’s 90% of
the data, right? So you’ll be able to see that
here’s the model performance for that slice of the data. And then two means that
they did make a deposit. So that’s the minority class. And you’ll see that it
does slightly worse. But that’s typically common
for imbalanced data sets. And it’s up to you to
figure out how you want to handle that– whether that’s sufficient or whether you want to try other things, like increasing the weight of the minority
class and so forth. So you can flip between
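Per-slice evaluation of this kind can be sketched in a few lines of Python, here with toy labels mirroring the roughly 90/10 split described above:

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Recall per label: of the true members of each class, how many were caught."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits.get(label, 0) / n for label, n in support.items()}

# Hypothetical imbalanced results: 90% class "1" (no deposit), 10% class "2".
y_true = ["1"] * 9 + ["2"]
y_pred = ["1"] * 8 + ["2"] + ["1"]  # the lone true "2" was missed

print(per_class_recall(y_true, y_pred))
# The majority class looks fine; the minority class suffers.
```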
them and kind of see the difference, right? You can set the score
threshold to see how the precision and recall
are at different points on that curve. If you’re interested in
exactly how misclassification is happening on
the model, then you can look at the Confusion
Matrix to see that. And then we have
Feature Importance just to give you a sense
of which features were the most important for
making those predictions. And finally, in
the Predict tab, we allow batch prediction
and online prediction. So with batch
prediction, you can upload a table from BigQuery
or from Google Cloud Storage and export your
results there as well. And for online prediction,
it’s like I said. It’s one click to
deploy your model. And we deploy
globally to make sure that you get low
latency serving. And then you can test your
model here inside the dashboard. So hopefully the demo gods are
kind to me and this is working. Great. So you can see the
prediction result, right? And for regression
problems, we would give you a prediction interval, a
95% confidence interval. And for classification, we
will provide confidence scores for the different labels. And that concludes my demo. Can we move back to the slides? [APPLAUSE] Great. And now I’m excited to introduce
Chris and Jack to share how Fox Sports is creatively
using AutoML Tables to deliver a brilliant audience experience. So, all right. Got a clicker? CHRIS POCOCK: So g’day. My name’s Chris Pocock. And first of all, I’d
like to say thank you to Tin-yun and the team at
Google for inviting us over from the other side of the
world to come and share our story about Monty. I’d also like to introduce
Jack Smyth, waving over there. He’ll be talking to you
again in a few minutes’ time. So look, I look after marketing
for Fox Sports Australia. And it’s kind of
a privilege to be able to get paid to talk
about sport all day long. So it’s a pretty cool job. So I am actually
originally from the UK. I’ve lived in Australia
for about five years. So I want to start off by giving
a foreigner’s context on what sport means to Australians. It’s a sporting nation. It’s got the little guys
punching above their weight. And it’s not just
part of life there. They live and breathe it. It’s part of their psyche. It’s part of their
national identity. They identify themselves
through sporting success. So sport is a really, really
big deal in this country. To give you a little bit of
context, looking at the greater Los Angeles area, there’s 18
million people that live there. They’ve got about 10
professional sports teams by my reckoning. Sydney, 4.5 million people– pretty big. We have 25 professional
sports teams. So there’s a lot of
sport to go around. The other kind of
contextual thing is that Australians have kind of
looked to international sport, and they’ve looked at
football, soccer from Europe. And then they’ve looked
at American football and gone, nah, it’s not for us. We’re just going to
invent our own sport. So they’ve invented
Australian rules football. It is probably one of the most
popular games in the country. Only played in Australia,
professionally. But it is a big,
big sport for them. So big that in this
little country of about 25 million people, they pack
out a 90,000-person stadium, week in, week out. So it’s pretty impressive. But Australian rules football
is not the only sport played. Australia’s actually a
pretty divided nation when it comes to sport. So I’m just highlighting some of the key sports we have here. So we have rugby union at the
top, Australian rules football, rugby league, and motor sport. Those are the key
broadcast sports that we have on Fox Sports. But within Australia
itself, certain sports have bigger popularity
than others depending on where you live
or where you went to school. Rugby league, for example,
really big in Sydney, not so much in Melbourne. Australian rules football,
really big in Melbourne, not so big in Sydney. So it’s a divided nation. So it kind of takes
us through our journey at Fox Sports where we have all
these key sports from January through to September. It’s a great lineup. There is a bit of a
gap in the summer. So last year, we signed
a broadcast rights deal for cricket. And cricket is the summer sport. You will notice that it kind
of sits at October to December. We’re upside down– that is our summertime. So that kind of filled the gap. And cricket cost us a billion dollars for the broadcast
rights for that. So there’s a little bit of
importance on making it work. So we had to look at cricket
from a context of it having been on free-to-air television. So Australians have been
able to watch it for free since day dot. We’re now asking
you to pay for it. So we had to promise a cricket
experience like never before. And this is kind of where
AutoML Tables came in. Now, I’m conscious
of where I’m standing and what country I’m in. So I just want to
talk a little bit about what cricket actually is. It’s a game that we
sometimes play for five days. And no one wins. So I kind of think
that blows a few minds. In Australia, we
draw quite a lot. But it can be like
a chess match. So it’s really, really tactical. And what you’ll see is kind
of the last man holding on against big Goliath
fast bowlers or the game changing on a dime with
three quick wickets. It does explode into action. But realistically, there are
only 18 seconds of wickets– the wicket being
this guy here. The aim of the game is for
this guy to bowl this guy out. That really gives us, across
five days, 18 seconds of action. It’s a long, long game. So what we tried to do with this experiment with AutoML
fans when we think a wicket is going
to happen ahead of time so they
know that they need to be in front of the screen. Don’t go to the bathroom. Don’t go out and do a
barbecue because the action is coming up. Now, cricket also is a really
great game for structured data. There’s a lot of
different variables that we can measure
to accurately predict what we think is going
to happen in the future– the types of ball bowled. You can do spin. You can do fast. How long has the batsman
been at the crease? Has he been there for an hour? Has he been there
for four hours? Is he tired, not tired? Has he run around a lot? Is he the last man standing? Generally, the last man
standing is a lesser batsman versus the first
guy who goes out. What field positions
has the captain set out around where they are
looking to catch the ball? And about 80 other data points. So it makes it an
ideal experiment that we tried out last summer. And it bloody works, which
was quite impressive. So look, I want to let Monty– so just a little bit of context. We named him Monty because
the first wicket he accurately predicted was a chap
called Monty Panesar. So it seemed apt to
name him after that. So I’m going to introduce you
to– this is a world first– a world’s first automated
machine learning commentator. I’ll let Monty give it
to you in his own words. [VIDEO PLAYBACK] [MUSIC PLAYING] – Got it! – Why wouldn’t you be
happy with that one? – Can you believe it? – Oh, it’s a beauty. Yes, yes. – [INAUDIBLE] immediately. – Oh, what a catch. – [INAUDIBLE]. – Oh, I like that too. And Paul Wilson says Yes. – Oh, yes. Oh, that’s such a hit. – He’s done it again. – That is going out
of the ballpark. – Absolutely brilliant. What an advertisement
for the game. [END PLAYBACK] JACK SMYTH: All right. Thanks so much, Chris. Hi, everyone. My name’s Jack. Oh, I’ll wait for applause. [APPLAUSE] My name’s Jack. And I’ll walk you through
how Mindshare created Monty in collaboration with Google. Now, the first step is to
find a world-class data provider in Opta Sports. They track
83 unique variables with every single ball. And that’s available
within seconds of it leaving the bowler’s hand. So we knew we had great training
data and a live feed set up. So after that, we could start
experimenting with Tables. Now, I’ll say the first impressions of Tables were striking. The speed and simplicity of
this platform blew us away. We could easily ingest a
year’s worth of training data, simply select Wickets
as the label to predict. And within hours, we were
seeing impressive results. To be totally honest,
some of the team members even thought that
Monty must be cheating because the early results
were simply that good. However, those doubts
quickly evaporated as we moved through
the training process. Tables allowed us to calibrate
a model that could keep pace with a live game. So we moved ahead with
the classification model so we could not only predict
when a wicket would fall but how. And the final output
looks something like this. This is a real example from
a recent game between England and the West Indies. And you can see the
end point would return an analysis of the latest ball. And in this case, the batsman’s
safe for the next five minutes. The prediction is not out. But then you can see
the confidence scores for each method of
taking a wicket. And there’s a rising danger
there of him being caught. So once we saw the
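The returned payload looked something like the following sketch. The field names and threshold are hypothetical; only the shape, a verdict plus per-method confidence scores, comes from the talk:

```python
# A sketch of the kind of analysis the endpoint returned for the latest ball.
def summarize_prediction(scores, threshold=0.5):
    """Turn per-method wicket confidences into an out / not-out call."""
    method, confidence = max(scores.items(), key=lambda kv: kv[1])
    verdict = "out" if confidence >= threshold else "not out"
    return {"prediction": verdict, "most_likely_method": method, "scores": scores}

latest_ball = {"bowled": 0.05, "caught": 0.38, "lbw": 0.07, "run_out": 0.02}
print(summarize_prediction(latest_ball))
# Prediction "not out", but "caught" is the rising danger.
```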
accuracy during testing, we knew that Monty was
ready for the main stage. And I have to say,
as Australians, we don’t really believe
in soft launches. So we chose the biggest,
most iconic match in the Australian sporting
calendar to debut Monty. Now, millions of fans– OK. [VIDEO PLAYBACK] [MUSIC PLAYING] – The MCG. We come here every Boxing
Day for Australia’s biggest day of cricket. Day one of the
Melbourne test match. More than 80,000
fans in the stands. Millions watching and
listening around the world, magically drawn to the
history and the tradition. This is where
reputations are made. – Got it. [INAUDIBLE] – This is where Dennis Lillee
knocks over Viv Richards with the last bowl of the day. – He’s got it! Oh, what a magnificent
start for Australia. – You don’t forget that. As a kid, you dream of doing
something amazing at the G. And a few get lucky. – Got him! [INAUDIBLE] [END PLAYBACK] JACK SMYTH: All right. So how did it go? Monty absolutely– we
can go to the next slide. That was seamless. So how did he go? Monty absolutely smashed
expectations in his first game. We were thrilled
with the accuracy. And Tables really came
through when it counted most, when we had most of the country
counting on Monty’s call. So his confidence
scores for each wicket were displayed live in ads just
like this for millions of fans. And that stretched from push
notifications on the Fox Cricket app all the
way through to preroll. And we were thrilled with
that accuracy figure. That is astonishing
for live sport. And I would say as well that
even the moments he missed became highlights
in their own right. Because that meant that
a bowler had come out of nowhere to take a
wicket without warning and had literally
beaten the odds. Now, as our confidence
grew in Monty, we essentially scaled him
up to become the command center of our entire campaign. So this is the
final architecture. And this is largely thanks to an
amazing Googler back in Sydney named Drew Jarrett. And in the time remaining,
I’ll give you a quick overview of how it worked. Down here, when a ball was
passed through our system from Opta, App Engine
would receive that. It would pass it
through for processing, would use Dataflow to look
at that ball, and also an aggregate of
the recent balls, to give us what we called
a Prediction Window. The model would then
make a prediction based on that window
and the individual ball. And then App Engine
would facilitate requests from live digital billboards,
from the Fox Cricket app, Google Ads bidding scripts,
studio dynamic templates, and even Google Assistant. And I have to say, for me, the
Google Assistant experience was a particular highlight. This was combining
the predictive power of AutoML Tables with
the personalization capabilities of Assistant. And we could genuinely
offer every fan their very own
on-demand commentator. So through Assistant,
fans could ask for the latest call from Monty. They could understand how
he had made that prediction. And of course, they get
the latest team news. And for me, this is
more than marketing. This is an entirely new product
experience made possible through Tables. And it’s one that we
are very, very proud of. At the close of the campaign,
the impact of Tables was clear. So we saw 150% improvement
in marketing ROI. Brand recall for
Fox Sports doubled. And the Fox Cricket app
delivered an increase of 140% compared to category
competitors. So with a scorecard
like that, you can be certain you haven’t
heard the last of Monty. And I’ll hand you back to Chris. He’ll be able to talk
you through where we are going next. CHRIS POCOCK: Thanks, Jack. So those are pretty
damn good results. And it, frankly,
took me by surprise just how well it all worked. So this is not the
end of our journey. What we really want to
do is take Monty forward and beyond what we’ve done this
last summer, not only taking it into next summer, but also starting to apply it to some of our winter sports. So we're looking at doing
a man-versus-machine challenge. Let Monty take on [INAUDIBLE], so a panel of experts. What do they think the
results are going to be? Can they beat AutoML? Really excited to see
if they can or can’t. We also want to be able
to integrate it properly on our air, so on
our broadcast, being able to allow our
commentators to warn fans when exciting moments are upcoming. Don’t go to the bathroom. Make sure you stick around. Monty thinks there’s
a wicket coming. Fantasy is also a fantastic way
of applying Monty’s learning model. Each week, he’ll build
his ultimate team. And can it beat all of our
consumers, all of our experts, week in, week out,
by building based on structured data versus
the gut feel of our experts? Let’s see what happens. I’m really excited
for the future. It’s been an amazing
product, a great journey. So thank you to everyone. And I appreciate you
listening to our story. [APPLAUSE] TIN-YUN HO: That was awesome. CHRIS POCOCK: So now, just
my last role of the day is to introduce Enrique and
Carlos from GNP Seguros. Hopefully I’ve
pronounced that properly. But welcome, guys,
and thank you. ENRIQUE: Thank you very much. Well, hello. Thank you to Tin-Yun and to Google for inviting us to share
our experience with AutoML at this conference, and
congratulations to Fox for such an interesting case. GNP is one of the largest
insurance companies in Mexico. The company was
founded 116 years ago. And we have yearly sales of around $3 billion. Like any large and
well-established company, GNP is undergoing a
profound transformation. I was going to say
digital transformation, but we haven’t quite figured
out the precise definition of digital transformation,
because everyone’s talking about it. So I’ll just mention
transformation. But anyway, what
we want to do is to modernize our information
systems and our operations. And we're relying heavily
on the cloud to achieve that. And one of the
strategic initiatives that we’re executing
under this transformation is to assemble a single
corporate Data Lake. That is a single repository
where all the information of the company would reside. And besides being an important enabler of efficiency, since we no longer have to figure out where to go and look for information, because everything is, or will be, located in the central Data Lake, we regard the Data Lake as an important source of
competitive differentiation and of intelligence. And to achieve that, we need
to squeeze knowledge and value from all the data that’s
stored in the Data Lake. So we have recently started to use machine learning to actually uncover that value from the information. And the problem with
machine learning is that the availability of
well-trained data scientists is not that great. So Carlos is the head of
machine learning at GNP, but he’s kind of a
rare breed of person, because you don’t have
that many in the market. So when Google described to us the
concept of AutoML Tables, and they offered us the
opportunity to test it, we thought it was
absolutely great. Because we want to democratize
the use of machine learning in the company,
lower the complexity of generating new models. So it looked very promising. And we embarked on testing and
trying this great new tool. So basically what we did is we played with the tool on three different use cases: one in the auto insurance line of business, one in the health care line of business. And the third was a general utility for our underwriters for collective insurances that we'll describe a little later. So the first example
is really simple. It’s, given the characteristics
of the insured car and of the owner,
try to predict what’s the probability of that
car of having an accident. And to provide the highlights of
this, the technical highlights of this exercise, I’ll allow
Carlos to share it with you. CARLOS: Hello to everyone. I will only explain how we’re
using AutoML Tables to solve insurance problems at GNP. As Enrique said, the first
problem is calculating risk. This problem consists of predicting the probability of a car accident using driver characteristics, such as age, gender, and claim history, as well as vehicle characteristics, such as type, model, and usage intensity. One thing that is really
amazing about AutoML Tables is that you don’t have to
worry about feature engineering, hyperparameter tuning, or other development issues. The only thing
that you need to do is what Tin-Yun showed in the demo. And I want to
highlight that this model has very good quality, confirmed by the F score in the green square. And I will flip to the next problem
Enrique introduced. ENRIQUE: So the second
use case that we used to try out AutoML Tables
is this: we had previously developed a machine learning model to try to detect fraudulent claims in health care. What we did is we created a new model using AutoML Tables, and we compared the results to those provided by the existing model that we have. And given the characteristics
of the patient, which is our customer,
and the sales agent, who sold the policy
to this customer, and the specific disease
that this person has, and the hospital, and
the doctors involved, we try to determine if that
claim is fraudulent or not. Just to give you an idea, our
yearly expense on health care claims is around $680 million. So any small percentage improvement that we can achieve in detecting fraudulent claims translates directly into a lot of additional revenue for the company. So I'll let Carlos provide
you with the highlights of this exercise. CARLOS: OK, thanks, Enrique. Again, we used the clients' claim history, as well as medical and hospital
information. And one thing that is really
neat about AutoML Tables is that you have all the performance evaluations, all the performance results that you need, just a click away. You don't have to do additional work to compute evaluation metrics. And I want to highlight
that this model had very high quality, indicated by an F1 score of 0.94. And we confirmed this quality
on an independent holdout set, and I want to highlight that. It led to 20% to 30%
improvement with respect to our existing ML solutions. ENRIQUE: So bottom line,
what really impressed us is that, just by using AutoML, we got between a 20% and 30% increase in effectiveness at detecting fraudulent claims. So that was really promising
and very encouraging to continue using this tool. And the last example
was a practical example. We not only insure individuals. We also insure
collectives of people. That is, for example, all
the employees of a company. So let’s say that we’re going
to insure all the employees of a specific company. So what usually
the sales channel does is, once they make the sale, they provide either a
CSV or a spreadsheet file to our
underwriting department so that they can do the
quote and the underwriting of the policy. And one of the fields
that is required for each of the
persons that make up this collective of people is the gender of the person. But sometimes the
sales channel fails to provide that information. And when that happens, the
underwriting department, what they usually did
before was break up the file into many small, different
files and then send those files to a lot of different persons. And manually, each person would basically classify, based on the
name of the person, if that person is
male or female. One of the last times that happened, the file was 10,000 rows long. So it was a lot of work. So one of the persons from
the underwriting department approached our data science
team and said, well, you’re supposed to be smart. And there should be a
better way of doing this. And it was a simple example. What we did, basically, is we used the data of the master database to train a model to learn the gender of a person based on their first and last name, their full name. So that's what we did. And it's a small problem to solve, but it had a lot of practical usage in the company, because this kind of thing happens fairly often. So it ended up
being a usable tool. And I’ll let Carlos
describe this exercise. CARLOS: Thanks again. Well, as you can
see on the screen, we made a comparison with a naive model, using only the raw first-name feature column. And obviously, it got a bad quality result. Then we did a little character-level feature engineering, decomposing the first name into suffixes and prefixes. And we gained a lot of performance, as can be seen on the screen. This shows the power of AutoML Tables when the relevant variables are chosen. Again, I want to thank the Google Cloud team for letting us shape the cloud together. ENRIQUE: So in this
example, by tweaking the feature engineering a little bit and leveraging AutoML Tables, we were able to achieve very high efficiency with this model. So this is just
another small example of how we can use this tool. And the results that we saw in AutoML with the use cases that we developed at GNP are really promising. So we're really excited about the idea of leveraging this tool to develop more machine learning models much more quickly. We are determined to use machine
learning to solve real business problems or problems
that are already solved with some
sort of automation, giving them a much
better solution. And one way to achieve these results more quickly is through tools like AutoML. And one of the things
that we want to achieve is this: currently, the underwriting of medical health care policies goes through a process using a set of predefined rules, and for around 55% of all new customers, the underwriting process is fully automatic. We want to change that implementation, replacing the static rules with a machine learning model that would allow us to make 80% of all underwriting instances fully automatic, with only 20% being routed to a set of experts to determine the underwriting. So that's one of the
targets that we have. And on the other hand, since we already compared the results that we got with AutoML against the existing machine learning model we have for detecting fraudulent health care claims, we want to improve by 10% this year the amount of fraudulent claims that we're able to detect, in order to contribute additional revenue to the company. So that's basically
what we’ve done at GNP. I think this is a great tool. And back to you, Tin-Yu. Thanks. [APPLAUSE]