Web service¶
Palladium includes an HTTP service that can be used to make predictions over the web using models that were trained with the framework. There are two endpoints: /predict, that makes predictions, and /alive which provides a simple health status.
Predict¶
The /predict service uses HTTP query parameters to accept input features, and outputs a JSON response. The number and types of parameters depend on the application. An example is provided as part of the Tutorial.
On success, /predict will always return an HTTP status of 200. An error is indicated by either status 400 or 500, depending on whether the error was caused by malformed user input, or by an error on the server.
The PredictService
must be configured to
define what parameters and types are expected. Here is an example
configuration from the Tutorial:
'predict_service': {
'__factory__': 'palladium.server.PredictService',
'mapping': [
('sepal length', 'float'),
('sepal width', 'float'),
('petal length', 'float'),
('petal width', 'float'),
],
},
An example request might then look like this (assuming that you’re running a server locally on port 5000):
The usual output for a successful prediction has both a result
and
a metadata
entry. The metadata
provides the service name and
version as well as status information. An example:
{
"result": "Iris-virginica",
"metadata": {
"service_name": "iris",
"error_code": 0,
"status": "OK",
"service_version": "0.1"
}
}
An example that failed contains a status
set to ERROR
, an
error_code
and an error_message
. There is generally no
result
. Here is an example:
{
"metadata": {
"service_name": "iris",
"error_message": "BadRequest: ...",
"error_code": -1,
"status": "ERROR",
"service_version": "0.1"
}
}
It’s also possible to send a POST request instead of GET and predict for a number of samples at the same time. Say you want to predict for the class for two Iris examples, then your POST body might look like this:
[
{"sepal length": 6.3, "sepal width": 2.5, "petal length": 4.9, "petal width": 1.5},
{"sepal length": 5.3, "sepal width": 1.5, "petal length": 3.9, "petal width": 0.5}
]
The response will generally look the same, with the exception that now there’s a list of predictions that’s returned:
{
"result": ["Iris-virginica", "Iris-versicolor"],
"metadata": {
"service_name": "iris",
"error_code": 0,
"status": "OK",
"service_version": "0.1"
}
}
Should a different output format be desired than the one implemented
by PredictService
, it is possible to use a
different class altogether by setting an appropriate __factory__
(though that class will likely derive from
PredictService
for reasons of convenience).
A list of decorators may be configured such that they will be called
every time the /predict web service is called. To configure such a
decorator, that will act exactly as if it were used as a normal Python
decorator, use the predict_decorators
list setting. Here is an
example:
'predict_decorators': [
'my_package.my_predict_decorator',
],
Alive¶
The /alive service implements a simple health check. It’ll provide
information such as the palladium_version
in use, the current
memory_usage
by the web server process, and all metadata that has
been defined in the configuration under the service_metadata
entry. Here is an example for the Iris service:
{
"palladium_version": "0.6",
"service_metadata": {
"service_name": "iris",
"service_version": "0.1"
},
"memory_usage": 78,
"model": {
"updated": "2015-02-18T10:13:50.024478",
"metadata": {
"version": 2,
"train_timestamp": "2015-02-18T09:59:34.480063"
}
},
"process_metadata": {}
}
/alive can optionally check for the presence of data loaded into the
process’ cache (process_store
). That is because some scenarios
require the model and/or additional data to be loaded in memory before
they can answer requests efficiently
(cf. palladium.persistence.CachedUpdatePersister
and
palladium.dataset.ScheduledDatasetLoader
).
Say you expect the process_store
to be filled with a data
entry (because maybe you’re using
ScheduledDatasetLoader
) before you’re able to
answer requests. And you want /alive to return an error status (of
503) when that data hasn’t been loaded yet, then you’d add to your
configuration the following entry:
'alive': {
'process_store_required': ['data'],
},
List¶
The /list handler returns model and model persister data. Here’s some example output:
{
"models": [
{"train_timestamp": "2018-04-09T13:08:11.933814", "version": 1},
{"train_timestamp": "2018-04-09T13:11:05.336124", 'version': 2}
],
"properties": {"active-model": "8", "db-version": "1.2"}
}
Fit, Update Model Cache, and Activate¶
Palladium allows for periodic updates of the model by use of the
palladium.persistence.CachedUpdatePersister
. For this to
work, the web service’s model persister checks its model database
source periodically for new versions of the model. Meanwhile, another
process runs pld-fit
and saves a new model into the same model
database. When pld-fit
is done, the web services will load the
new model as part of the next periodic update.
The second option is to call the /fit web service endpoint, which
will essentially run the equivalent of pld-fit
, but in the web
service’s process. This has a few drawbacks compared to the first
method:
- The fitting will run inside the same process as the web service. While the model is fitting, your web service will likely use considerably more memory and processing while the fitting is underway.
- In multi-server or multi-process environments, you must take care of
updating existing model caches (e.g. when running
CachedUpdatePersister
) by hand. This can be done by calling the /update-model-cache endpoint for each server process.
An example request to trigger a fit looks like this (assuming that you’re running a server locally on port 5000):
The request will return immediately, after spawning a thread to do the actual fitting work. The JSON response has the job’s ID, which we’ll later require next to check the status of our job:
{"job_id": "1adf9b2d-0160-45f3-a81b-4d8e4edf2713"}
The /alive endpoint returns information about all jobs inside of the
service_metadata.jobs
entry. After submitting above job, we’ll
find that calling /alive returns something like this:
{
"palladium_version": "0.6",
// ...
"process_metadata": {
"jobs": {
"1adf9b2d-0160-45f3-a81b-4d8e4edf2713": {
"func": "<fit function>",
"info": "<MyModel>",
"started": "2018-04-09 09:44:52.660732",
"status": "finished",
"thread": 139693771835136
}
}
}
}
The finished
status indicates that the job was successfully
completed. info
contains a string representation of the
function’s return value.
When using a cached persister, you may also want to run the
/update-model-cache endpoint, which runs another job asynchronously,
the same way that /fit does, that is, by returning an id and
storing information about the job inside of process_metadata
.
/update-model-cache will update the cache of any caching model
persisters, such as
CachedUpdatePersister
.
The /fit and /update-model-cache endpoints aren’t registered by
default with the Flask app. To register the two endpoints, you can
either call the Flask app’s add_url_rules
directly or use the
convenience function palladium.server.add_url_rule()
instead
inside of your configuration file. An example of registering the two
endpoints is this:
'flask_add_url_rules': [
{
'__factory__': 'palladium.server.add_url_rule',
'rule': '/fit',
'view_func': 'palladium.server.fit',
'methods': ['POST'],
},
{
'__factory__': 'palladium.server.add_url_rule',
'rule': '/update-model-cache',
'view_func': 'palladium.server.update_model_cache',
'methods': ['POST'],
},
],
Another endpoint that’s not registered by default is /activate,
which works just like its command line counterpart: it takes a model
version and activates it in the model persister such that the next
prediction will use the active model. The handler can be found at
palladium.server.activate()
. It requires a request parameter
called model_version
.