Kaggle API Guide

Source: Official Kaggle API

Installation

pip install kaggle

Install as a regular user rather than as root. The kaggle executable is then installed to the following default locations:

  • Linux: ~/.local/bin

  • Windows: $PYTHON_HOME/Scripts
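If the kaggle command is not found after installation, the install directory is probably missing from PATH. The following is a minimal Python sketch for checking this. The helper names are illustrative, and the per-user directory assumes a POSIX layout (<user base>/bin), which matches the Linux path above but not Windows.

```python
import os
import shutil
import site


def find_kaggle_cli():
    """Return the full path of the kaggle executable, or None if it is not on PATH."""
    return shutil.which("kaggle")


def user_bin_dir():
    """Per-user script directory, assuming the POSIX layout <user base>/bin."""
    return os.path.join(site.getuserbase(), "bin")


if __name__ == "__main__":
    cli = find_kaggle_cli()
    if cli:
        print("kaggle found at", cli)
    else:
        print("kaggle not on PATH; for a per-user install, check", user_bin_dir())
```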

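API Credentials

  1. Create a token

    On the https://www.kaggle.com/<username>/account page, click Create API Token. This downloads the credentials file kaggle.json.

  2. Place the credentials file (on Windows, echo %HOMEPATH% prints your home directory)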

    Linux: ~/.kaggle/kaggle.json

    Windows: C:\Users\<Windows-username>\.kaggle\kaggle.json

  3. Restrict permissions so that other users cannot read the file (Linux)

    chmod 600 ~/.kaggle/kaggle.json

  • Alternatively, export your Kaggle username and token as environment variables:
export KAGGLE_USERNAME=datadinosaur
export KAGGLE_KEY=xxxxxxxxxxxxxx
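Steps 2 and 3 can also be scripted. The sketch below writes a kaggle.json with owner-only permissions. The function name and the directory parameter are hypothetical, and the demo writes to a temporary directory rather than the real ~/.kaggle, so no existing credentials are touched.

```python
import json
import os
import stat
import tempfile
from pathlib import Path


def write_kaggle_credentials(username, key, directory=None):
    """Write kaggle.json into `directory` (default: ~/.kaggle), readable only by the owner."""
    target_dir = Path(directory) if directory else Path.home() / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    target.write_text(json.dumps({"username": username, "key": key}))
    # Same effect as `chmod 600` on POSIX; Windows only honours the read-only bit.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target


if __name__ == "__main__":
    # Demo against a throwaway directory with the placeholder values used above.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_kaggle_credentials("datadinosaur", "xxxxxxxxxxxxxx", tmp)
        print(path, oct(path.stat().st_mode & 0o777))
```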

Commands

The command-line tool supports the following commands:

kaggle competitions {list, files, download, submit, submissions, leaderboard}
kaggle datasets {list, files, download, create, version, init, metadata, status}
kaggle kernels {list, init, push, pull, output, status}
kaggle config {view, set, unset}
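When calling the tool from a script, it can help to validate the command group and subcommand before spawning a process. The following is a minimal sketch, assuming the table is copied from this guide; kaggle_argv is a hypothetical helper, not part of the kaggle package. It only builds the argument list, which could then be passed to subprocess.run.

```python
import shlex

# Command groups and subcommands as listed in this guide (the datasets help
# output further down also lists metadata and status).
KAGGLE_COMMANDS = {
    "competitions": {"list", "files", "download", "submit", "submissions", "leaderboard"},
    "datasets": {"list", "files", "download", "create", "version", "init", "metadata", "status"},
    "kernels": {"list", "init", "push", "pull", "output", "status"},
    "config": {"view", "set", "unset"},
}


def kaggle_argv(group, action, *args):
    """Build an argument list for the kaggle CLI, rejecting unknown commands."""
    if action not in KAGGLE_COMMANDS.get(group, ()):
        raise ValueError(f"unknown command: kaggle {group} {action}")
    return ["kaggle", group, action, *args]


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(kaggle_argv("config", "view")))
```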

Competitions

usage: kaggle competitions [-h]
                           {list,files,download,submit,submissions,leaderboard}
                           ...

optional arguments:
  -h, --help     show this help message and exit

commands:
  {list,files,download,submit,submissions,leaderboard}
    list         List available competitions
    files        List competition files
    download     Download competition files
    submit       Make a new competition submission
    submissions  Show your competition submissions
    leaderboard  Get competition leaderboard information

List competitions

usage: kaggle competitions list [-h] [--group GROUP] [--category CATEGORY] [--sort-by SORT_BY] [-p PAGE] [-s SEARCH] [-v]

optional arguments:
  -h, --help            show this help message and exit
  --group GROUP         Search for competitions in a specific group. Default is 'general'. Valid options are 'general', 'entered', and 'inClass'
  --category CATEGORY   Search for competitions of a specific category. Default is 'all'. Valid options are 'all', 'featured', 'research', 'recruitment', 'gettingStarted', 'masters', and 'playground'
  --sort-by SORT_BY     Sort list results. Default is 'latestDeadline'. Valid options are 'grouped', 'prize', 'earliestDeadline', 'latestDeadline', 'numberOfTeams', and 'recentlyCreated'
  -p PAGE, --page PAGE  Page number for results paging. Page size is 20 by default
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  -v, --csv             Print results in CSV format (if not set print in table format)
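The valid values for --group, --category, and --sort-by can be checked before the command is run. The following sketch builds the argument list for kaggle competitions list using only the flags documented above. The function name is hypothetical, and the value lists are copied from the help text, so they may differ in other versions of the tool.

```python
import shlex

GROUPS = ("general", "entered", "inClass")
CATEGORIES = ("all", "featured", "research", "recruitment",
              "gettingStarted", "masters", "playground")
SORT_BY = ("grouped", "prize", "earliestDeadline", "latestDeadline",
           "numberOfTeams", "recentlyCreated")


def competitions_list_argv(group=None, category=None, sort_by=None,
                           page=None, search=None, csv=False):
    """Build the argv for `kaggle competitions list`, validating enumerated options."""
    argv = ["kaggle", "competitions", "list"]
    checks = (("--group", group, GROUPS),
              ("--category", category, CATEGORIES),
              ("--sort-by", sort_by, SORT_BY))
    for flag, value, allowed in checks:
        if value is None:
            continue  # let the CLI apply its own default
        if value not in allowed:
            raise ValueError(f"{flag} must be one of {allowed}, got {value!r}")
        argv += [flag, value]
    if page is not None:
        argv += ["--page", str(page)]
    if search:
        argv += ["--search", search]
    if csv:
        argv.append("--csv")
    return argv


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(competitions_list_argv(category="featured", sort_by="prize", csv=True)))
```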

List competition files

usage: kaggle competitions files [-h] [-v] [-q] [competition]

optional arguments:
  -h, --help   show this help message and exit
  competition  Competition URL suffix (use "kaggle competitions list" to show options)
               If empty, the default competition will be used (use "kaggle config set competition")
  -v, --csv    Print results in CSV format (if not set print in table format)
  -q, --quiet  Suppress printing information about the upload/download progress

Download competition files

usage: kaggle competitions download [-h] [-f FILE_NAME] [-p PATH] [-w] [-o]
                                    [-q]
                                    [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options)
                        If empty, the default competition will be used (use "kaggle config set competition")
  -f FILE_NAME, --file FILE_NAME
                        File name, all files downloaded if not provided
                        (use "kaggle competitions files -c <competition>" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the upload/download progress
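The download options combine as follows: omitting the competition falls back to the configured default, and omitting --file downloads everything. The sketch below assembles such a command following the usage line above. The function name is hypothetical, and "titanic" and "train.csv" are purely illustrative values.

```python
import shlex


def competitions_download_argv(competition=None, file_name=None, path=None,
                               force=False, quiet=False):
    """Build the argv for `kaggle competitions download` from the documented flags."""
    argv = ["kaggle", "competitions", "download"]
    if competition:
        argv.append(competition)  # omitted: the CLI uses the default competition
    if file_name:
        argv += ["--file", file_name]  # omitted: all files are downloaded
    if path:
        argv += ["--path", path]  # omitted: current working directory
    if force:
        argv.append("--force")
    if quiet:
        argv.append("--quiet")
    return argv


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(competitions_download_argv("titanic", file_name="train.csv", path="data")))
```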

Submit to a competition

usage: kaggle competitions submit [-h] -f FILE_NAME -m MESSAGE [-q]
                                  [competition]

required arguments:
  -f FILE_NAME, --file FILE_NAME
                        File for upload (full path)
  -m MESSAGE, --message MESSAGE
                        Message describing this submission

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options)
                        If empty, the default competition will be used (use "kaggle config set competition")
  -q, --quiet           Suppress printing information about the upload/download progress
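Because --file and --message are both required and the file is expected as a full path, a wrapper can catch mistakes before anything is uploaded. The following is a sketch under those assumptions; the function name is hypothetical, and the local checks are an addition of this example, not behavior of the CLI itself.

```python
import os
import shlex


def competitions_submit_argv(file_name, message, competition=None, quiet=False):
    """Build the argv for `kaggle competitions submit` after basic local checks."""
    if not os.path.isfile(file_name):
        raise FileNotFoundError(file_name)
    if not message.strip():
        raise ValueError("a submission message is required")
    argv = ["kaggle", "competitions", "submit"]
    if competition:
        argv.append(competition)  # omitted: the CLI uses the default competition
    # The help text asks for a full path, so resolve relative paths here.
    argv += ["--file", os.path.abspath(file_name), "--message", message]
    if quiet:
        argv.append("--quiet")
    return argv


if __name__ == "__main__":
    # Uses this script itself as a stand-in file; prints instead of submitting.
    print(shlex.join(competitions_submit_argv(__file__, "example message", "titanic")))
```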

List competition submissions

usage: kaggle competitions submissions [-h] [-v] [-q] [competition]

optional arguments:
  -h, --help   show this help message and exit
  competition  Competition URL suffix (use "kaggle competitions list" to show options)
               If empty, the default competition will be used (use "kaggle config set competition")
  -v, --csv    Print results in CSV format (if not set print in table format)
  -q, --quiet  Suppress printing information about the upload/download progress

Get the competition leaderboard

usage: kaggle competitions leaderboard [-h] [-s] [-d] [-p PATH] [-v] [-q]
                                       [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options)
                        If empty, the default competition will be used (use "kaggle config set competition")
  -s, --show            Show the top of the leaderboard
  -d, --download        Download entire leaderboard
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -v, --csv             Print results in CSV format (if not set print in table format)
  -q, --quiet           Suppress printing information about the upload/download progress

Datasets

usage: kaggle datasets [-h]
                       {list,files,download,create,version,init,metadata,status} ...

optional arguments:
  -h, --help  show this help message and exit

commands:
  {list,files,download,create,version,init,metadata,status}
    list      List available datasets
    files     List dataset files
    download  Download dataset files
    create    Create a new dataset
    version   Create a new dataset version
    init      Initialize metadata file for dataset creation
    metadata  Download metadata about a dataset
    status    Get the creation status for a dataset

List datasets

usage: kaggle datasets list [-h] [--sort-by SORT_BY] [--size SIZE] [--file-type FILE_TYPE] [--license LICENSE_NAME] [--tags TAG_IDS] [-s SEARCH] [-m] [--user USER] [-p PAGE] [-v]

optional arguments:
  -h, --help            show this help message and exit
  --sort-by SORT_BY     Sort list results. Default is 'hottest'. Valid options are 'hottest', 'votes', 'updated', and 'active'
  --size SIZE           Search for datasets of a specific size. Default is 'all'. Valid options are 'all', 'small', 'medium', and 'large'
  --file-type FILE_TYPE
                        Search for datasets with a specific file type. Default is 'all'. Valid options are 'all', 'csv', 'sqlite', 'json', and 'bigQuery'. Please note that bigQuery datasets cannot be downloaded
  --license LICENSE_NAME
                        Search for datasets with a specific license. Default is 'all'. Valid options are 'all', 'cc', 'gpl', 'odb', and 'other'
  --tags TAG_IDS        Search for datasets that have specific tags. Tag list should be comma separated
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  -m, --mine            Display only my items
  --user USER           Find public datasets owned by a specific user or organization
  -p PAGE, --page PAGE  Page number for results paging. Page size is 20 by default
  -v, --csv             Print results in CSV format (if not set print in table format)
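Two details of the dataset search are easy to get wrong: --tags takes a single comma-separated value, and --sort-by and --file-type accept only the listed values. The sketch below handles both for a subset of the flags. The function name and the tag names in the demo are hypothetical.

```python
import shlex

SORT_BY = ("hottest", "votes", "updated", "active")
FILE_TYPES = ("all", "csv", "sqlite", "json", "bigQuery")


def datasets_list_argv(search=None, sort_by=None, file_type=None, tags=(),
                       user=None, mine=False, csv=False):
    """Build the argv for `kaggle datasets list` from a subset of the documented flags."""
    argv = ["kaggle", "datasets", "list"]
    if sort_by is not None:
        if sort_by not in SORT_BY:
            raise ValueError(f"--sort-by must be one of {SORT_BY}")
        argv += ["--sort-by", sort_by]
    if file_type is not None:
        if file_type not in FILE_TYPES:
            raise ValueError(f"--file-type must be one of {FILE_TYPES}")
        argv += ["--file-type", file_type]
    if tags:
        # The CLI expects one comma-separated value, not repeated flags.
        argv += ["--tags", ",".join(tags)]
    if search:
        argv += ["--search", search]
    if user:
        argv += ["--user", user]
    if mine:
        argv.append("--mine")
    if csv:
        argv.append("--csv")
    return argv


if __name__ == "__main__":
    # Print the command instead of running it; the tag names are made up.
    print(shlex.join(datasets_list_argv(search="weather", file_type="csv",
                                        tags=["tag-a", "tag-b"])))
```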

List dataset files

usage: kaggle datasets files [-h] [-v] [dataset]

optional arguments:
  -h, --help  show this help message and exit
  dataset     Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)
  -v, --csv   Print results in CSV format (if not set print in table format)

Download dataset files

usage: kaggle datasets download [-h] [-f FILE_NAME] [-p PATH] [-w] [--unzip]
                                [-o] [-q]
                                [dataset]

optional arguments:
  -h, --help            show this help message and exit
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)
  -f FILE_NAME, --file FILE_NAME
                        File name, all files downloaded if not provided
                        (use "kaggle datasets files -d <dataset>" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  --unzip               Unzip the downloaded file. Will delete the zip file when completed.
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the upload/download progress
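The dataset argument must have the form <owner>/<dataset-name>, which can be checked locally before a download is attempted. The following is a sketch; the function name is hypothetical, and the regular expression is a loose approximation of that format (two non-empty parts separated by one slash), not Kaggle's actual naming rules.

```python
import re
import shlex

# Loose check for <owner>/<dataset-name>: exactly one slash, no whitespace.
DATASET_REF = re.compile(r"^[^/\s]+/[^/\s]+$")


def datasets_download_argv(dataset, file_name=None, path=None,
                           unzip=False, force=False, quiet=False):
    """Build the argv for `kaggle datasets download` from the documented flags."""
    if not DATASET_REF.match(dataset):
        raise ValueError(f"expected <owner>/<dataset-name>, got {dataset!r}")
    argv = ["kaggle", "datasets", "download", dataset]
    if file_name:
        argv += ["--file", file_name]
    if path:
        argv += ["--path", path]
    if unzip:
        argv.append("--unzip")  # per the help text, the zip file is deleted afterwards
    if force:
        argv.append("--force")
    if quiet:
        argv.append("--quiet")
    return argv


if __name__ == "__main__":
    # Placeholder reference; print the command instead of running it.
    print(shlex.join(datasets_download_argv("owner/dataset-name", path="data", unzip=True)))
```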