Run xgboost on Mac and Regression data

The source code: xgboost_regression

Updated 2021-09-07:

Install xgboost on Apple M1

git clone --recursive https://github.com/dmlc/xgboost
mkdir xgboost/my_build
cd xgboost/my_build
CC=gcc-11 CXX=g++-11 cmake ..
make -j4
cd ../python-package
/Users/xx/miniforge3/envs/tf/bin/python setup.py install

You must first install Miniforge on the M1 and create a conda environment, e.g. conda create -n tf python=3.9.5.
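To confirm the build was installed into the right environment, a quick check (assuming the tf conda environment created above is active) is to import xgboost and print its version:

import xgboost as xgb

# If this prints a version string without an OpenMP / library-loading error,
# the native library built above was found and loaded correctly.
print(xgb.__version__)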

Run xgboost

While using xgboost I ran into a small obstacle: it does not run out of the box on the Mac M1 and takes some fiddling to set up. The installation process is as follows:

1. Homebrew is required first

2. Install gcc, cmake, and libomp

brew install gcc
brew install cmake
brew install libomp
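Before building, it can be worth confirming that the Homebrew toolchain is actually on the PATH. A minimal sketch, assuming the gcc formula installed version 11 (adjust the gcc-11 / g++-11 names to whatever version brew gave you):

import shutil

# Print where each required build tool resolves to, or flag it as missing.
for tool in ('gcc-11', 'g++-11', 'cmake'):
    print(tool, '->', shutil.which(tool) or 'not found')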

3. Download the xgboost source package

Yes, you cannot install it directly from PyPI over the network; you have to download the source package and build and install it yourself. Fortunately, the process is not troublesome:

Source: http://mirrors.aliyun.com/pypi/simple/xgboost/

I downloaded xgboost-1.4.2.tar.gz

4. Installation

Enter the download directory:

cd ~/download/

Then run:
pip install xgboost-1.4.2.tar.gz

Okay, now you can import it to try it out:

from xgboost import XGBClassifier
xb = XGBClassifier()
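If the import works, a tiny fit on synthetic data is a reasonable sanity check that the compiled library (and its OpenMP dependency) really runs. This is only an illustrative sketch; the data and parameters are arbitrary:

import numpy as np
from xgboost import XGBClassifier

# Small synthetic binary-classification problem, just to exercise training.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = XGBClassifier(n_estimators=10)
clf.fit(X, y)
print(clf.score(X, y))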

xgboost Regression

load data

import pandas as pd
import warnings
%pylab inline

warnings.filterwarnings('ignore')

# Load the Titanic dataset from a local file
df = pd.read_csv('./data/Titanic.txt', sep=',', quotechar='"', encoding='ISO-8859-15')
df.info()
df.head()

# Keep a few features
features = df[['pclass', 'age', 'sex']].copy()
# Label
label = df['survived']
features.info()

# Fill missing ages with the mean
features['age'].fillna(df['age'].mean(), inplace=True)
features.info()

# Split the dataset
from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(features, label, test_size=0.25, random_state=33)

# Feature vectorization
from sklearn.feature_extraction import DictVectorizer

vec = DictVectorizer(sparse=False)
train_x = vec.fit_transform(train_x.to_dict(orient='records'))
test_x = vec.transform(test_x.to_dict(orient='records'))

# Random forest training and prediction
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()
rfc.fit(train_x, train_y)
print('The accuracy of random Forest Classifier on testing set:', rfc.score(test_x, test_y))

"""
The accuracy of random Forest Classifier on testing set: 0.7781155015197568
"""


# xgboost training and prediction
from xgboost import XGBClassifier

xb = XGBClassifier()
xb.fit(train_x, train_y)
print('The accuracy:', xb.score(test_x, test_y))

"""
The accuracy: 0.7750759878419453
"""

Original post: Run xgboost on Mac and Regression data, by Hivan Du

https://hivan.me/run_xgboost_on_M1_and_regression/

Published 2021-09-09, updated 2023-06-02.