Pandas is a library for working with tabular data in Python. It organizes data into dataframes, which are two-dimensional tables of labeled rows and columns.
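As a minimal sketch, here is a small dataframe built from a Python dictionary with pd.DataFrame. The column names and values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical patient records; names and values are illustrative only.
patients = pd.DataFrame({
    "Patient_ID": [1, 2, 3],
    "Age": [45, 55, 70],
    "Cholesterol": [180, 205, 240],
})

print(patients)        # one row per patient, one column per field
print(patients.shape)  # (rows, columns): (3, 3)
print(patients.dtypes) # the data type pandas inferred for each column
```

Each dictionary key becomes a column label, and pandas adds a default integer row index starting at 0.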
NumPy is a Python library for working with datasets stored as NumPy arrays, which can have any number of dimensions. NumPy arrays are similar to Python lists, but faster and more powerful for numerical work.
Output:
5
5
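The difference between NumPy arrays and Python lists can be sketched in a few lines. The values here are arbitrary and not taken from the lesson.

```python
import numpy as np

prices_list = [10, 20, 30]
prices_array = np.array(prices_list)

# A list needs an explicit loop (here a list comprehension) for element-wise math...
doubled_list = [p * 2 for p in prices_list]
# ...while a NumPy array applies the operation to every element at once.
doubled_array = prices_array * 2

print(doubled_list)     # [20, 40, 60]
print(doubled_array)    # [20 40 60]

# Multiplying a list repeats it instead of scaling its elements.
print(prices_list * 2)  # [10, 20, 30, 10, 20, 30]

# Arrays can have more than one dimension.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.ndim)        # 2
```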
To select or update data in a DataFrame by label, use the .loc (location) method.
Use dataframe_name['column name'] to select a column.
To update values, select the rows to change with == and the value. In the .loc argument, specify the column to update.
Use .iloc (integer location) to select rows and columns by position: dataframe_name.iloc[row_number, column_number].
Use the [] operator to filter rows.
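The selection and update methods described above can be sketched on a small hypothetical table. The data and column names are assumptions for illustration, not the lesson's dataset.

```python
import pandas as pd

# Hypothetical data; column names are illustrative.
df = pd.DataFrame({
    "Patient_ID": [1, 2, 3, 4],
    "Age": [45, 60, 52, 70],
    "Diagnosis": ["Healthy", "Hypertension", "Healthy", "Diabetes"],
})

# Select a single column by name (returns a Series).
ages = df["Age"]

# Select by label with .loc: rows where Patient_ID == 2, column "Age".
age_of_patient_2 = df.loc[df["Patient_ID"] == 2, "Age"]

# Update with .loc: the condition picks the rows, the second argument picks the column.
df.loc[df["Patient_ID"] == 3, "Diagnosis"] = "Hypertension"

# Select by position with .iloc[row_number, column_number]; positions start at 0.
first_row_age = df.iloc[0, 1]  # row 0, column 1 ("Age")

# Filter rows with the [] operator and a boolean condition.
older = df[df["Age"] > 50]
print(older)
```

Using a single .loc call for the update, rather than chaining two selections, writes the change to df itself instead of to a temporary copy.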
Output:
       Patient_ID        Age  Cholesterol  Glucose Level
count    6.000000   6.000000     6.000000       6.000000
mean     3.333333  56.666667   208.333333     107.166667
std      1.632993   9.309493    20.412415      22.003788
min      1.000000  45.000000   180.000000      90.000000
25%      2.250000  51.250000   200.000000      95.750000
50%      3.500000  55.000000   205.000000      99.000000
75%      4.750000  62.500000   217.500000     107.500000
max      5.000000  70.000000   240.000000     150.000000
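The summary table above has the layout that pandas' DataFrame.describe() produces for numeric columns: count, mean, sample standard deviation, minimum, quartiles, and maximum. Here is a minimal sketch with made-up values (not the lesson's data):

```python
import pandas as pd

# Hypothetical values, not the lesson's dataset.
df = pd.DataFrame({
    "Age": [45, 50, 55, 60, 65, 70],
    "Cholesterol": [180, 200, 205, 210, 220, 240],
})

summary = df.describe()
print(summary)

# Individual statistics can be read back by row label and column name.
print(summary.loc["mean", "Age"])          # 57.5
print(summary.loc["50%", "Cholesterol"])   # 207.5 (the median)
```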
pandas_split.py
patient_data_2['Diagnosis'] == 'Hypertension' finds all rows where the diagnosis is hypertension. Adding ['Cholesterol'] then returns only the cholesterol values for the rows selected by that filter.
Use the .shape attribute to find the shape of an array.
Output:
(2, 3)
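The filter-then-select pattern for patient_data_2 can be sketched as follows. The stand-in dataframe below is hypothetical, because the lesson's actual patient_data_2 is not shown here.

```python
import pandas as pd

# Hypothetical stand-in for patient_data_2.
patient_data_2 = pd.DataFrame({
    "Patient_ID": [1, 2, 3, 4, 5],
    "Diagnosis": ["Hypertension", "Diabetes", "Hypertension", "Healthy", "Diabetes"],
    "Cholesterol": [230, 210, 240, 180, 200],
})

# Boolean mask: True for rows whose diagnosis is hypertension.
is_hypertension = patient_data_2["Diagnosis"] == "Hypertension"

# Keep the matching rows, then keep only the Cholesterol column.
hypertension_cholesterol = patient_data_2[is_hypertension]["Cholesterol"]
print(hypertension_cholesterol)
print(hypertension_cholesterol.mean())  # 235.0
```

patient_data_2.loc[is_hypertension, 'Cholesterol'] selects the same values in a single step.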
Output:
[10 2 2 4 1 1 7]
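As a short sketch of the .shape attribute, here are two illustrative arrays (values chosen arbitrarily). .shape is a tuple with one entry per dimension.

```python
import numpy as np

# A 2-D array with 3 rows and 2 columns.
matrix = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])
print(matrix.shape)  # (3, 2)

# A 1-D array's shape is a one-element tuple.
vector = np.array([8, 6, 7, 5])
print(vector.shape)  # (4,)
```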
To compare the means of two independent samples with a t-test, use SciPy's stats.ttest_ind.
scipy_stats.py
Output:
t-score: 2.5
p-value: 0.05
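Here is a minimal sketch of an independent two-sample t-test with scipy.stats.ttest_ind. The two groups are hypothetical measurements, so the printed numbers differ from the output above.

```python
from scipy import stats

# Hypothetical measurements for two independent groups.
group_a = [210, 225, 230, 240, 235]
group_b = [190, 200, 205, 195, 210]

# By default ttest_ind assumes equal variances (Student's t-test);
# pass equal_var=False for Welch's t-test instead.
result = stats.ttest_ind(group_a, group_b)
print(f"t-score: {result.statistic}")
print(f"p-value: {result.pvalue}")

# A small p-value (commonly below 0.05) suggests the group means differ.
if result.pvalue < 0.05:
    print("The difference in means is statistically significant at the 5% level.")
```

A positive t-score here means the first group's mean is larger than the second's.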
Note: lesson_2.ipynb.