Aggregations#
When performing aggregations such as sum, mean and count, Daft enables you to group data by certain keys and aggregate within those keys.
Calling df.groupby() returns a GroupedDataFrame object, which is a view of the original DataFrame with additional context on which keys to group by. You can then call an aggregation method to run that aggregation within each group; each method returns a new DataFrame.
Learn more about Aggregations and Grouping in the Daft User Guide.
GroupedDataFrame #
GroupedDataFrame(
df: DataFrame, group_by: ExpressionsProjection
)
Methods:
Name | Description |
---|---|
agg | Performs aggregations on this GroupedDataFrame. Allows for mixed aggregations. |
agg_concat | Performs grouped concat on this GroupedDataFrame. |
agg_list | Performs grouped list on this GroupedDataFrame. |
agg_set | Performs grouped set on this GroupedDataFrame (ignoring nulls). |
any_value | Returns an arbitrary value on this GroupedDataFrame. |
count | Performs grouped count on this GroupedDataFrame. |
map_groups | Apply a user-defined function to each group. The name of the resultant column will default to the name of the first input column. |
max | Performs grouped max on this GroupedDataFrame. |
mean | Performs grouped mean on this GroupedDataFrame. |
min | Performs grouped min on this GroupedDataFrame. |
skew | Performs grouped skew on this GroupedDataFrame. |
stddev | Performs grouped standard deviation on this GroupedDataFrame. |
sum | Performs grouped sum on this GroupedDataFrame. |
Attributes:
Name | Type | Description |
---|---|---|
df | DataFrame | |
group_by | ExpressionsProjection | |
agg #
agg(
*to_agg: Union[Expression, Iterable[Expression]],
) -> DataFrame
Performs aggregations on this GroupedDataFrame. Allows for mixed aggregations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*to_agg | Union[Expression, Iterable[Expression]] | aggregation expressions | () |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with grouped aggregations |
Examples:
Source code in daft/dataframe/dataframe.py
agg_concat #
agg_concat(*cols: ColumnInputType) -> DataFrame
Performs grouped concat on this GroupedDataFrame.
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with grouped concatenated list per column. |
Source code in daft/dataframe/dataframe.py
agg_list #
agg_list(*cols: ColumnInputType) -> DataFrame
Performs grouped list on this GroupedDataFrame.
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with grouped list per column. |
Source code in daft/dataframe/dataframe.py
agg_set #
agg_set(*cols: ColumnInputType) -> DataFrame
Performs grouped set on this GroupedDataFrame (ignoring nulls).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*cols | Union[str, Expression] | columns to form into a set | () |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with grouped set per column. |
Source code in daft/dataframe/dataframe.py
any_value #
any_value(*cols: ColumnInputType) -> DataFrame
Returns an arbitrary value from each group of this GroupedDataFrame.
Values for each column are not guaranteed to be from the same row.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*cols | Union[str, Expression] | columns to get | () |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with any values. |
Source code in daft/dataframe/dataframe.py
count #
count(*cols: ColumnInputType) -> DataFrame
Performs grouped count on this GroupedDataFrame.
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with grouped count per column. |
Source code in daft/dataframe/dataframe.py
map_groups #
map_groups(udf: Expression) -> DataFrame
Apply a user-defined function to each group. The name of the resultant column will default to the name of the first input column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
udf | Expression | User-defined function to apply to each group. | required |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with grouped aggregations |
Examples:
Source code in daft/dataframe/dataframe.py
max #
max(*cols: ColumnInputType) -> DataFrame
Performs grouped max on this GroupedDataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*cols | Union[str, Expression] | columns to max | () |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with grouped max. |
Source code in daft/dataframe/dataframe.py
mean #
mean(*cols: ColumnInputType) -> DataFrame
Performs grouped mean on this GroupedDataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*cols | Union[str, Expression] | columns to mean | () |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with grouped mean. |
Source code in daft/dataframe/dataframe.py
min #
min(*cols: ColumnInputType) -> DataFrame
Performs grouped min on this GroupedDataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*cols | Union[str, Expression] | columns to min | () |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with grouped min. |
Source code in daft/dataframe/dataframe.py
skew #
skew(*cols: ColumnInputType) -> DataFrame
Performs grouped skew on this GroupedDataFrame.
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with the grouped skew per column. |
Source code in daft/dataframe/dataframe.py
stddev #
stddev(*cols: ColumnInputType) -> DataFrame
Performs grouped standard deviation on this GroupedDataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*cols | Union[str, Expression] | columns to stddev | () |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with grouped standard deviation. |
Examples:
Source code in daft/dataframe/dataframe.py
sum #
sum(*cols: ColumnInputType) -> DataFrame
Performs grouped sum on this GroupedDataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*cols | Union[str, Expression] | columns to sum | () |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with grouped sums. |
Source code in daft/dataframe/dataframe.py