Normal unpickling creates a situation where __new__ receives all
9 elements rather than the 6 that are required for the
constructor. This method ensures that only the 6 are provided.
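The mechanism can be sketched with a hypothetical namedtuple subclass. `Matrix` and `rebuild` are illustrative names, not part of the library. Pickle protocol 2 and later rebuilds an instance by calling `cls.__new__(cls, *args)`, where `args` comes from `__getnewargs__`; the helper below mimics that step directly.

```python
from collections import namedtuple


class Matrix(namedtuple("Matrix", "a b c d e f g h i")):
    """Hypothetical 3x3 matrix whose constructor takes only the top two rows."""

    def __new__(cls, a, b, c, d, e, f):
        # The bottom row of an augmented affine matrix is always (0, 0, 1).
        return super().__new__(cls, a, b, c, d, e, f, 0.0, 0.0, 1.0)

    def __getnewargs__(self):
        # Without this override, namedtuple would hand all 9 elements
        # to __new__, which accepts only 6.
        return tuple(self)[:6]


def rebuild(obj):
    """Mimic the reconstruction step that unpickling performs."""
    cls = type(obj)
    return cls.__new__(cls, *obj.__getnewargs__())


m = Matrix(1.0, 0.0, 5.0, 0.0, 1.0, 7.0)
copy_of_m = rebuild(m)
```

A `pickle.loads(pickle.dumps(m))` round trip on a module-level class is expected to follow the same path.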
Apply the transform using matrix multiplication, creating
a resulting object of the same type. A transform may be applied
to another transform, a vector, vector array, or shape.
Parameters:
other (Affine, Vec2, Vec2Array, Shape) – The object to transform.
Return type:
Same as other
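As a rough illustration of the matrix multiplication involved, the plain-Python sketch below applies a transform to a point and to another transform. It assumes the six coefficients are laid out row by row as (a, b, c) and (d, e, f) with an implied bottom row of (0, 0, 1); the function names are hypothetical and this is not the library's implementation.

```python
def apply_to_point(coeffs, point):
    """Apply the transform (a, b, c, d, e, f) to an (x, y) point."""
    a, b, c, d, e, f = coeffs
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


def compose(first, second):
    """Coefficients of the matrix product first * second.

    The result is equivalent to applying `second`, then `first`.
    """
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = second
    return (
        a1 * a2 + b1 * d2, a1 * b2 + b1 * e2, a1 * c2 + b1 * f2 + c1,
        d1 * a2 + e1 * d2, d1 * b2 + e1 * e2, d1 * c2 + e1 * f2 + f1,
    )


translate = (1.0, 0.0, 10.0, 0.0, 1.0, 20.0)
scale = (2.0, 0.0, 0.0, 0.0, 3.0, 0.0)
combined = compose(translate, scale)  # scale first, then translate
```

In both cases the result has the same kind as the right-hand operand: a point yields a point, a transform yields a transform.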
static __new__(cls, a, b, c, d, e, f, g=0.0, h=0.0, i=1.0)
Create a new object
Parameters:
a, b, c, d, e, f (float) – Elements of an augmented affine transformation matrix.
True if the transform is conformal, i.e., if angles between points are preserved after applying the
transform, within rounding limits. This implies that the
transform has no effective shear.
True if the transform is orthonormal, which means that the transform represents a rigid motion
with no effective scaling or shear. Mathematically, this means
that the axis vectors of the transform matrix are perpendicular
and unit-length. Applying an orthonormal transform to a shape
always results in a congruent shape.
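One way to express these two checks in code is sketched below. It assumes the axis vectors are the matrix columns (a, d) and (b, e) and uses an arbitrary tolerance `EPSILON` to stand in for the rounding limit; both are assumptions, and the translation terms c and f play no role.

```python
import math

EPSILON = 1e-5  # assumed rounding limit; the real tolerance is library-defined


def is_conformal(a, b, d, e):
    # A zero dot product means the axis vectors (a, d) and (b, e)
    # are perpendicular, so there is no effective shear.
    return abs(a * b + d * e) < EPSILON


def is_orthonormal(a, b, d, e):
    # Perpendicular axis vectors that are also unit-length.
    return (
        is_conformal(a, b, d, e)
        and abs(1.0 - (a * a + d * d)) < EPSILON
        and abs(1.0 - (b * b + e * e)) < EPSILON
    )


angle = math.radians(30)
rotation = (math.cos(angle), -math.sin(angle), math.sin(angle), math.cos(angle))
uniform_scale = (2.0, 0.0, 0.0, 2.0)
shear = (1.0, 1.0, 0.0, 1.0)
```

Here `rotation` passes both checks, `uniform_scale` passes only the conformal check, and `shear` passes neither.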
Create a scaling transform from a scalar or vector.
Parameters:
scaling (float or sequence) – The scaling factor. A scalar value will
scale in both dimensions equally. A vector scaling
value scales the dimensions independently.
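The scalar-or-vector behaviour can be sketched as follows. `scale_coefficients` is a hypothetical helper that returns the six coefficients (a, b, c, d, e, f) in the row-by-row layout assumed here; it is not the library's constructor.

```python
from numbers import Real


def scale_coefficients(scaling):
    """Return (a, b, c, d, e, f) for a scaling transform."""
    if isinstance(scaling, Real):
        # A scalar scales both dimensions equally.
        sx = sy = float(scaling)
    else:
        # A two-element vector scales x and y independently.
        sx, sy = (float(v) for v in scaling)
    return (sx, 0.0, 0.0, 0.0, sy, 0.0)
```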
A geometry type composed of one or more line segments.
A LineString is a one-dimensional feature and has a non-zero length but
zero area. It may approximate a curve and need not be straight. A LineString may
be closed.
Parameters:
coordinates (sequence) – A sequence of (x, y [,z]) numeric coordinate pairs or triples, or
an array-like with shape (N, 2) or (N, 3).
It can also be a sequence of Point objects, or a combination of both.
Get a geometry that represents all points within a distance of this geometry.
A positive distance produces a dilation, a negative distance an
erosion. A very small or zero distance may sometimes be used to
“tidy” a polygon.
Parameters:
distance (float) – The distance to buffer around the object.
quad_segs (int, optional) – Sets the number of line segments used to approximate an
angle fillet.
cap_style (shapely.BufferCapStyle or {'round', 'square', 'flat'}, default 'round') – Specifies the shape of buffered line endings. BufferCapStyle.round
(‘round’) results in circular line endings (see quad_segs). Both
BufferCapStyle.square (‘square’) and BufferCapStyle.flat (‘flat’)
result in rectangular line endings; BufferCapStyle.flat
(‘flat’) ends at the original vertex, while
BufferCapStyle.square (‘square’) extends past the original vertex by the buffer width.
join_style (shapely.BufferJoinStyle or {'round', 'mitre', 'bevel'}, default 'round') – Specifies the shape of buffered line midpoints.
BufferJoinStyle.round (‘round’) results in rounded shapes.
BufferJoinStyle.bevel (‘bevel’) results in a beveled edge that
touches the original vertex. BufferJoinStyle.mitre (‘mitre’) results
in a single vertex that is beveled depending on the mitre_limit
parameter.
mitre_limit (float, optional) – The mitre limit ratio is used for very sharp corners. The
mitre ratio is the ratio of the distance from the corner to
the end of the mitred offset corner. When two line segments
meet at a sharp angle, a mitre join will extend far beyond the original
geometry. To prevent unreasonable geometry, the mitre limit
allows controlling the maximum length of the join corner.
Corners with a ratio that exceeds the limit will be beveled.
single_sided (bool, optional) – For a single-sided buffer, the side used is determined by the sign of the buffer
distance:
a positive distance indicates the left-hand side
a negative distance indicates the right-hand side
The single-sided buffer of point geometries is the same as
the regular buffer. The End Cap Style for single-sided
buffers is always ignored, and forced to the equivalent of
CAP_FLAT.
quadsegs (int, optional) – Deprecated alias for quad_segs.
resolution (int, optional) – Deprecated alias for quad_segs.
**kwargs (dict, optional) – For backwards compatibility of renamed parameters. If an unsupported
kwarg is passed, a ValueError will be raised.
Return type:
Geometry
Notes
The return value is a strictly two-dimensional geometry. All
Z coordinates of the original geometry will be ignored.
Deprecated since version 2.1.0: A deprecation warning is shown if quad_segs, cap_style,
join_style, mitre_limit or single_sided are
specified as positional arguments. In a future release, these will
need to be specified as keyword arguments.
Return a point at the specified distance along a linear geometry.
Negative length values are taken as measured in the reverse
direction from the end of the geometry. Out-of-range index
values are handled by clamping them to the valid range of values.
If the normalized arg is True, the distance will be interpreted as a
fraction of the geometry’s length.
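The distance handling described above (negative values, clamping, and the normalized flag) can be sketched in plain Python for a 2D polyline. The `interpolate` function below is a hypothetical stand-in written for illustration, not the library's implementation.

```python
import math


def interpolate(coords, distance, normalized=False):
    """Return the (x, y) point at `distance` along a polyline."""
    segments = list(zip(coords, coords[1:]))
    lengths = [math.dist(p, q) for p, q in segments]
    total = sum(lengths)
    if normalized:
        distance *= total  # interpret as a fraction of the total length
    if distance < 0:
        distance += total  # measure backwards from the end
    distance = min(max(distance, 0.0), total)  # clamp out-of-range values
    for (p, q), length in zip(segments, lengths):
        if length > 0 and distance <= length:
            t = distance / length
            return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        distance -= length
    return tuple(coords[-1])


line = [(0, 0), (10, 0), (10, 10)]  # total length 20
```

With this line, a distance of 5 and a normalized distance of 0.25 give the same point, while a distance of -5 is measured back from the final vertex.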
Return the oriented envelope (minimum rotated rectangle) of the geometry.
The oriented envelope encloses an input geometry, such that the resulting
rectangle has minimum area.
Unlike envelope, this rectangle is not constrained to be parallel to the
coordinate axes. If the convex hull of the object is degenerate (a line
or point), this degenerate geometry is returned.
The starting point of the rectangle is not fixed. You can use
normalize() to reorganize the rectangle to
strict canonical form so the starting point is
always the lower left point.
Convert geometry to normal form (or canonical form).
This method orders the coordinates, rings of a polygon and parts of
multi geometries consistently. Typically useful for testing purposes
(for example in combination with equals_exact).
Return a (Multi)LineString at a distance from the object.
The side, left or right, is determined by the sign of the distance
parameter (negative for right side offset, positive for left side
offset). The resolution of the buffer around each vertex of the object
increases by increasing the quad_segs keyword parameter.
The join style is for outside corners between line segments. Accepted
values are JOIN_STYLE.round (1), JOIN_STYLE.mitre (2), and
JOIN_STYLE.bevel (3).
The mitre ratio limit is used for very sharp corners. It is the ratio
of the distance from the corner to the end of the mitred offset corner.
When two line segments meet at a sharp angle, a mitre join will extend
far beyond the original geometry. To prevent unreasonable geometry, the
mitre limit allows controlling the maximum length of the join corner.
Corners with a ratio that exceeds the limit will be beveled.
Note: the behaviour regarding orientation of the resulting line
depends on the GEOS version. With GEOS < 3.11, the line retains the
same direction for a left offset (positive distance) or has reverse
direction for a right offset (negative distance), and this behaviour
was documented as such in previous Shapely versions. Starting with
GEOS 3.11, the function tries to preserve the orientation of the
original line.
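To give a feel for the mitre ratio mentioned above: under the usual geometric reading, two offset lines at distance d from a corner of angle θ meet at distance d / sin(θ/2) from that corner, so the ratio is 1 / sin(θ/2). The sketch below uses that reading, which is an assumption rather than something stated in this documentation, and the limit of 5.0 is an illustrative value.

```python
import math


def mitre_ratio(corner_angle_degrees):
    """Mitre length divided by offset distance for a corner of the given angle."""
    half_angle = math.radians(corner_angle_degrees) / 2.0
    return 1.0 / math.sin(half_angle)


def join_is_beveled(corner_angle_degrees, mitre_limit=5.0):
    """True if the corner's mitre ratio exceeds the limit."""
    return mitre_ratio(corner_angle_degrees) > mitre_limit
```

A right-angle corner has a ratio of about 1.41 and stays mitred under this limit, whereas a 10 degree corner has a ratio above 11 and would be beveled.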
Older alternative to the offset_curve() method that uses
resolution instead of quad_segs and a side keyword
(‘left’ or ‘right’) instead of the sign of the distance. This method is
kept for backwards compatibility for now, but it is recommended to
use offset_curve() instead.
Add vertices to line segments based on maximum segment length.
Additional vertices will be added to every line segment in an input geometry
so that segments are no longer than the provided maximum segment length. New
vertices will evenly subdivide each segment.
Only linear components of input geometries are densified; other geometries
are returned unmodified.
Parameters:
max_segment_length (float or array_like) – Additional vertices will be added so that all line segments are no
longer than this value. Must be greater than 0.
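The even subdivision can be illustrated with a plain-Python sketch over a list of 2D coordinates. `segmentize` here is a hypothetical helper written for this example, not the library function.

```python
import math


def segmentize(coords, max_segment_length):
    """Insert evenly spaced vertices so no segment exceeds the maximum length."""
    if max_segment_length <= 0:
        raise ValueError("max_segment_length must be greater than 0")
    result = [tuple(coords[0])]
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Smallest number of equal pieces that keeps each piece short enough.
        pieces = max(1, math.ceil(length / max_segment_length))
        for k in range(1, pieces + 1):
            t = k / pieces
            result.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return result
```

A segment of length 10 with a maximum of 4 is split into three equal pieces rather than two pieces of 4 and one of 2, which matches the "evenly subdivide" wording.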
Return a simplified geometry produced by the Douglas-Peucker algorithm.
Coordinates of the simplified geometry will be no more than the
tolerance distance from the original. Unless the topology preserving
option is used, the algorithm may produce self-intersecting or
otherwise invalid geometries.
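For orientation, a minimal sketch of the classic recursive Douglas-Peucker procedure on a list of 2D coordinates follows. It does not include any topology-preserving logic, and the function names are hypothetical.

```python
import math


def _distance_to_line(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / length


def douglas_peucker(coords, tolerance):
    """Drop vertices that lie within `tolerance` of the simplified line."""
    if len(coords) < 3:
        return list(coords)
    distances = [_distance_to_line(p, coords[0], coords[-1]) for p in coords[1:-1]]
    # Index (in coords) of the interior vertex farthest from the chord.
    worst = max(range(len(distances)), key=distances.__getitem__) + 1
    if distances[worst - 1] <= tolerance:
        return [coords[0], coords[-1]]
    left = douglas_peucker(coords[: worst + 1], tolerance)
    right = douglas_peucker(coords[worst:], tolerance)
    return left[:-1] + right  # avoid repeating the shared vertex


line = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(line, 0.1)
```

Only the vertex (1, 0.05) lies within the tolerance of the simplified line, so it is the only one removed.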
points (sequence) – A sequence of Points, or a sequence of (x, y [,z]) numeric coordinate
pairs or triples, or an array-like of shape (N, 2) or (N, 3).
A geometry type representing an area that is enclosed by a linear ring.
A polygon is a two-dimensional feature and has a non-zero area. It may
have one or more negative-space “holes” which are also bounded by linear
rings. If any rings cross each other, the feature is invalid and
operations on it may fail.
Parameters:
shell (sequence) – A sequence of (x, y [,z]) numeric coordinate pairs or triples, or
an array-like with shape (N, 2) or (N, 3).
Also can be a sequence of Point objects.
holes (sequence) – A sequence of objects which satisfy the same requirements as the
shell parameter above.
Class for efficiently working with raster data while preserving
geographic transformation information. Can be initialized with either a file path
or directly with raster data, CRS, and transform.
Initialize a RasterHandler for working with raster data and coordinate
transformations.
Creates a window and buffer geometry based on source and target coordinates:
- If source and target are single coordinates: creates a line buffer
- If source and/or target are lists of coordinates: creates a polygon buffer
Parameters:
raster_source (RasterDataset) – Either:
- Path to the raster file (str), or
- Tuple of (data_array, crs, transform)
Make a Transformer from a pyproj.crs.CRS or input used to create one.
See:
proj_create_crs_to_crs()
proj_create_crs_to_crs_from_pj()
Added in version 2.2.0: always_xy
Added in version 2.3.0: area_of_interest
Added in version 3.1.0: authority, accuracy, allow_ballpark
Added in version 3.4.0: force_over
Added in version 3.5.0: only_best
Parameters:
crs_from (pyproj.crs.CRS or input used to create one) – Projection of input data.
crs_to (pyproj.crs.CRS or input used to create one) – Projection of output data.
always_xy (bool, default False) – If true, the transform method will accept as input and return as output
coordinates using the traditional GIS order, that is longitude, latitude
for geographic CRS and easting, northing for most projected CRS.
area_of_interest (AreaOfInterest, optional) – The area of interest to help select the transformation.
authority (str, optional) – When not specified, coordinate operations from any authority will be
searched, with the restrictions set in the
authority_to_authority_preference database table related to the
authority of the source/target CRS themselves. If authority is set
to “any”, then coordinate operations from any authority will be
searched. If authority is a non-empty string different from “any”,
then coordinate operations will be searched only in that authority
namespace (e.g. EPSG).
accuracy (float, optional) – The minimum desired accuracy (in metres) of the candidate
coordinate operations.
allow_ballpark (bool, optional) – Set to False to disallow the use of Ballpark transformation
in the candidate coordinate operations. Default is to allow.
force_over (bool, default False) – If True, it will force the +over flag on the transformation.
Requires PROJ 9+.
only_best (bool, optional) – Can be set to True to cause PROJ to error out if the best
transformation known to PROJ and usable by PROJ if all grids known and
usable by PROJ were accessible, cannot be used. Best transformation should
be understood as the transformation returned by
proj_get_suggested_operation() if all known grids were
accessible (either locally or through network).
Note that the default value for this option can be also set with the
PROJ_ONLY_BEST_DEFAULT environment variable, or with the
only_best_default setting of proj-ini.
The only_best kwarg overrides the default value if set.
Requires PROJ 9.2+.
Added in version 3.1.0: AUTH:CODE string support (e.g. EPSG:1671)
Allowed input:
a PROJ string
a WKT string
a PROJJSON string
an object code (e.g. “EPSG:1671” or
“urn:ogc:def:coordinateOperation:EPSG::1671”)
an object name, e.g. “ITRF2014 to ETRF2014 (1)”.
In that case, as uniqueness is not guaranteed,
heuristics are applied to determine the appropriate best match.
Make a Transformer from a pyproj.Proj or input used to create one.
Deprecated since version 3.4.1: from_crs() is preferred.
Added in version 2.2.0: always_xy
Added in version 2.3.0: area_of_interest
Parameters:
proj_from (pyproj.Proj or input used to create one) – Projection of input data.
proj_to (pyproj.Proj or input used to create one) – Projection of output data.
always_xy (bool, default False) – If true, the transform method will accept as input and return as output
coordinates using the traditional GIS order, that is longitude, latitude
for geographic CRS and easting, northing for most projected CRS.
area_of_interest (AreaOfInterest, optional) – The area of interest to help select the transformation.
switch (bool, default False) – If True, x, y or lon, lat coordinates of points are switched to y, x
or lat, lon. Default is False.
time_3rd (bool, default False) – Set to True if the input coordinates are 3-dimensional and the 3rd dimension is time.
radians (bool, default False) – If True, will expect input data to be in radians and will return radians
if the projection is geographic. Otherwise, it uses degrees.
Ignored for pipeline transformations with pyproj 2,
but will work in pyproj 3.
errcheck (bool, default False) – If True, an exception is raised if errors are found in the process.
If False, inf is returned for errors.
direction (pyproj.enums.TransformDirection, optional) – The direction of the transform.
Default is pyproj.enums.TransformDirection.FORWARD.
zz (scalar or array, optional) – Input z coordinate(s).
tt (scalar or array, optional) – Input time coordinate(s).
radians (bool, default False) – If True, will expect input data to be in radians and will return radians
if the projection is geographic. Otherwise, it uses degrees.
Ignored for pipeline transformations with pyproj 2,
but will work in pyproj 3.
errcheck (bool, default False) – If True, an exception is raised if errors are found in the process.
If False, inf is returned for errors.
direction (pyproj.enums.TransformDirection, optional) – The direction of the transform.
Default is pyproj.enums.TransformDirection.FORWARD.
inplace (bool, default False) – If True, will attempt to write the results to the input array
instead of returning a new array. This will fail if the input
is not an array in C order with the double data type.
Transform boundary densifying the edges to account for nonlinear
transformations along these edges and extracting the outermost bounds.
If the destination CRS is geographic and right < left then the bounds
crossed the antimeridian. In this scenario there are two polygons,
one on each side of the antimeridian. The first polygon should be
constructed with (left, bottom, 180, top) and the second with
(-180, bottom, right, top).
left (float) – Minimum bounding coordinate of the first axis in source CRS
(or the target CRS if using the reverse direction).
bottom (float) – Minimum bounding coordinate of the second axis in source CRS
(or the target CRS if using the reverse direction).
right (float) – Maximum bounding coordinate of the first axis in source CRS
(or the target CRS if using the reverse direction).
top (float) – Maximum bounding coordinate of the second axis in source CRS
(or the target CRS if using the reverse direction).
densify_points (uint, default 21) – Number of points to add to each edge to account for nonlinear edges
produced by the transform process. Large numbers will produce worse
performance.
radians (bool, default False) – If True, will expect input data to be in radians and will return radians
if the projection is geographic. Otherwise, it uses degrees.
errcheck (bool, default False) – If True, an exception is raised if errors are found in the process.
If False, inf is returned for errors.
direction (pyproj.enums.TransformDirection, optional) – The direction of the transform.
Default is pyproj.enums.TransformDirection.FORWARD.
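The antimeridian case described for the returned bounds can be handled with a small helper like the sketch below. `split_at_antimeridian` is a hypothetical name, and the sketch assumes both boxes follow the same (left, bottom, right, top) ordering as the first polygon and that the bounds are geographic coordinates in degrees.

```python
def split_at_antimeridian(left, bottom, right, top):
    """Return one box, or two boxes when the bounds cross the antimeridian."""
    if right >= left:
        return [(left, bottom, right, top)]
    # right < left: one box on each side of the 180 degree meridian.
    return [(left, bottom, 180.0, top), (-180.0, bottom, right, top)]
```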
Previously the lengths were called ‘num_cols’ and ‘num_rows’ but
this is a bit confusing in the new float precision world and the
attributes have been changed. The originals are deprecated.
Construct a Window from row and column slices or tuples / lists of
start and stop indexes. Converts the rows and cols to offsets, height,
and width.
In general, indexes are defined relative to the upper left corner of
the dataset: rows=(0, 10), cols=(0, 4) defines a window that is 4
columns wide and 10 rows high starting from the upper left.
Start indexes may be None and will default to 0.
Stop indexes may be None and will default to width or height, which
must be provided in this case.
Negative start indexes are evaluated relative to the lower right of the
dataset: rows=(-2, None), cols=(-2, None) defines a window that is 2
rows high and 2 columns wide starting from the bottom right.
Parameters:
rows (slice, tuple, or list) – Slices or 2 element tuples/lists containing start, stop indexes.
cols (slice, tuple, or list) – Slices or 2 element tuples/lists containing start, stop indexes.
height (float) – A shape to resolve relative values against. Only used when a start
or stop index is negative or a stop index is None.
width (float) – A shape to resolve relative values against. Only used when a start
or stop index is negative or a stop index is None.
boundless (bool, optional) – Whether the inputs are bounded (default) or not.
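The index rules above can be summarised in a plain-Python sketch that resolves rows and cols to offsets and lengths. The function names and the dictionary return value are illustrative choices; the sketch ignores the boundless option and slice steps and is not the library's implementation.

```python
def _resolve(index_pair, size, name):
    """Turn (start, stop) into (offset, length), resolving None and negatives."""
    if isinstance(index_pair, slice):
        index_pair = (index_pair.start, index_pair.stop)
    start, stop = index_pair
    needs_size = stop is None or any(
        v is not None and v < 0 for v in (start, stop)
    )
    if needs_size and size is None:
        raise ValueError(f"{name} is required to resolve None or negative indexes")
    start = 0 if start is None else start
    stop = size if stop is None else stop
    if start < 0:
        start = max(size + start, 0)  # relative to the lower right
    if stop < 0:
        stop = max(size + stop, 0)
    return start, max(stop - start, 0)


def window_from_slices(rows, cols, height=None, width=None):
    """Resolve row and column index pairs to window offsets and lengths."""
    row_off, win_height = _resolve(rows, height, "height")
    col_off, win_width = _resolve(cols, width, "width")
    return {
        "col_off": col_off,
        "row_off": row_off,
        "width": win_width,
        "height": win_height,
    }
```

For example, rows=(-2, None) and cols=(-2, None) on a dataset 100 rows high and 50 columns wide resolve to a 2 by 2 window at row offset 98 and column offset 48.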
Return an image array with input geometries burned in.
Warnings will be raised for any invalid or empty geometries, and
an exception will be raised if there are no valid shapes
to rasterize.
Parameters:
shapes (iterable of (`geometry`, value) pairs or geometries) – The geometry can either be an object that implements the geo
interface or GeoJSON-like object. If no value is provided
the default_value will be used. If value is None the
fill value will be used.
fill (int or float, optional) – Used as fill value for all areas not covered by input
geometries.
nodata (float, optional) – nodata value to use in output file or masked array.
masked (bool, optional, default False) – If True, return a masked array. Note: nodata is always set in
the case of file output.
out (numpy.ndarray, optional) – Array in which to store results. If not provided, out_shape
and dtype are required.
transform (Affine transformation object, optional) – Transformation from pixel coordinates of source to the
coordinate system of the input shapes. See the transform
property of dataset objects.
all_touched (bool, optional) – If True, all pixels touched by geometries will be burned in. If
False, only pixels whose center is within the polygon or that
are selected by Bresenham’s line algorithm will be burned in.
merge_alg (MergeAlg, optional) – Merge algorithm to use. One of:
MergeAlg.replace (default): the new value will overwrite the existing value.
MergeAlg.add: the new value will be added to the existing raster.
default_value (int or float, optional) – Used as value for all geometries, if not provided in shapes.
dtype (rasterio or numpy.dtype, optional) – Used as data type for results, if out is not provided.
skip_invalid (bool, optional) – If True (default), invalid shapes will be skipped. If False,
ValueError will be raised.
dst_path (str or PathLike, optional) – Path of output dataset
dst_kwds (dict, optional) – Dictionary of creation options and other parameters that will be
overlaid on the profile of the output dataset.
Returns:
If out was not None then out is returned, it will have been
modified in-place. If out was None, this will be a new array.
Valid data types for fill, default_value, out, dtype and
shape values are “int16”, “int32”, “uint8”, “uint16”, “uint32”,
“float32”, and “float64”.
This function requires significant memory resources. The shapes
iterator will be materialized to a Python list and another C copy of
that list will be made. The out array will be copied and
additional temporary raster memory equal to 2x the smaller of out
data or GDAL’s max cache size (controlled by GDAL_CACHEMAX, default
is 5% of the computer’s physical memory) is required.
If GDAL max cache size is smaller than the output data, the array of
shapes will be iterated multiple times. Performance is thus a linear
function of buffer size. For maximum speed, ensure that
GDAL_CACHEMAX is larger than the size of out or out_shape.
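The difference between the two merge algorithms described above can be illustrated with plain NumPy. This is only a sketch of the semantics, not rasterio's implementation: the `burn` helper and the boolean `first` and `second` masks are hypothetical stand-ins for the pixels that two overlapping geometries would cover.

```python
import numpy as np

def burn(out, covered, value, merge="replace"):
    """Burn `value` into `out` wherever `covered` is True (illustrative only)."""
    if merge == "replace":
        out[covered] = value       # the new value overwrites the existing value
    elif merge == "add":
        out[covered] += value      # the new value is added to the existing raster
    else:
        raise ValueError(f"unknown merge algorithm: {merge!r}")
    return out

fill = 0  # value kept for all pixels not covered by any geometry

# Two hypothetical, overlapping footprints on a 3x3 grid.
first = np.zeros((3, 3), dtype=bool)
first[0:2, 0:2] = True
second = np.zeros((3, 3), dtype=bool)
second[1:3, 1:3] = True

replaced = np.full((3, 3), fill, dtype="int32")
added = np.full((3, 3), fill, dtype="int32")
for covered, value in [(first, 1), (second, 2)]:
    burn(replaced, covered, value, merge="replace")
    burn(added, covered, value, merge="add")
```

In the overlapping pixel at row 1, column 1, `replaced` holds the value of the last shape burned, while `added` holds the sum of both values. Pixels outside both footprints keep the fill value.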
The dataset may be located in a local file, in a resource located by
a URL, or contained within a stream of bytes. This function accepts
different types of fp parameters. However, it is almost always best
to pass a string that has a dataset name as its value. These are
passed directly to GDAL protocol and format handlers. A path to
a zipfile is more efficiently used by GDAL than a Python ZipFile
object, for example.
In read (‘r’) or read/write (‘r+’) mode, no keyword arguments are
required: these attributes are supplied by the opened dataset.
In write (‘w’ or ‘w+’) mode, the driver, width, height, count, and
dtype keywords are strictly required.
Parameters:
fp (str, os.PathLike, file-like, or rasterio.io.MemoryFile) – A filename or URL, a file object opened in binary (‘rb’) mode,
a Path object, or one of the rasterio classes that provides the
dataset-opening interface (has an open method that returns
a dataset). Use a string when possible: GDAL can more
efficiently access a dataset if it opens it natively.
mode (str, optional) – ‘r’ (read, the default), ‘r+’ (read/write), ‘w’ (write), or
‘w+’ (write/read).
driver (str, optional) – A short format driver name (e.g. “GTiff” or “JPEG”) or a list of
such names (see GDAL docs at
https://gdal.org/drivers/raster/index.html). In ‘w’ or ‘w+’ modes
a single name is required. In ‘r’ or ‘r+’ modes the driver can
usually be omitted. Registered drivers will be tried
sequentially until a match is found. When multiple drivers are
available for a format such as JPEG2000, one of them can be
selected by using this keyword argument.
width (int, optional) – The number of columns of the raster dataset. Required in ‘w’ or
‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.
height (int, optional) – The number of rows of the raster dataset. Required in ‘w’ or
‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.
count (int, optional) – The count of dataset bands. Required in ‘w’ or ‘w+’ modes, it is
ignored in ‘r’ or ‘r+’ modes.
crs (str, dict, or CRS, optional) – The coordinate reference system. Required in ‘w’ or ‘w+’ modes,
it is ignored in ‘r’ or ‘r+’ modes.
transform (affine.Affine, optional) – Affine transformation mapping the pixel space to geographic
space. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or
‘r+’ modes.
dtype (str or numpy.dtype, optional) – The data type for bands. For example: ‘uint8’ or
rasterio.uint16. Required in ‘w’ or ‘w+’ modes, it is
ignored in ‘r’ or ‘r+’ modes.
nodata (int, float, or nan, optional) – Defines the pixel value to be interpreted as not valid data.
Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’
modes.
sharing (bool, optional) – To reduce overhead and prevent programs from running out of file
descriptors, rasterio maintains a pool of shared low level
dataset handles. If True this function will use a shared
handle if one is available. Multithreaded programs must avoid
sharing and should set sharing to False.
opener (callable, optional) – A custom dataset opener which can serve GDAL’s virtual
filesystem machinery via Python file-like objects. The
underlying file-like object is obtained by calling opener with
(fp, mode) or (fp, mode + “b”) depending on the format
driver’s native mode. opener must return a Python file-like
object that provides read, seek, tell, and close methods. Note:
only one opener at a time per fp, mode pair is allowed.
kwargs (optional) – These are passed to format drivers as directives for creating or
interpreting datasets. For example: in ‘w’ or ‘w+’ modes
a tiled=True keyword argument will direct the GeoTIFF format
driver to create a tiled, rather than striped, TIFF.
Returns:
rasterio.io.DatasetReader – If mode is “r”.
rasterio.io.DatasetWriter – If mode is “r+”, “w”, or “w+”.
Raises:
TypeError – If arguments are of the wrong Python type.
rasterio.errors.RasterioIOError – If the dataset cannot be opened, such as when there is no
dataset with the given name.
rasterio.errors.DriverCapabilityError – If the detected format driver does not support the requested
opening mode.
Examples
To open a local GeoTIFF dataset for reading using standard driver
discovery and no directives:
Get rows and cols of the pixels containing (x, y).
Parameters:
transform (Affine or sequence of GroundControlPoint or RPC) – Transform suitable for input to AffineTransformer,
GCPTransformer, or RPCTransformer.
xs (list or float) – x values in coordinate reference system.
ys (list or float) – y values in coordinate reference system.
zs (list or float, optional) – Height associated with coordinates. Primarily used for RPC based
coordinate transformations. Ignored for affine based
transformations. Default: 0.
op (function, optional, default: numpy.floor) – Function to convert fractional pixels to whole numbers (floor,
ceiling, round)
precision (int or float, optional) – This parameter is unused, deprecated in rasterio 1.3.0, and
will be removed in version 2.0.0.
rpc_options (dict, optional) – Additional arguments passed to GDALCreateRPCTransformer.
Returns:
rows (array of ints or floats)
cols (array of ints or floats) – Integers are the default. The numerical type is determined by
the type returned by op().
zs (list or float, optional) – Height associated with coordinates. Primarily used for RPC based
coordinate transformations. Ignored for affine based
transformations. Default: 0.
offset (str, optional) – Determines if the returned coordinates are for the center of the
pixel or for a corner.
rpc_options (dict, optional) – Additional arguments passed to GDALCreateRPCTransformer.
Returns:
xs (float or list of floats) – x coordinates in coordinate reference system
ys (float or list of floats) – y coordinates in coordinate reference system
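For the affine case, the arithmetic behind these two conversions can be sketched in pure Python. This is an illustration under stated assumptions, not rasterio's implementation: the transform is taken to be a six-element tuple (a, b, c, d, e, f) with x = a*col + b*row + c and y = d*col + e*row + f, only the "center" and "ul" offsets are modelled, and the function names simply mirror the documented ones.

```python
import math

def xy(transform, row, col, offset="center"):
    """Map a (row, col) pixel index to (x, y) coordinates."""
    a, b, c, d, e, f = transform
    # Assumption: only the pixel center and the upper-left corner are modelled.
    shift = {"center": 0.5, "ul": 0.0}[offset]
    col_f, row_f = col + shift, row + shift
    return a * col_f + b * row_f + c, d * col_f + e * row_f + f

def rowcol(transform, x, y, op=math.floor):
    """Map (x, y) coordinates to the (row, col) of the containing pixel."""
    a, b, c, d, e, f = transform
    det = a * e - b * d
    # Invert the 2x2 linear part to recover fractional pixel coordinates.
    col = (e * (x - c) - b * (y - f)) / det
    row = (a * (y - f) - d * (x - c)) / det
    # `op` converts fractional pixels to whole numbers (floor by default).
    return op(row), op(col)

# Hypothetical north-up grid: 10-unit pixels, upper-left corner at (100, 500).
transform = (10.0, 0.0, 100.0, 0.0, -10.0, 500.0)
center = xy(transform, 3, 2)
index = rowcol(transform, 125.0, 470.0)
```

With the default floor operation, the center of pixel (3, 2) maps back to the same row and column; passing a different `op`, such as `math.ceil`, changes how fractional pixel positions are resolved.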
Normal unpickling creates a situation where __new__ receives all
9 elements rather than the 6 that are required for the
constructor. This method ensures that only the 6 are provided.
Apply the transform using matrix multiplication, creating
a resulting object of the same type. A transform may be applied
to another transform, a vector, vector array, or shape.
Parameters:
other (Affine, Vec2,
Vec2Array, Shape) – The object to transform.
Return type:
Same as other
static __new__(cls, a, b, c, d, e, f, g=0.0, h=0.0, i=1.0)
Create a new object
Parameters:
a (float) – Elements of an augmented affine transformation matrix.
b (float) – Elements of an augmented affine transformation matrix.
c (float) – Elements of an augmented affine transformation matrix.
d (float) – Elements of an augmented affine transformation matrix.
e (float) – Elements of an augmented affine transformation matrix.
f (float) – Elements of an augmented affine transformation matrix.
i.e., if angles between points are preserved after applying the
transform, within rounding limits. This implies that the
transform has no effective shear.
Which means that the transform represents a rigid motion, which
has no effective scaling or shear. Mathematically, this means
that the axis vectors of the transform matrix are perpendicular
and unit-length. Applying an orthonormal transform to a shape
always results in a congruent shape.
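The two conditions described above (no effective shear, and perpendicular unit-length axis vectors) can be checked numerically. The following is a minimal sketch, assuming the matrix is laid out row-wise as (a, b, c) over (d, e, f) so that the axis vectors are (a, d) and (b, e). The tolerance `EPSILON` and the function names are assumptions for illustration and are not taken from the library.

```python
import math

EPSILON = 1e-5  # assumed tolerance standing in for "within rounding limits"

def is_conformal(a, b, d, e):
    # The axis vectors (a, d) and (b, e) are perpendicular when their
    # dot product is approximately zero, i.e. there is no effective shear.
    return abs(a * b + d * e) < EPSILON

def is_orthonormal(a, b, d, e):
    # Perpendicular axis vectors that are also unit-length: a rigid motion.
    return (
        is_conformal(a, b, d, e)
        and abs(1.0 - (a * a + d * d)) < EPSILON
        and abs(1.0 - (b * b + e * e)) < EPSILON
    )

theta = math.radians(30)
rotation = (math.cos(theta), -math.sin(theta), math.sin(theta), math.cos(theta))
uniform_scale = (2.0, 0.0, 0.0, 2.0)
shear = (1.0, 1.0, 0.0, 1.0)
```

A pure rotation passes both checks, a uniform scale passes only the first, and a shear fails both.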
Create a scaling transform from a scalar or vector.
Parameters:
scaling (float or sequence) – The scaling factor. A scalar value will
scale in both dimensions equally. A vector scaling
value scales the dimensions independently.
A class for handling cost assumptions for rasterization.
This class handles:
- Loading cost assumptions from files (CSV, Excel, JSON) or generating cost
assumptions from a dictionary or a GeoDataFrame.
- Mapping costs to features in a GeoDataFrame
- Managing hierarchical cost structures
dictionary of cost assumptions with nested structure based on DataFrame
columns
Uses one numeric column for costs, and all other columns as a hierarchical
index:
- The first column is the ‘main_feature’
- All additional columns are ‘side_features’
A GeoDataFrame object is a pandas.DataFrame that has one or more columns
containing geometry.
In addition to the standard DataFrame constructor arguments,
GeoDataFrame also accepts the following keyword arguments:
Parameters:
crs (value, optional) – Coordinate Reference System of the geometry objects. Can be anything accepted by
pyproj.CRS.from_user_input(),
such as an authority string (e.g. “EPSG:4326”) or a WKT string.
geometry (str or array-like, optional) – Value to use as the active geometry column.
If str, treated as column name to use. If array-like, it will be
added as new column named ‘geometry’ on the GeoDataFrame and set as the
active geometry column.
Note that if geometry is a (Geo)Series with a
name, the name will not be used, a column named “geometry” will still be
added. To preserve the name, you can use rename_geometry()
to update the geometry column name.
Examples
Constructing GeoDataFrame from a dictionary.
>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
Notice that the inferred dtype of ‘geometry’ columns is geometry.
Even when the index of other is the same as the index of the DataFrame,
the Series will not be reoriented. If index-wise alignment is desired,
DataFrame.add() should be used with axis=’index’.
>>> s2 = pd.Series([0.5, 1.5], index=['elk', 'moose'])
>>> df[['height', 'weight']] + s2
       elk  height  moose  weight
elk    NaN     NaN    NaN     NaN
moose  NaN     NaN    NaN     NaN
Export the pandas DataFrame as an Arrow C stream PyCapsule.
This relies on pyarrow to convert the pandas DataFrame to the Arrow
format (and follows the default behaviour of pyarrow.Table.from_pandas
in its handling of the index, i.e. store the index as a column except
for RangeIndex).
This conversion is not necessarily zero-copy.
Parameters:
requested_schema (PyCapsule, default None) – The schema to which the dataframe should be cast, passed as a
PyCapsule containing a C ArrowSchema representation of the
requested schema.
If the result is a column containing only ‘geometry’, return a
GeoSeries. If it’s a DataFrame with any columns of GeometryDtype,
return a GeoDataFrame.
Returns a name if a GeoDataFrame has an active geometry column set,
otherwise returns None. The return type is usually a string, but may be
an integer, tuple or other hashable, depending on the contents of the
dataframe columns.
You can also access the active geometry column using the
.geometry property. You can set a GeoSeries to be an active geometry
using the set_geometry() method.
Get Addition of dataframe and other, element-wise (binary operator add).
Equivalent to dataframe+other, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, radd.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
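The effect of fill_value is easiest to see next to the plain + operator. The frames below are made-up data chosen so that one cell is missing in only one input and another is missing in both.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, np.nan]})
right = pd.DataFrame({"a": [10.0, 20.0], "b": [1.0, np.nan]})

# Plain operator: any missing operand makes the result missing.
plain = left + right

# Flexible wrapper: a value missing in one input is replaced by fill_value
# before the addition; a value missing in both inputs stays missing.
filled = left.add(right, fill_value=0)
```

Here `filled` contains 20.0 where `plain` has NaN in column "a", while the cell that is NaN in both frames remains NaN in both results.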
Perform operation over exponential weighted window.
Notes
The aggregation operations are always performed over an axis, either the
index (default) or the column axis. This behavior is different from
numpy aggregation functions (mean, median, prod, sum, std,
var), where the default is to compute the aggregation of the flattened
array, e.g., numpy.mean(arr_2d) as opposed to
numpy.mean(arr_2d,axis=0).
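A short comparison makes the difference in defaults concrete. The array values are arbitrary and the column names are made up for the example.

```python
import numpy as np
import pandas as pd

arr_2d = np.array([[1.0, 2.0], [3.0, 4.0]])
df = pd.DataFrame(arr_2d, columns=["x", "y"])

per_column = df.mean()                  # pandas default: aggregate over the index
per_row = df.mean(axis=1)               # aggregate over the column axis instead
flattened = np.mean(arr_2d)             # NumPy default: aggregate the flattened array
along_axis0 = np.mean(arr_2d, axis=0)   # matches the pandas default
```

The pandas default returns one value per column, whereas the NumPy default collapses the whole array to a single number unless an axis is given.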
Always returns new objects. If copy=False and no reindexing is
required then original objects are returned.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
fill_value (scalar, default np.nan) – Value to use for missing values. Defaults to NaN, but can be any
“compatible” value.
limit (int, default None) – If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
Deprecated since version 2.1.
fill_axis ({0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame, default 0) –
Filling axis, method and limit.
Deprecated since version 2.1.
broadcast_axis ({0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame, default None) –
Broadcast values along this axis, if aligning two objects of
different dimensions.
>>> df = pd.DataFrame(
...     [[1, 2, 3, 4], [6, 7, 8, 9]], columns=["D", "B", "E", "A"], index=[1, 2]
... )
>>> other = pd.DataFrame(
...     [[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
...     columns=["A", "B", "C", "D"],
...     index=[2, 3, 4],
... )
>>> df
   D  B  E  A
1  1  2  3  4
2  6  7  8  9
>>> other
     A    B    C    D
2   10   20   30   40
3   60   70   80   90
4  600  700  800  900
Align on columns:
>>> left, right = df.align(other, join="outer", axis=1)
>>> left
   A  B   C  D  E
1  4  2 NaN  1  3
2  9  7 NaN  6  8
>>> right
     A    B    C    D   E
2   10   20   30   40 NaN
3   60   70   80   90 NaN
4  600  700  800  900 NaN
We can also align on the index:
>>> left, right = df.align(other, join="outer", axis=0)
>>> left
     D    B    E    A
1  1.0  2.0  3.0  4.0
2  6.0  7.0  8.0  9.0
3  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN
>>> right
       A      B      C      D
1    NaN    NaN    NaN    NaN
2   10.0   20.0   30.0   40.0
3   60.0   70.0   80.0   90.0
4  600.0  700.0  800.0  900.0
Finally, the default axis=None will align on both index and columns:
>>> left, right = df.align(other, join="outer", axis=None)
>>> left
     A    B   C    D    E
1  4.0  2.0 NaN  1.0  3.0
2  9.0  7.0 NaN  6.0  8.0
3  NaN  NaN NaN  NaN  NaN
4  NaN  NaN NaN  NaN  NaN
>>> right
       A      B      C      D   E
1    NaN    NaN    NaN    NaN NaN
2   10.0   20.0   30.0   40.0 NaN
3   60.0   70.0   80.0   90.0 NaN
4  600.0  700.0  800.0  900.0 NaN
Return whether all elements are True, potentially over an axis.
Returns True unless there is at least one element within a series or
along a Dataframe axis that is False or equivalent (e.g. zero or
empty).
Parameters:
axis ({0 or 'index', 1 or 'columns', None}, default 0) –
Indicate which axis or axes should be reduced. For Series this parameter
is unused and defaults to 0.
0 / ‘index’ : reduce the index, return a Series whose index is the
original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the
original index.
None : reduce all axes, return a scalar.
bool_only (bool, default False) – Include only boolean columns. Not implemented for Series.
skipna (bool, default True) – Exclude NA/null values. If the entire row/column is NA and skipna is
True, then the result will be True, as for an empty row/column.
If skipna is False, then NA are treated as True, because these are not
equal to zero.
**kwargs (any, default None) – Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns:
If level is specified, then, DataFrame is returned; otherwise, Series
is returned.
Return whether any element is True, potentially over an axis.
Returns False unless there is at least one element within a series or
along a Dataframe axis that is True or equivalent (e.g. non-zero or
non-empty).
Parameters:
axis ({0 or 'index', 1 or 'columns', None}, default 0) –
Indicate which axis or axes should be reduced. For Series this parameter
is unused and defaults to 0.
0 / ‘index’ : reduce the index, return a Series whose index is the
original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the
original index.
None : reduce all axes, return a scalar.
bool_only (bool, default False) – Include only boolean columns. Not implemented for Series.
skipna (bool, default True) – Exclude NA/null values. If the entire row/column is NA and skipna is
True, then the result will be False, as for an empty row/column.
If skipna is False, then NA are treated as True, because these are not
equal to zero.
**kwargs (any, default None) – Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns:
If level is specified, then, DataFrame is returned; otherwise, Series
is returned.
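The axis and skipna behaviour of the two reductions described above, all and any, can be compared side by side. The data is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [True, True], "b": [True, False]})

all_by_column = df.all()         # axis=0: one result per column
all_by_row = df.all(axis=1)      # axis=1: one result per row
all_scalar = df.all(axis=None)   # reduce all axes to a single value
any_scalar = df.any(axis=None)

# Empty input: all() is vacuously True, any() is False.
empty = pd.Series([], dtype=bool)
empty_all = empty.all()
empty_any = empty.any()

# With skipna=False, NaN is treated as True because it is not equal to zero.
nan_all_keep = pd.Series([np.nan]).all(skipna=False)
# With the default skipna=True, the all-NA Series behaves like an empty one.
nan_any_skip = pd.Series([np.nan]).any()
```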
Data structure also contains labeled axes (rows and columns).
Arithmetic operations align on both row and column labels. Can be
thought of as a dict-like container for Series objects. The primary
pandas data structure.
Parameters:
data (ndarray (structured or homogeneous), Iterable, dict, or DataFrame) –
Dict can contain Series, arrays, constants, dataclass or list-like objects. If
data is a dict, column order follows insertion-order. If a dict contains Series
which have an index defined, it is aligned by its index. This alignment also
occurs if data is a Series or a DataFrame itself. Alignment is done on
Series/DataFrame inputs.
If data is a list of dicts, column order follows insertion-order.
index (Index or array-like) – Index to use for resulting frame. Will default to RangeIndex if
no indexing information part of input data and no index provided.
columns (Index or array-like) – Column labels to use for resulting frame when data does not have them,
defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels,
will perform column selection instead.
dtype (dtype, default None) – Data type to force. Only a single dtype is allowed. If None, infer.
copy (bool or None, default None) – Copy data from inputs.
For dict data, the default of None behaves like copy=True. For DataFrame
or 2d ndarray input, the default of None behaves like copy=False.
If data is a dict containing one or more Series (possibly of different dtypes),
copy=False will ensure that these inputs are not copied.
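The alignment and column-selection rules for the data, index and columns parameters can be seen in a few small constructions. All names and values below are made up for illustration.

```python
import pandas as pd

# A dict of Series is aligned on the Series' indexes; gaps become NaN.
x = pd.Series([1, 2], index=["a", "b"])
y = pd.Series([3], index=["b"])
aligned = pd.DataFrame({"x": x, "y": y})

# When data already has column labels, `columns` selects among them.
selected = pd.DataFrame({"p": [1, 2], "q": [3, 4]}, columns=["q"])

# Without labels in the input, index and columns default to RangeIndex.
default_labels = pd.DataFrame([[1, 2], [3, 4]])
```

In `aligned`, the row labelled "a" has no value for column "y", so that cell is NaN, and `selected` keeps only column "q".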
Area may be invalid for a geographic CRS using degrees as units;
use GeoSeries.to_crs() to project geometries to a planar
CRS before using this function.
Every operation in GeoPandas is planar, i.e. the potential third
dimension is not taken into account.
Returns the original data conformed to a new index with the specified
frequency.
If the index of this Series/DataFrame is a PeriodIndex, the new index
is the result of transforming the original index with
PeriodIndex.asfreq (so the original index
will map one-to-one to the new index).
Otherwise, the new index will be equivalent to pd.date_range(start, end, freq=freq) where start and end are, respectively, the first and
last entries in the original index (see pandas.date_range()). The
values corresponding to any timesteps in the new index which were not present
in the original index will be null (NaN), unless a method for filling
such unknowns is provided (see the method parameter below).
The resample() method is more appropriate if an operation on each group of
timesteps (such as an aggregate) is necessary to represent the data at the new
frequency.
Parameters:
freq (DateOffset or str) – Frequency DateOffset or string.
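As a brief illustration of how asfreq conforms data to a new frequency, the sketch below converts a Series with a gap to daily frequency, with and without a fill method. The dates and values are made up.

```python
import pandas as pd

s = pd.Series(
    [1.0, 2.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-03"]),
)

# New daily index from the first to the last original entry; the
# timestep that was not in the original index is NaN.
daily = s.asfreq("D")

# The same conversion, filling the new timestep from the previous value.
daily_ffill = s.asfreq("D", method="ffill")
```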
Return the last row(s) without any NaNs before where.
The last row (for each element in where, if list) without any
NaN is taken.
For a DataFrame, the last row without NaN is taken,
considering only the subset of columns (if not None).
If there is no good value, NaN is returned for a Series, or
a Series of NaN values for a DataFrame.
Parameters:
where (date or array-like of dates) – Date(s) before which the last row(s) are returned.
subset (str or array-like of str, default None) – For DataFrame, if not None, only use these columns to
check for NaNs.
Returns:
The return can be:
scalar : when self is a Series and where is a scalar
Series: when self is a Series and where is an array-like,
or when self is a DataFrame and where is a scalar
DataFrame : when self is a DataFrame and where is an
array-like
Return type:
scalar, Series, or DataFrame
See also
merge_asof
Perform an asof merge. Similar to left join.
Notes
Dates are assumed to be sorted. Raises if this is not the case.
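A small example of the asof lookup on a Series, using made-up dates with one NaN entry, shows both the scalar and the array-like forms of where.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"])
s = pd.Series([1.0, 2.0, np.nan, 4.0], index=idx)

# Scalar `where`: the value on 2024-01-03 is NaN, so the last valid
# value before it (from 2024-01-02) is returned.
scalar_result = s.asof(pd.Timestamp("2024-01-03"))

# Array-like `where`: one result per requested date. A date before the
# first index entry has no good value and yields NaN.
series_result = s.asof(pd.to_datetime(["2023-12-31", "2024-01-03"]))
```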
Returns a new object with all original columns in addition to new ones.
Existing columns that are re-assigned will be overwritten.
Parameters:
**kwargs (dict of {str: callable or Series}) – The column names are keywords. If the values are
callable, they are computed on the DataFrame and
assigned to the new columns. The callable must not
change input DataFrame (though pandas doesn’t check it).
If the values are not callable, (e.g. a Series, scalar, or array),
they are simply assigned.
Returns:
A new DataFrame with the new columns in addition to
all the existing columns.
Return type:
DataFrame
Notes
Assigning multiple columns within the same assign is possible.
Later items in ‘**kwargs’ may refer to newly created or modified
columns in ‘df’; items are computed and assigned into ‘df’ in order.
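The ordering rule for multiple assignments in a single assign call can be shown with a small temperature table. The column names and values are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"temp_c": [17.0, 25.0]})

out = df.assign(
    # Callables receive the DataFrame as built so far.
    temp_f=lambda d: d["temp_c"] * 9 / 5 + 32,
    # A later item can refer to a column created earlier in the same call.
    temp_k=lambda d: (d["temp_f"] + 459.67) * 5 / 9,
)
```

The original `df` is left unchanged; `out` is a new DataFrame carrying the original column plus the two new ones.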
dtype (str, data type, Series or Mapping of column name -> data type) – Use a str, numpy.dtype, pandas.ExtensionDtype or Python type to
cast entire pandas object to the same type. Alternatively, use a
mapping, e.g. {col: dtype, …}, where col is a column label and dtype is
a numpy.dtype or Python type to cast one or more of the DataFrame’s
columns to column-specific types.
copy (bool, default True) – Return a copy when copy=True (be very careful setting
copy=False as changes to values then may propagate to other
pandas objects).
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write pd.options.mode.copy_on_write=True
errors ({'raise', 'ignore'}, default 'raise') –
Control raising of exceptions on invalid data for provided dtype.
raise : allow exceptions to be raised
ignore : suppress exceptions. On error return original object.
Changed in version 2.0.0: Using astype to convert from timezone-naive dtype to
timezone-aware dtype will raise an exception.
Use Series.dt.tz_localize() instead.
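The two forms of the dtype argument to astype, a single type for the whole object or a per-column mapping, look like this in practice. The frame is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})

# A single dtype casts the entire object to the same type.
all_float32 = df.astype("float32")

# A mapping casts only the listed columns; the others keep their dtype.
only_a = df.astype({"a": "float64"})
```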
Access a single value for a row/column label pair.
Similar to loc, in that both provide label-based lookups. Use
at if you only need to get or set a single value in a DataFrame
or Series.
Raises:
KeyError – If getting a value and ‘label’ does not exist in a DataFrame or Series.
ValueError – If row/column label pair is not a tuple or if any label
from the pair is not a scalar for DataFrame.
If label is list-like (excluding NamedTuple) for Series.
See also
DataFrame.at
Access a single value for a row/column pair by label.
DataFrame.iat
Access a single value for a row/column pair by integer position.
DataFrame.loc
Access a group of rows and columns by label(s).
DataFrame.iloc
Access a group of rows and columns by integer position(s).
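A brief sketch of getting and setting a single value with at, including the KeyError raised for a missing label. The labels and values are made up.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])

# Get a single value by row and column label.
value = df.at["r2", "x"]

# Set a single value by row and column label.
df.at["r1", "y"] = 30

# Getting a value for a label that does not exist raises KeyError.
try:
    df.at["missing", "x"]
    missing_raises = False
except KeyError:
    missing_raises = True
```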
attrs is experimental and may change without warning.
See also
DataFrame.flags
Global flags applying to this object.
Notes
Many operations that create new datasets will copy attrs. Copies
are always deep so that changing attrs will only affect the
present dataset. pandas.concat copies attrs only if all input
datasets have the same attrs.
Fill NA/NaN values by using the next valid observation to fill the gap.
Parameters:
axis ({0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame) – Axis along which to fill missing values. For Series
this parameter is unused and defaults to 0.
inplace (bool, default False) – If True, fill in-place. Note: this will modify any
other views on this object (e.g., a no-copy slice for a column in a
DataFrame).
limit (int, default None) – If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
downcast (dict, default is None) – A dict of item->dtype of what to downcast if possible,
or the string ‘infer’ which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
Deprecated since version 2.2.0.
Returns:
Object with missing values filled or None if inplace=True.
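The effect of limit on a backward fill is clearest on a Series with a multi-element gap. The values are made up for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0])

# Every NaN takes the next valid observation.
filled = s.bfill()

# With limit=1, at most one consecutive NaN is filled per gap, so the
# two-element gap is only partially filled.
partly_filled = s.bfill(limit=1)
```

In `partly_filled`, the NaN directly before 4.0 is filled but the one before it remains missing.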
Return the bool of a single element Series or DataFrame.
Deprecated since version 2.1.0: bool is deprecated and will be removed in a future version of pandas.
For Series use pandas.Series.item.
This must be a boolean scalar value, either True or False. It will raise a
ValueError if the Series or DataFrame does not have exactly 1 element, or that
element is not boolean (integer values 0 and 1 will also raise an exception).
Make a box-and-whisker plot from DataFrame columns, optionally grouped
by some other columns. A box plot is a method for graphically depicting
groups of numerical data through their quartiles.
The box extends from the Q1 to Q3 quartile values of the data,
with a line at the median (Q2). The whiskers extend from the edges
of box to show the range of the data. By default, they extend no more than
1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest
data point within that interval. Outliers are plotted as separate dots.
For further details see
Wikipedia’s entry for boxplot.
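The box and whisker geometry described above can be reproduced numerically. This is a sketch of the stated rule (whiskers end at the farthest data point within 1.5 * IQR of the box edges), computed with NumPy's default percentile interpolation, which is an assumption here; it is not the plotting code itself, and the data is made up.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)

q1, q3 = np.percentile(data, [25, 75])   # edges of the box
iqr = q3 - q1

# Whiskers may reach at most 1.5 * IQR beyond the box edges...
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# ...and end at the farthest data point inside that interval.
inside = data[(data >= lower_limit) & (data <= upper_limit)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points beyond the limits are the ones drawn as separate dots.
outliers = data[(data < lower_limit) | (data > upper_limit)]
```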
by (str or array-like, optional) – Column in the DataFrame to group by, as with pandas.DataFrame.groupby().
One box-plot will be done per value of columns in by.
ax (object of class matplotlib.axes.Axes, optional) – The matplotlib axes to be used by boxplot.
fontsize (float or str) – Tick label font size in points or as a string (e.g., large).
rot (float, default 0) – The rotation angle of labels (in degrees)
with respect to the screen coordinate system.
grid (bool, default True) – Setting this to True will show the grid.
figsize (A tuple (width, height) in inches) – The size of the figure to create in matplotlib.
layout (tuple(rows, columns), optional) – For example, (3, 5) will display the subplots
using 3 rows and 5 columns, starting from the top-left.
return_type ({'axes', 'dict', 'both'} or None, default 'axes') –
The kind of object to return. The default is axes.
’axes’ returns the matplotlib axes the boxplot is drawn on.
’dict’ returns a dictionary whose values are the matplotlib
Lines of the boxplot.
’both’ returns a namedtuple with the axes and dict.
When grouping with by, a Series mapping columns to
return_type is returned.
If return_type is None, a NumPy array
of axes with the same shape as layout is returned.
backend (str, default None) – Backend to use instead of the backend specified in the option
plotting.backend. For instance, ‘matplotlib’. Alternatively, to
specify the plotting.backend for the whole session, set
pd.options.plotting.backend.
**kwargs – All other plotting keyword arguments to be passed to
matplotlib.pyplot.boxplot().
Use return_type='dict' when you want to tweak the appearance
of the lines after plotting. In this case a dict containing the Lines
making up the boxes, caps, fliers, medians, and whiskers is returned.
Examples
Boxplots can be created for every column in the dataframe
by df.boxplot() or indicating the columns to be used:
Boxplots of variables distributions grouped by the values of a third
variable can be created using the option by. For instance:
A list of strings (i.e. ['X','Y']) can be passed to boxplot
in order to group the data by combination of the variables in the x-axis:
The layout of boxplot can be adjusted giving a tuple to layout:
Additional formatting can be done to the boxplot, like suppressing the grid
(grid=False), rotating the labels in the x-axis (i.e. rot=45)
or changing the fontsize (i.e. fontsize=15):
The parameter return_type can be used to select the type of element
returned by boxplot. When return_type='axes' is selected,
the matplotlib axes on which the boxplot is drawn are returned:
Return a GeoSeries of geometries representing all points within
a given distance of each geometric object.
Computes the buffer of a geometry for positive and negative buffer distance.
The buffer of a geometry is defined as the Minkowski sum (or difference, for
negative distance) of the geometry with a circle with radius equal to the
absolute value of the buffer distance.
The buffer operation always returns a polygonal result. The negative or
zero-distance buffer of lines and points is always empty.
Parameters:
distance (float, np.array, pd.Series) – The radius of the buffer in the Minkowski sum (or difference). If np.array
or pd.Series are used then it must have same length as the GeoSeries.
resolution (int, optional, default 16) – The resolution of the buffer around each vertex. Specifies the number of
linear segments in a quarter circle in the approximation of circular arcs.
cap_style ({'round', 'square', 'flat'}, default 'round') – Specifies the shape of buffered line endings. 'round' results in
circular line endings (see resolution). Both 'square' and 'flat'
result in rectangular line endings, 'flat' will end at the original
vertex, while 'square' involves adding the buffer width.
join_style ({'round', 'mitre', 'bevel'}, default 'round') – Specifies the shape of buffered line midpoints. 'round' results in
rounded shapes. 'bevel' results in a beveled edge that touches the
original vertex. 'mitre' results in a single vertex that is beveled
depending on the mitre_limit parameter.
mitre_limit (float, default 5.0) – Crops 'mitre'-style joins if the point is displaced from the
buffered vertex by more than this limit.
single_sided (bool, default False) – Only buffer at one side of the geometry.
Create an areal geometry formed by the constituent linework.
Builds areas from the GeoSeries that contain linework which represents the edges
of a planar graph. Any geometry type may be provided as input; only the
constituent lines and rings will be used to create the output polygons. All
geometries within the GeoSeries are considered together and the resulting
polygons therefore do not map 1:1 to input geometries.
This function converts inner rings into holes. To turn inner rings into polygons
as well, use polygonize.
Unless you know that the input GeoSeries represents a planar graph with a clean
topology (e.g. there is a node on both lines where they intersect), it is
recommended to use node=True which performs noding prior to building areal
geometry. Using node=False will provide performance benefits but may result
in incorrect polygons if the input is not of the proper topology.
If the input linework crosses, this function may produce invalid polygons. Use
GeoSeries.make_valid() to ensure valid geometries.
Parameters:
node (bool, default True) – Perform noding prior to building the areas, by default True.
Clip points, lines, or polygon geometries to the mask extent.
Both layers must be in the same Coordinate Reference System (CRS).
The GeoDataFrame will be clipped to the full extent of the mask object.
If there are multiple polygons in mask, data from the GeoDataFrame will be
clipped to the total boundary of all polygons in mask.
Parameters:
mask (GeoDataFrame, GeoSeries, (Multi)Polygon, list-like) – Polygon vector layer used to clip the GeoDataFrame.
The mask’s geometry is dissolved into one geometric feature
and intersected with GeoDataFrame.
If the mask is list-like with four elements (minx,miny,maxx,maxy),
clip will use a faster rectangle clipping
(clip_by_rect()), possibly leading to slightly different
results.
keep_geom_type (boolean, default False) – If True, return only geometries of original type in case of intersection
resulting in multiple geometry types or GeometryCollections.
If False, return all resulting geometries (potentially mixed types).
sort (boolean, default False) – If True, the order of rows in the clipped GeoDataFrame will be preserved at
small performance cost. If False the order of rows in the clipped
GeoDataFrame will be random.
Returns:
Vector data (points, lines, polygons) from the GeoDataFrame clipped to
polygon boundary from mask.
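A short sketch of both mask forms described above (a polygon layer and a four-element bounding box). The data and names are made up for illustration, and the points are placed strictly inside or outside the mask to avoid boundary edge cases.

```python
import geopandas
from shapely.geometry import Point, box

gdf = geopandas.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(0, 0), Point(1, 1), Point(5, 5)],
    crs="EPSG:4326",
)

# Polygon mask in the same CRS as the frame being clipped.
mask = geopandas.GeoSeries([box(-0.5, -0.5, 2, 2)], crs="EPSG:4326")
clipped_poly = gdf.clip(mask)

# List-like mask (minx, miny, maxx, maxy) triggers rectangle clipping.
clipped_rect = gdf.clip((-0.5, -0.5, 2, 2))
```

Because sort defaults to False, the row order of either result should not be relied upon.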
Return a GeoSeries of the portions of geometry within the given
rectangle.
Note that the results are not exactly equal to
intersection(). E.g. in edge cases,
clip_by_rect() will not return a point just touching the
rectangle. Check the examples section below for some of these exceptions.
The geometry is clipped in a fast but possibly dirty way. The output is not
guaranteed to be valid. No exceptions will be raised for topological errors.
Note: empty geometries or geometries that do not overlap with the specified
bounds will result in GEOMETRYCOLLECTION EMPTY.
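As a hedged illustration of the behaviour described above, the sketch below clips a diagonal line and a far-away point to the rectangle (1, 1, 3, 3). The geometries are invented for the example.

```python
import geopandas
from shapely.geometry import LineString, Point

s = geopandas.GeoSeries([LineString([(0, 0), (4, 4)]), Point(10, 10)])

# Arguments are xmin, ymin, xmax, ymax.
clipped = s.clip_by_rect(1, 1, 3, 3)

line_part = clipped.iloc[0]   # the segment of the line inside the rectangle
outside = clipped.iloc[1]     # an empty geometry, since the point does not overlap
```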
Perform column-wise combine with another DataFrame.
Combines a DataFrame with other DataFrame using func
to element-wise combine columns. The row and column indexes of the
resulting DataFrame will be the union of the two.
Parameters:
other (DataFrame) – The DataFrame to merge column-wise.
func (function) – Function that takes two Series as inputs and returns a Series or a
scalar. Used to merge the two DataFrames column by column.
fill_value (scalar value, default None) – The value to fill NaNs with prior to passing any column to the
merge func.
overwrite (bool, default True) – If True, columns in self that do not exist in other will be
overwritten with NaNs.
Returns:
Combination of the provided DataFrames.
Return type:
DataFrame
See also
DataFrame.combine_first
Combine two DataFrame objects and default to non-null values in frame calling the method.
Examples
Combine using a simple function that chooses the smaller column.
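The examples in this section call a helper named take_smaller whose definition is not shown in this text. A minimal sketch, assuming the helper keeps whichever column has the smaller sum:

```python
import pandas as pd


def take_smaller(s1, s2):
    """Return the column (Series) with the smaller sum (assumed definition)."""
    return s1 if s1.sum() < s2.sum() else s2


df1 = pd.DataFrame({"A": [0, 0], "B": [4, 4]})
df2 = pd.DataFrame({"A": [1, 1], "B": [3, 3]})

# Column A comes from df1 (sum 0 < 2), column B from df2 (sum 6 < 8).
result = df1.combine(df2, take_smaller)
```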
Example using a true element-wise combine function.
>>> df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3
Using fill_value fills Nones prior to passing the column to the
merge function.
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  4.0
However, if the same element in both dataframes is None, that None
is preserved
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  3.0
Example that demonstrates the use of overwrite and behavior when
the axis differ between the dataframes.
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1], }, index=[1, 2])
>>> df1.combine(df2, take_smaller)
     A    B     C
0  NaN  NaN   NaN
1  NaN  3.0 -10.0
2  NaN  3.0   1.0
>>> df1.combine(df2, take_smaller, overwrite=False)
     A    B     C
0  0.0  NaN   NaN
1  0.0  3.0 -10.0
2  NaN  3.0   1.0
Demonstrating the preference of the passed in dataframe.
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1], }, index=[1, 2])
>>> df2.combine(df1, take_smaller)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN
>>> df2.combine(df1, take_smaller, overwrite=False)
     A    B    C
0  0.0  NaN  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
Update null elements with value in the same location in other.
Combine two DataFrame objects by filling null values in one DataFrame
with non-null values from other DataFrame. The row and column indexes
of the resulting DataFrame will be the union of the two. The resulting
dataframe contains the ‘first’ dataframe values and overrides the
second one values where both first.loc[index, col] and
second.loc[index, col] are not missing values, upon calling
first.combine_first(second).
Parameters:
other (DataFrame) – Provided DataFrame to use to fill null values.
Returns:
The result of combining the provided DataFrame with the other object.
Return type:
DataFrame
See also
DataFrame.combine
Perform series-wise operation on two DataFrames using a given function.
Examples
>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0
Null values still persist if the location of that null value
does not exist in other
>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
DataFrame that shows the differences stacked side by side.
The resulting index will be a MultiIndex with ‘self’ and ‘other’
stacked alternately at the inner level.
Return type:
DataFrame
Raises:
ValueError – When the two DataFrames don’t have identical labels or shape.
See also
Series.compare
Compare with another Series and show differences.
DataFrame.equals
Test whether two objects contain the same elements.
Notes
Matching NaNs will not appear as a difference.
Can only compare identically-labeled
(i.e. same shape, identical row and column labels) DataFrames
Examples
>>> df = pd.DataFrame(
...     {
...         "col1": ["a", "a", "b", "b", "a"],
...         "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
...     },
...     columns=["col1", "col2", "col3"],
... )
>>> df
  col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0
>>> df2 = df.copy()
>>> df2.loc[0, 'col1'] = 'c'
>>> df2.loc[2, 'col3'] = 4.0
>>> df2
  col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0
Align the differences on columns
>>> df.compare(df2)
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0
Assign result_names
>>> df.compare(df2, result_names=("left", "right"))
  col1       col3
  left right left right
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0
Stack the differences on rows
>>> df.compare(df2, align_axis=0)
        col1  col3
0 self     a   NaN
  other    c   NaN
2 self   NaN   3.0
  other  NaN   4.0
Keep the equal values
>>> df.compare(df2, keep_equal=True)
  col1       col3
  self other self other
0    a     c  1.0   1.0
2    b     b  3.0   4.0
Keep all original rows and columns
>>> df.compare(df2, keep_shape=True)
  col1       col2       col3
  self other self other self other
0    a     c  NaN   NaN  NaN   NaN
1  NaN   NaN  NaN   NaN  NaN   NaN
2  NaN   NaN  NaN   NaN  3.0   4.0
3  NaN   NaN  NaN   NaN  NaN   NaN
4  NaN   NaN  NaN   NaN  NaN   NaN
Keep all original rows and columns and also all original values
>>> df.compare(df2, keep_shape=True, keep_equal=True)
  col1       col2       col3
  self other self other self other
0    a     c  1.0   1.0  1.0   1.0
1    a     a  2.0   2.0  2.0   2.0
2    b     b  3.0   3.0  3.0   4.0
3    b     b  NaN   NaN  4.0   4.0
4    a     a  5.0   5.0  5.0   5.0
Return a GeoSeries of geometries representing the concave hull
of vertices of each geometry.
The concave hull of a geometry is the smallest concave Polygon
containing all the points in each geometry, unless the number of points
in the geometric object is less than three. For two points, the concave
hull collapses to a LineString; for 1, a Point.
The hull is constructed by removing border triangles of the Delaunay
Triangulation of the points as long as their “size” is larger than the
maximum edge length ratio and optionally allowing holes. The edge length factor
is a fraction of the length difference between the longest and shortest edges
in the Delaunay Triangulation of the input points. For further information
on the algorithm used, see
https://libgeos.org/doxygen/classgeos_1_1algorithm_1_1hull_1_1ConcaveHull.html
Parameters:
ratio (float, optional, default 0.0) – Number in the range [0, 1]. Higher numbers will include fewer vertices
in the hull.
allow_holes (bool, optional, default False) – If set to True, the concave hull may have holes.
The algorithm considers only the vertices of each geometry. As a result, the
hull may not fully enclose the input geometry. If that happens, increasing ratio
should resolve the issue.
Return a GeoSeries with the constrained
Delaunay triangulation of polygons.
A constrained Delaunay triangulation requires the edges of the input
polygon(s) to be in the set of resulting triangle edges. An
unconstrained Delaunay triangulation only triangulates based on the
vertices, hence triangle edges could cross polygon boundaries.
Return a Series of dtype('bool') with value True for
each aligned geometry that contains other.
An object is said to contain other if at least one point of other lies in
the interior and no points of other lie in the exterior of the object.
(Therefore, any given polygon does not contain its own boundary - there is not
any point that lies in the interior.)
If either object is empty, this operation returns False.
This is the inverse of within() in the sense that the expression
a.contains(b)==b.within(a) always evaluates to True.
The operation works on a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if it
is contained.
align (bool | None, default None) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
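A minimal sketch of the row-wise behaviour and of the contains/within relationship described above. The geometries are made up for the example, and both series share the same default index so that no alignment is involved.

```python
import geopandas
from shapely.geometry import Point, Polygon

square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
polys = geopandas.GeoSeries([square, square, square])
points = geopandas.GeoSeries([Point(1, 1), Point(3, 3), Point(0, 0)])

# Row-wise test: interior point, exterior point, point on the boundary.
result = polys.contains(points)

# The same question asked from the other side.
inverse = points.within(polys)

# A single geometry is tested against every row.
against_one = polys.contains(Point(1, 1))
```

The third row is False because a point on the boundary has no point in the polygon's interior.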
Return a Series of dtype('bool') with value True for
each aligned geometry that is completely inside other, with no common
boundary points.
Geometry A contains geometry B properly if B intersects the interior of A but
not the boundary (or exterior). This means that a geometry A does not “contain
properly” itself, which contrasts with the contains() method,
where common points on the boundary are allowed.
The operation works on a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if it
is contained.
align (bool | None, default None) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
Convert columns to the best possible dtypes using dtypes supporting pd.NA.
Parameters:
infer_objects (bool, default True) – Whether object dtypes should be converted to the best possible types.
convert_string (bool, default True) – Whether object dtypes should be converted to StringDtype().
convert_integer (bool, default True) – Whether, if possible, conversion can be done to integer extension types.
convert_boolean (bool, default True) – Whether object dtypes should be converted to BooleanDtype().
convert_floating (bool, default True) – Whether, if possible, conversion can be done to floating extension types.
If convert_integer is also True, preference will be given to integer
dtypes if the floats can be faithfully cast to integers.
By default, convert_dtypes will attempt to convert a Series (or each
Series in a DataFrame) to dtypes that support pd.NA. By using the options
convert_string, convert_integer, convert_boolean and
convert_floating, it is possible to turn off individual conversions
to StringDtype, the integer extension types, BooleanDtype
or floating extension types, respectively.
For object-dtyped columns, if infer_objects is True, use the inference
rules as during normal Series/DataFrame construction. Then, if possible,
convert to StringDtype, BooleanDtype or an appropriate integer
or floating extension type, otherwise leave as object.
If the dtype is integer, convert to an appropriate integer extension type.
If the dtype is numeric, and consists of all integers, convert to an
appropriate integer extension type. Otherwise, convert to an
appropriate floating extension type.
In the future, as new dtypes are added that support pd.NA, the results
of this method will change to support those new dtypes.
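The rules above can be illustrated with a small frame. This is a sketch with invented column names; it only inspects the numeric and boolean conversions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "ints": [1, 2, 3],
        "whole_floats": [1.0, 2.0, np.nan],   # floats that are all integers
        "fractions": [0.5, 1.5, np.nan],      # genuine floats
        "flags": pd.Series([True, False, None], dtype=object),
    }
)

# Default behaviour: every conversion is enabled.
converted = df.convert_dtypes()

# With integer conversion switched off, whole-number floats stay floating.
no_int = df.convert_dtypes(convert_integer=False)
```

Missing values in the converted columns are represented by pd.NA rather than NaN or None.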
Return a GeoSeries of geometries representing the convex hull
of each geometry.
The convex hull of a geometry is the smallest convex Polygon
containing all the points in each geometry, unless the number of points
in the geometric object is less than three. For two points, the convex
hull collapses to a LineString; for 1, a Point.
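A brief sketch of the three cases mentioned above (three or more points, two points, one point), using made-up coordinates:

```python
import geopandas
from shapely.geometry import MultiPoint, Point

s = geopandas.GeoSeries(
    [
        MultiPoint([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]),  # square plus centre
        MultiPoint([(0, 0), (1, 1)]),                          # two points
        Point(0, 0),                                           # single point
    ]
)

hulls = s.convex_hull
hull_types = hulls.geom_type.tolist()
square_area = hulls.area.iloc[0]
```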
Data structure also contains labeled axes (rows and columns).
Arithmetic operations align on both row and column labels. Can be
thought of as a dict-like container for Series objects. The primary
pandas data structure.
Parameters:
data (ndarray(structured or homogeneous), Iterable, dict, or DataFrame) –
Dict can contain Series, arrays, constants, dataclass or list-like objects. If
data is a dict, column order follows insertion-order. If a dict contains Series
which have an index defined, it is aligned by its index. This alignment also
occurs if data is a Series or a DataFrame itself. Alignment is done on
Series/DataFrame inputs.
If data is a list of dicts, column order follows insertion-order.
index (Index or array-like) – Index to use for resulting frame. Will default to RangeIndex if
no indexing information part of input data and no index provided.
columns (Index or array-like) – Column labels to use for resulting frame when data does not have them,
defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels,
will perform column selection instead.
dtype (dtype, default None) – Data type to force. Only a single dtype is allowed. If None, infer.
copy (bool or None, default None) – Copy data from inputs.
For dict data, the default of None behaves like copy=True. For DataFrame
or 2d ndarray input, the default of None behaves like copy=False.
If data is a dict containing one or more Series (possibly of different dtypes),
copy=False will ensure that these inputs are not copied.
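The alignment and column-selection rules described above can be sketched as follows; the labels and values are illustrative only.

```python
import pandas as pd

# Two Series with different indexes are aligned on their labels.
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([30], index=["y"])
df = pd.DataFrame({"a": a, "b": b})

# When data already has column labels, `columns` selects among them.
selected = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, columns=["b"])
```

Row "x" has no value for column "b", so that cell is filled with NaN.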
Compute pairwise correlation of columns, excluding NA/null values.
Parameters:
method ({'pearson','kendall','spearman'} or callable) –
Method of correlation:
pearson : standard correlation coefficient
kendall : Kendall Tau correlation coefficient
spearman : Spearman rank correlation
callable: callable with input two 1d ndarrays
and returning a float. Note that the returned matrix from corr
will have 1 along the diagonals and will be symmetric
regardless of the callable’s behavior.
min_periods (int, optional) – Minimum number of observations required per pair of columns
to have a valid result. Currently only available for Pearson
and Spearman correlation.
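A small sketch of the method parameter, using invented data in which one column rises and another falls linearly with the first:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [2, 4, 6, 8],   # perfectly increasing with x
        "z": [4, 3, 2, 1],   # perfectly decreasing with x
    }
)

pearson = df.corr()                     # default method
spearman = df.corr(method="spearman")   # rank-based correlation
```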
Pairwise correlation is computed between rows or columns of
DataFrame with rows or columns of Series or DataFrame. DataFrames
are first aligned along both axes before computing the
correlations.
Parameters:
other (DataFrame, Series) – Object with which to compute correlations.
axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to use. 0 or ‘index’ to compute row-wise, 1 or ‘columns’ for
column-wise.
drop (bool, default False) – Drop missing indices from result.
method ({'pearson','kendall','spearman'} or callable) –
The values None, NaN, NaT, pandas.NA are considered NA.
Parameters:
axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’ counts are generated for each column.
If 1 or ‘columns’ counts are generated for each row.
numeric_only (bool, default False) – Include only float, int or boolean data.
Returns:
For each column/row the number of non-NA/null entries.
Return type:
Series
See also
Series.count
Number of non-NA elements in a Series.
DataFrame.value_counts
Count unique combinations of columns.
DataFrame.shape
Number of DataFrame rows and columns (including NA elements).
DataFrame.isna
Boolean same-sized DataFrame showing places of NA elements.
Examples
Constructing DataFrame from a dictionary:
>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", "Lewis", "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
   Person   Age  Single
0    John  24.0   False
1    Myla   NaN    True
2   Lewis  21.0    True
3    John  33.0    True
4    Myla  26.0   False
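The counting calls themselves do not appear in this text. A sketch of how the axis parameter might be used on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Person": ["John", "Myla", "Lewis", "John", "Myla"],
        "Age": [24.0, np.nan, 21.0, 33, 26],
        "Single": [False, True, True, True, False],
    }
)

per_column = df.count()               # non-NA entries in each column
per_row = df.count(axis="columns")    # non-NA entries in each row
```

Only the second row has a missing Age, so it is the only place where a count drops.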
Return a Series containing the count of geometries in each multi-part
geometry.
For single-part geometry objects, this is always 1. For multi-part geometries,
like MultiPoint or MultiLineString, it is the number of parts in the
geometry. For GeometryCollection, it is the number of direct
parts of the collection (the method does not recurse into collections within
collections).
Compute pairwise covariance of columns, excluding NA/null values.
Compute the pairwise covariance among the series of a DataFrame.
The returned data frame is the covariance matrix of the columns
of the DataFrame.
Both NA and null values are automatically excluded from the
calculation. (See the note below about bias from missing values.)
A threshold can be set for the minimum number of
observations for each value created. Comparisons with observations
below this threshold will be returned as NaN.
This method is generally used for the analysis of time series data to
understand the relationship between different measures
across time.
Parameters:
min_periods (int, optional) – Minimum number of observations required per pair of columns
to have a valid result.
ddof (int, default 1) – Delta degrees of freedom. The divisor used in calculations
is N-ddof, where N represents the number of elements.
This argument is applicable only when there are no NaN values in the DataFrame.
Changed in version 2.0.0: The default value of numeric_only is now False.
Returns:
The covariance matrix of the series of the DataFrame.
Return type:
DataFrame
See also
Series.cov
Compute covariance with another Series.
core.window.ewm.ExponentialMovingWindow.cov
Exponential weighted sample covariance.
core.window.expanding.Expanding.cov
Expanding sample covariance.
core.window.rolling.Rolling.cov
Rolling sample covariance.
Notes
Returns the covariance matrix of the DataFrame’s time series.
The covariance is normalized by N-ddof.
For DataFrames that have Series that are missing data (assuming that
data is missing at random)
the returned covariance matrix will be an unbiased estimate
of the variance and covariance between the member Series.
However, for many applications this estimate may not be acceptable
because the estimated covariance matrix is not guaranteed to be positive
semi-definite. This could lead to estimated correlations having
absolute values which are greater than one, and/or a non-invertible
covariance matrix. See Estimation of covariance matrices for more details.
>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(1000, 5),
...                   columns=['a', 'b', 'c', 'd', 'e'])
>>> df.cov()
          a         b         c         d         e
a  0.998438 -0.020161  0.059277 -0.008943  0.014144
b -0.020161  1.059352 -0.008543 -0.024738  0.009826
c  0.059277 -0.008543  1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486  0.921297 -0.013692
e  0.014144  0.009826 -0.000271 -0.013692  0.977795
Minimum number of periods
This method also supports an optional min_periods keyword
that specifies the required minimum number of non-NA observations for
each column pair in order to have a valid result:
>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(20, 3),
...                   columns=['a', 'b', 'c'])
>>> df.loc[df.index[:5], 'a'] = np.nan
>>> df.loc[df.index[5:10], 'b'] = np.nan
>>> df.cov(min_periods=12)
          a         b         c
a  0.316741       NaN -0.150812
b       NaN  1.248003  0.191417
c -0.150812  0.191417  0.895202
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to check if it is being covered.
align (bool | None, default None) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to check if it is being covered.
align (bool | None, default None) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
Return a Series of dtype('bool') with value True for
each aligned geometry that crosses other.
An object is said to cross other if its interior intersects the
interior of the other but does not contain it, and the dimension of
the intersection is less than the dimension of the one or the other.
The operation works on a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if it is
crossed.
align (bool | None, default None) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
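A hedged sketch of the definition above, testing three invented geometries against one square:

```python
import geopandas
from shapely.geometry import LineString, Point, box

square = box(0, 0, 2, 2)
geoms = geopandas.GeoSeries(
    [
        LineString([(-1, 1), (3, 1)]),         # passes through the square
        LineString([(0.5, 0.5), (1.5, 1.5)]),  # lies entirely inside it
        Point(1, 1),                           # a point inside it
    ]
)

crossing = geoms.crosses(square)
```

Only the first line crosses: it shares interior points with the square but is not contained by it.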
The Coordinate Reference System (CRS) represented as a pyproj.CRS
object.
Returns None if the CRS is not set. When setting, the value
can be anything accepted by
pyproj.CRS.from_user_input(),
such as an authority string (e.g. “EPSG:4326”) or a WKT string.
Examples
>>> gdf.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
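A minimal sketch of reading an unset and a set CRS. It uses set_crs to assign a CRS to data that has none, which is one way (an assumption of this example, not the only option) to supply the value.

```python
import geopandas
from shapely.geometry import Point

# No CRS given: the attribute is None.
no_crs = geopandas.GeoSeries([Point(0, 0)])

# CRS given as an authority string at construction time.
with_crs = geopandas.GeoSeries([Point(0, 0)], crs="EPSG:4326")

# Assign a CRS to data that did not have one; returns a new object.
assigned = no_crs.set_crs("EPSG:4326")

epsg_code = with_crs.crs.to_epsg()
```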
Return cumulative maximum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative
maximum.
Parameters:
axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’.
For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
*args – Additional keywords have no effect but might be accepted for
compatibility with NumPy.
**kwargs – Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Return cumulative minimum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative
minimum.
Parameters:
axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’.
For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
*args – Additional keywords have no effect but might be accepted for
compatibility with NumPy.
**kwargs – Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Return cumulative product over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative
product.
Parameters:
axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’.
For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
*args – Additional keywords have no effect but might be accepted for
compatibility with NumPy.
**kwargs – Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Return cumulative sum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative
sum.
Parameters:
axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’.
For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
*args – Additional keywords have no effect but might be accepted for
compatibility with NumPy.
**kwargs – Additional keywords have no effect but might be accepted for
compatibility with NumPy.
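The cumulative methods share the same axis and skipna parameters. A small sketch with made-up values, showing how a missing value is handled:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2.0, np.nan, 1.0], "B": [1.0, 3.0, 2.0]})

running_sum = df.cumsum()             # NaN is skipped, then accumulation resumes
running_max = df.cummax()
running_min = df.cummin()
running_prod = df.cumprod()
strict_sum = df.cumsum(skipna=False)  # NaN propagates to all later rows
```

With skipna=True the NaN position stays NaN in the output, but it does not affect later rows.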
Coordinate based indexer to select by intersection with bounding box.
Format of input should be .cx[xmin:xmax,ymin:ymax]. Any of
xmin, xmax, ymin, and ymax can be provided, but input
must include a comma separating x and y slices. That is, .cx[:,:]
will return the full series/frame, but .cx[:] is not implemented.
Examples
>>> from shapely.geometry import LineString, Point
>>> s = geopandas.GeoSeries(
...     [Point(0, 0), Point(1, 2), Point(3, 3), LineString([(0, 0), (3, 3)])]
... )
>>> s
0              POINT (0 0)
1              POINT (1 2)
2              POINT (3 3)
3    LINESTRING (0 0, 3 3)
dtype: geometry
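The indexing step itself is not shown in this text. A sketch of two slices on the same series, one bounded on both axes and one leaving the x range open:

```python
import geopandas
from shapely.geometry import LineString, Point

s = geopandas.GeoSeries(
    [Point(0, 0), Point(1, 2), Point(3, 3), LineString([(0, 0), (3, 3)])]
)

# Geometries intersecting the box x in [0, 1], y in [0, 1].
lower_left = s.cx[0:1, 0:1]

# Open x range; only y >= 1 is constrained.
upper = s.cx[:, 1:]

lower_left_index = lower_left.index.tolist()
upper_index = upper.index.tolist()
```

The line is selected in both cases because it intersects each bounding box, even though it is not fully inside either.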
Return a GeoSeries consisting of objects representing
the computed Delaunay triangulation between the vertices of
an input geometry.
All geometries within the GeoSeries are considered together within a single
Delaunay triangulation. The resulting geometries therefore do not map 1:1
to input geometries. Note that each vertex of a geometry is considered a site
for the triangulation, so the triangles will be constructed between the vertices
of each geometry.
Notes
If you want to generate Delaunay triangles for each geometry separately, use
shapely.delaunay_triangles() instead.
Parameters:
tolerance (float, default 0.0) – Snap input vertices together if their distance is less than this value.
only_edges (bool, optional, default False) – If set to True, the triangulation will return linestrings instead of
polygons.
Examples
>>> from shapely import LineString, MultiPoint, Point, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Point(1, 1),
...         Point(2, 2),
...         Point(1, 3),
...         Point(0, 2),
...     ]
... )
>>> s
0    POINT (1 1)
1    POINT (2 2)
2    POINT (1 3)
3    POINT (0 2)
dtype: geometry
The method supports any geometry type but keep in mind that the underlying
algorithm is based on the vertices of the input geometries only and does not
consider edge segments between vertices.
Descriptive statistics include those that summarize the central
tendency, dispersion and shape of a
dataset’s distribution, excluding NaN values.
Analyzes both numeric and object series, as well
as DataFrame column sets of mixed data types. The output
will vary depending on what is provided. Refer to the notes
below for more detail.
Parameters:
percentiles (list-like of numbers, optional) – The percentiles to include in the output. All should
fall between 0 and 1. The default is
[.25,.5,.75], which returns the 25th, 50th, and
75th percentiles.
include ('all', list-like of dtypes or None (default), optional) –
A white list of data types to include in the result. Ignored
for Series. Here are the options:
'all' : All columns of the input will be included in the output.
A list-like of dtypes : Limits the results to the
provided data types.
To limit the result to numeric types submit
numpy.number. To limit it instead to object columns submit
the numpy.object_ data type. Strings
can also be used in the style of
select_dtypes (e.g. df.describe(include=['O'])). To
select pandas categorical columns, use 'category'
None (default) : The result will include all numeric columns.
exclude (list-like of dtypes or None (default), optional) –
A black list of data types to omit from the result. Ignored
for Series. Here are the options:
A list-like of dtypes : Excludes the provided data types
from the result. To exclude numeric types submit
numpy.number. To exclude object columns submit the data
type numpy.object_. Strings can also be used in the style of
select_dtypes (e.g. df.describe(exclude=['O'])). To
exclude pandas categorical columns, use 'category'
None (default) : The result will exclude nothing.
Returns:
Summary statistics of the Series or Dataframe provided.
Return type:
Series or DataFrame
See also
DataFrame.count
Count number of non-NA/null observations.
DataFrame.max
Maximum of the values in the object.
DataFrame.min
Minimum of the values in the object.
DataFrame.mean
Mean of the values.
DataFrame.std
Standard deviation of the observations.
DataFrame.select_dtypes
Subset of a DataFrame including/excluding columns based on their dtype.
Notes
For numeric data, the result’s index will include count,
mean, std, min, max as well as lower, 50 and
upper percentiles. By default the lower percentile is 25 and the
upper percentile is 75. The 50 percentile is the
same as the median.
For object data (e.g. strings or timestamps), the result’s index
will include count, unique, top, and freq. The top
is the most common value. The freq is the most common value’s
frequency. Timestamps also include the first and last items.
If multiple object values have the highest count, then the
count and top results will be arbitrarily chosen from
among those with the highest count.
For mixed data types provided via a DataFrame, the default is to
return only an analysis of numeric columns. If the dataframe consists
only of object and categorical data without any numeric columns, the
default is to return an analysis of both the object and categorical
columns. If include='all' is provided as an option, the result
will include a union of attributes of each type.
The include and exclude parameters can be used to limit
which columns in a DataFrame are analyzed for the output.
The parameters are ignored when analyzing a Series.
Describing all columns of a DataFrame regardless of data type.
>>> df.describe(include='all')
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      a
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN
Describing a column from a DataFrame by accessing it as
an attribute.
Excluding object columns from a DataFrame description.
>>> df.describe(exclude=[object])
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
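The frame used in the outputs above is not constructed in this text. The sketch below assumes a three-row frame with one categorical, one numeric and one string column, consistent with the column names shown, and adds a percentiles example.

```python
import pandas as pd

# Assumed construction; the values are chosen to match the summaries above.
df = pd.DataFrame(
    {
        "categorical": pd.Categorical(["d", "e", "f"]),
        "numeric": [1, 2, 3],
        "object": ["a", "b", "c"],
    }
)

# Default: only numeric columns are summarised.
default_summary = df.describe()

# Custom percentiles, given as fractions between 0 and 1.
custom = df.describe(percentiles=[0.1, 0.9])
```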
Calculates the difference of a DataFrame element compared with another
element in the DataFrame (default is element in previous row).
Parameters:
periods (int, default 1) – Periods to shift for calculating difference, accepts negative
values.
axis ({0 or 'index', 1 or 'columns'}, default 0) – Take difference over rows (0) or columns (1).
Returns:
First differences of the Series.
Return type:
DataFrame
See also
DataFrame.pct_change
Percent change over given number of periods.
DataFrame.shift
Shift index by desired number of periods with an optional time freq.
Series.diff
First discrete difference of object.
Notes
For boolean dtypes, this uses operator.xor() rather than
operator.sub().
The result is calculated according to current dtype in DataFrame,
however dtype of the result is always float64.
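A short sketch of the periods and axis parameters, with invented values:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 4, 7], "b": [1, 1, 2, 3]})

row_diff = df.diff()             # each row minus the previous row
back_diff = df.diff(periods=-1)  # each row minus the following row
col_diff = df.diff(axis=1)       # each column minus the column to its left
```

The first row of row_diff and the last row of back_diff are NaN because there is no neighbouring row to subtract.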
Return a GeoSeries of the points in each aligned geometry that
are not in other.
The operation works on a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the
difference to.
align (bool | None, default None) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
Return a Series of dtype('bool') with value True for
each aligned geometry disjoint to other.
An object is said to be disjoint to other if its boundary and
interior does not intersect at all with those of the other.
The operation works on a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if is
disjoint.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
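The following sketch illustrates the two alignment modes, as well as passing a single geometry. The geometries and index values are invented for the example, and it assumes geopandas and shapely are installed.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Illustrative geometries with deliberately different indices.
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
s1 = gpd.GeoSeries([square, Point(5, 5)], index=[0, 1])
s2 = gpd.GeoSeries([Point(1, 1), Point(9, 9)], index=[1, 2])

# A single geometry is tested against every row of s1.
against_point = s1.disjoint(Point(1, 1))

# align=False ignores the index and pairs rows by position:
# square vs Point(1, 1), then Point(5, 5) vs Point(9, 9).
by_position = s1.disjoint(s2, align=False)

# align=True pairs rows that share an index label, so label 1
# compares Point(5, 5) from s1 with Point(1, 1) from s2.
by_label = s1.disjoint(s2, align=True)

print(against_point)
print(by_position)
print(by_label)
```

With align=True the result covers the union of both indices, so labels present in only one series have no partner to compare against.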
Dissolve geometries within groupby into single observation.
This is accomplished by applying the union_all method
to all geometries within a group.
Observations associated with each groupby group will be aggregated
using the aggfunc.
Parameters:
by (str or list-like, default None) – Column(s) whose values define the groups to be dissolved. If None,
the entire GeoDataFrame is considered as a single group. If a list-like
object is provided, the values in the list are treated as categorical
labels, and polygons will be combined based on the equality of
these categorical labels.
aggfunc (function or string, default "first") –
Aggregation function for manipulation of data associated
with each group. Passed to pandas groupby.agg method.
Accepted combinations are:
function
string function name
list of functions and/or function names, e.g. [np.sum, ‘mean’]
dict of axis labels -> functions, function names or list of such.
as_index (boolean, default True) – If true, groupby columns become index of result.
level (int or str or sequence of int or sequence of str, default None) – If the axis is a MultiIndex (hierarchical), group by a
particular level or levels.
sort (bool, default True) – Sort group keys. Get better performance by turning this off.
Note this does not influence the order of observations within
each group. Groupby preserves the order of rows within each group.
observed (bool, default False) – This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
dropna (bool, default True) – If True, and if group keys contain NA values, NA values
together with row/column will be dropped. If False, NA
values will also be treated as the key in groups.
method (str (default "unary")) –
The method to use for the union. Options are:
"unary": use the unary union algorithm. This option is the most robust
but can be slow for large numbers of geometries (default).
"coverage": use the coverage union algorithm. This option is optimized
for non-overlapping polygons and can be significantly faster than the
unary union algorithm. However, it can produce invalid geometries if the
polygons overlap.
"disjoint_subset": use the disjoint subset union algorithm. This
option is optimized for inputs that can be divided into subsets that do
not intersect. If there is only one such subset, performance can be
expected to be worse than "unary". Requires Shapely >= 2.1.
When grid size is specified, a fixed-precision space is used to perform the
union operations. This can be useful when unioning geometries that are not
perfectly snapped or to avoid geometries not being unioned because of
robustness issues.
The inputs are first snapped to a grid of the given size. When a line
segment of a geometry is within tolerance of a vertex of another geometry,
this vertex will be inserted in the line segment. Finally, the result
vertices are computed on the same grid. Only supported for method "unary". If None, the highest precision of the inputs will be used.
Defaults to None.
Keyword arguments to be passed to the pandas DataFrameGroupby.agg method
which is used by dissolve. In particular, numeric_only may be
supplied, which will be required in pandas 2.0 for certain aggfuncs.
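As a rough illustration of by, aggfunc and as_index, the sketch below dissolves three boxes into two regions. The column names and geometries are hypothetical, and the example assumes geopandas and shapely are installed.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical data: two adjacent boxes in "north", one box in "south".
gdf = gpd.GeoDataFrame(
    {
        "region": ["north", "north", "south"],
        "population": [10, 20, 5],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 2, 1, 3)],
)

# Geometries in each group are unioned; the remaining columns are
# aggregated with aggfunc. The group key becomes the index.
dissolved = gdf.dissolve(by="region", aggfunc="sum")

# as_index=False keeps the group key as an ordinary column instead.
flat = gdf.dissolve(by="region", aggfunc="sum", as_index=False)

print(dissolved)
print(flat)
```

The two "north" boxes share an edge, so their union is a single rectangle of area 2.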
Return a Series containing the distance to aligned other.
The operation works on a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the
distance to.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and use elements with the same index using
align=True or ignore index and use elements based on their matching
order using align=False:
Get Floating division of dataframe and other, element-wise (binary operator truediv).
Equivalent to dataframe/other, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, rtruediv.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
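The effect of fill_value is easiest to see next to the plain operator. The sketch below uses two small, made-up frames.

```python
import numpy as np
import pandas as pd

# Made-up frames with missing values in different places.
a = pd.DataFrame({"x": [10.0, np.nan, 30.0], "y": [1.0, 2.0, np.nan]})
b = pd.DataFrame({"x": [2.0, 4.0, np.nan], "y": [np.nan, 4.0, np.nan]})

# The operator propagates NaN whenever either side is missing.
plain = a / b

# With fill_value, a value missing on one side only is replaced by 1.0
# before dividing. Where both sides are missing the result stays NaN.
filled = a.div(b, fill_value=1.0)

print(plain)
print(filled)
```

In the last row of column y both inputs are missing, so the result is NaN even with fill_value set.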
Compute the matrix multiplication between the DataFrame and other.
This method computes the matrix product between the DataFrame and the
values of an other Series, DataFrame or a numpy array.
It can also be called using self@other.
Parameters:
other (Series, DataFrame or array-like) – The other object to compute the matrix product with.
Returns:
If other is a Series, return the matrix product between self and
other as a Series. If other is a DataFrame or a numpy.array, return
the matrix product of self and other as a DataFrame.
Return type:
Series or DataFrame
See also
Series.dot
Similar method for Series.
Notes
The dimensions of DataFrame and other must be compatible in order to
compute the matrix multiplication. In addition, the column names of
DataFrame and the index of other must contain the same values, as they
will be aligned prior to the multiplication.
The dot method for Series computes the inner product, instead of the
matrix product here.
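The alignment rule in the notes matters in practice: labelled inputs are matched by label, while a bare array is multiplied by position. A small sketch with invented values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

# The Series index lists the labels in a different order than df.columns;
# dot() aligns on labels before multiplying.
weights = pd.Series([10, 1], index=["b", "a"])
by_label = df.dot(weights)      # a * 1 + b * 10

# A plain array carries no labels, so values are used positionally.
by_position = df.dot(np.array([10, 1]))  # a * 10 + b * 1

# The @ operator is an equivalent spelling.
same = df @ weights

print(by_label.tolist(), by_position.tolist())
```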
Remove rows or columns by specifying label names and corresponding
axis, or by directly specifying index or column names. When using a
multi-index, labels on different levels can be removed by specifying
the level. See the user guide
for more information about the now unused levels.
Parameters:
labels (single label or list-like) – Index or column labels to drop. A tuple will be used as a single
label and not treated as a list-like.
axis ({0 or 'index', 1 or 'columns'}, default 0) – Whether to drop labels from the index (0 or ‘index’) or
columns (1 or ‘columns’).
index (single label or list-like) – Alternative to specifying axis (labels, axis=0
is equivalent to index=labels).
columns (single label or list-like) – Alternative to specifying axis (labels, axis=1
is equivalent to columns=labels).
level (int or level name, optional) – For MultiIndex, level from which the labels will be removed.
inplace (bool, default False) – If False, return a copy. Otherwise, do operation
in place and return None.
errors ({'ignore', 'raise'}, default 'raise') – If ‘ignore’, suppress error and only existing labels are
dropped.
Returns:
DataFrame with the specified index or column labels removed,
or None if inplace=True.
Drop a specific index combination from the MultiIndex
DataFrame, i.e., drop the combination 'falcon' and
'weight', which deletes only the corresponding row
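A sketch of that case, using a small made-up MultiIndex frame (the level names and values are illustrative):

```python
import pandas as pd

# Illustrative two-level index: animal x measure.
index = pd.MultiIndex.from_product(
    [["cow", "falcon"], ["speed", "weight"]], names=["animal", "measure"]
)
df = pd.DataFrame({"big": [30.0, 250.0, 320.0, 1.0]}, index=index)

# A tuple is treated as one label, so only the
# ('falcon', 'weight') row is removed.
one_row = df.drop(index=("falcon", "weight"))

# With level=, every row whose 'measure' label is 'weight' is removed.
no_weight = df.drop(index="weight", level="measure")

print(one_row)
print(no_weight)
```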
Return Series/DataFrame with requested index / column level(s) removed.
Parameters:
level (int, str, or list-like) – If a string is given, must be the name of a level.
If list-like, elements must be names or positional indexes
of levels.
axis ({0 or 'index', 1 or 'columns'}, default 0) –
Axis along which the level(s) is removed:
0 or ‘index’: remove level(s) from the row index.
1 or ‘columns’: remove level(s) from the column index.
For Series this parameter is unused and defaults to 0.
Returns:
Series/DataFrame with requested index / column level(s) removed.
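A short sketch showing a level removed by name from the row index and by position from the columns. All labels here are invented.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=pd.MultiIndex.from_tuples([("a", 0), ("b", 1)], names=["key", "pos"]),
    columns=pd.MultiIndex.from_tuples(
        [("x", "lo"), ("y", "hi")], names=["name", "kind"]
    ),
)

# Remove the 'key' level from the row index (axis=0 is the default).
rows = df.droplevel("key")

# Remove the second column level (position 1) with axis=1.
cols = df.droplevel(1, axis=1)

print(rows)
print(cols)
```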
See the User Guide for more on which values are
considered missing, and how to work with missing data.
Parameters:
axis ({0 or 'index', 1 or 'columns'}, default 0) –
Determine if rows or columns which contain missing values are
removed.
0, or ‘index’ : Drop rows which contain missing values.
1, or ‘columns’ : Drop columns which contain missing values.
Only a single axis is allowed.
how ({'any', 'all'}, default 'any') –
Determine if row or column is removed from DataFrame, when we have
at least one NA or all NA.
‘any’ : If any NA values are present, drop that row or column.
‘all’ : If all values are NA, drop that row or column.
thresh (int, optional) – Require that many non-NA values. Cannot be combined with how.
subset (column label or sequence of labels, optional) – Labels along other axis to consider, e.g. if you are dropping rows
these would be a list of columns to include.
inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.
>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
Drop the rows where at least one element is missing.
>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25
Drop the columns where at least one element is missing.
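A self-contained sketch of the column case, rebuilding the same frame, with thresh and subset shown alongside for comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"],
                   "toy": [np.nan, "Batmobile", "Bullwhip"],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# Only 'name' has no missing values, so it is the only column kept.
complete_columns = df.dropna(axis="columns")

# Keep rows with at least two non-missing values.
at_least_two = df.dropna(thresh=2)

# Only look for missing values in the listed columns.
named_with_toy = df.dropna(subset=["name", "toy"])

print(complete_columns)
print(at_least_two)
print(named_with_toy)
```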
This returns a Series with the data type of each column.
The result’s index is the original DataFrame’s columns. Columns
with mixed types are stored with the object dtype. See
the User Guide for more.
Return a Series of dtype('bool') with value True for
each aligned geometry that is within a set distance from other.
The operation works on a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test the
distance to.
distance (float, np.array, pd.Series) – Distance(s) to test if each geometry is within. A scalar distance will be
applied to all geometries. An array or Series will be applied elementwise.
If np.array or pd.Series are used then it must have same length as the
GeoSeries.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
Return a GeoSeries of geometries representing the envelope of
each geometry.
The envelope of a geometry is the bounding rectangle. That is, the
point or smallest rectangular polygon (with sides parallel to the
coordinate axes) that contains the geometry.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
Test whether two objects contain the same elements.
This function allows two Series or DataFrames to be compared against
each other to see if they have the same shape and elements. NaNs in
the same location are considered equal.
The row/column index do not need to have the same type, as long
as the values are considered equal. Corresponding columns and
index must be of the same dtype.
Parameters:
other (Series or DataFrame) – The other Series or DataFrame to be compared with the first.
Returns:
True if all elements are the same in both objects, False
otherwise.
Compare two Series objects of the same length and return a Series where each element is True if the element in each Series is equal, False otherwise.
DataFrame.eq
Compare two DataFrame objects of the same shape and return a DataFrame where each element is True if the respective element in each DataFrame is equal, False otherwise.
testing.assert_series_equal
Raises an AssertionError if left and right are not equal. Provides an easy interface to ignore inequality in dtypes, indexes and precision among others.
DataFrames df and different_column_type have the same element
types and values, but have different types for the column labels,
which will still return True.
DataFrames df and different_data_type have different types for the
same values for their elements, and will return False even though
their column labels are the same values and types.
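The two cases above can be reproduced with frames like the following. The values are illustrative, and the variable names follow the text.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({1: [10], 2: [20]})

# Same values, but the column labels are floats instead of ints.
different_column_type = pd.DataFrame({1.0: [10], 2.0: [20]})

# Same labels, but the elements are floats instead of ints.
different_data_type = pd.DataFrame({1: [10.0], 2: [20.0]})

labels_differ = df.equals(different_column_type)
dtypes_differ = df.equals(different_data_type)

# NaNs in the same location count as equal.
nan_equal = pd.Series([1.0, np.nan]).equals(pd.Series([1.0, np.nan]))

print(labels_differ, dtypes_differ, nan_equal)
```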
Evaluate a string describing operations on DataFrame columns.
Operates on columns only, not specific rows or elements. This allows
eval to run arbitrary code, which can make you vulnerable to code
injection if you pass user input to this function.
inplace (bool, default False) – If the expression contains an assignment, whether to perform the
operation inplace and mutate the existing DataFrame. Otherwise,
a new DataFrame is returned.
**kwargs – See the documentation for eval() for complete details
on the keyword arguments accepted by
query().
Returns:
The result of the evaluation or None if inplace=True.
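A brief sketch of both forms, a plain expression and an assignment, on a made-up frame. Only trusted, hard-coded strings are evaluated here, in line with the code-injection warning above.

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 6), "B": range(10, 0, -2)})

# A plain expression returns the computed Series.
total = df.eval("A + B")

# An assignment returns a new DataFrame with the extra column;
# with the default inplace=False the original is left untouched.
with_c = df.eval("C = A + B")

print(total.tolist())
print(with_c)
```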
Exactly one of com, span, halflife, or alpha must be
provided if times is not provided. If times is provided,
halflife and one of com, span or alpha may be provided.
If times is specified, a timedelta convertible unit over which an
observation decays to half its value. Only applicable to mean(),
and halflife value will not apply to the other functions.
Divide by decaying adjustment factor in beginning periods to account
for imbalance in relative weightings (viewing EWMA as a moving average).
When adjust=True (default), the EW function is calculated using weights
\(w_i = (1 - \alpha)^i\). For example, the EW moving average of the series
[\(x_0, x_1, ..., x_t\)] would be:
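Written out with those weights, the adjusted average is \(y_t = \frac{x_t + (1 - \alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + ... + (1 - \alpha)^t x_0}{1 + (1 - \alpha) + (1 - \alpha)^2 + ... + (1 - \alpha)^t}\). The sketch below evaluates this by hand for a short, made-up series and compares it with ewm(adjust=True). The helper name ew_mean_adjusted is hypothetical and not part of pandas.

```python
import pandas as pd

alpha = 0.5
x = [1.0, 2.0, 4.0]  # illustrative series x_0, x_1, x_2


def ew_mean_adjusted(values, alpha):
    """Weighted average with weights (1 - alpha)**i, newest value first."""
    out = []
    for t in range(len(values)):
        weights = [(1 - alpha) ** i for i in range(t + 1)]
        # weights[i] is paired with x_{t-i}
        numerator = sum(w * values[t - i] for i, w in enumerate(weights))
        out.append(numerator / sum(weights))
    return out


manual = ew_mean_adjusted(x, alpha)
from_pandas = pd.Series(x).ewm(alpha=alpha, adjust=True).mean().tolist()

print(manual)
print(from_pandas)
```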
When ignore_na=False (default), weights are based on absolute positions.
For example, the weights of \(x_0\) and \(x_2\) used in calculating
the final weighted average of [\(x_0\), None, \(x_2\)] are
\((1-\alpha)^2\) and \(1\) if adjust=True, and
\((1-\alpha)^2\) and \(\alpha\) if adjust=False.
When ignore_na=True, weights are based
on relative positions. For example, the weights of \(x_0\) and \(x_2\)
used in calculating the final weighted average of
[\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if
adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.
axis ({0, 1}, default 0) –
If 0 or 'index', calculate across the rows.
If 1 or 'columns', calculate across the columns.
For Series this parameter is unused and defaults to 0.
Explode multi-part geometries into multiple single geometries.
Each row containing a multi-part geometry will be split into
multiple rows with single geometries, thereby increasing the vertical
size of the GeoDataFrame.
Parameters:
column (string, default None) – Column to explode. In the case of a geometry column, multi-part
geometries are converted to single-part.
If None, the active geometry column is used.
ignore_index (bool, default False) – If True, the resulting index will be labelled 0, 1, …, n - 1,
ignoring index_parts.
index_parts (boolean, default False) – If True, the resulting index will be a multi-index (original
index with an additional level indicating the multiple
geometries: a new zero-based index for each single part geometry
per multi-part geometry).
Returns:
Exploded geodataframe with each single geometry
as a separate entry in the geodataframe.
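To make the index options concrete, the sketch below explodes one MultiPoint and one Point. The data is made up, and the example assumes geopandas and shapely are installed.

```python
import geopandas as gpd
from shapely.geometry import MultiPoint, Point

# Illustrative frame: one multi-part and one single-part geometry.
gdf = gpd.GeoDataFrame(
    {"name": ["pair", "single"]},
    geometry=[MultiPoint([(0, 0), (1, 1)]), Point(2, 2)],
)

# ignore_index=True relabels the result 0, 1, ..., n - 1.
flat = gdf.explode(ignore_index=True)

# index_parts=True keeps the original index and adds a second level
# numbering the parts of each original geometry from zero.
parts = gdf.explode(index_parts=True)

print(flat)
print(parts)
```

Attribute columns are repeated for every part, so both points from the MultiPoint carry the name "pair".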
Explore data in interactive map based on GeoPandas and folium/leaflet.js.
Generate an interactive leaflet map based on GeoDataFrame
Parameters:
column (str, np.array, pd.Series (default None)) – The name of the dataframe column, numpy.array,
or pandas.Series to be plotted. If numpy.array or
pandas.Series are used then it must have same length as dataframe.
cmap (str, matplotlib.Colormap, branca.colormap or function (default None)) –
The name of a colormap recognized by matplotlib, a list-like of colors,
matplotlib.colors.Colormap, a branca.colormap.ColorMap or
function that returns a named color or hex based on the column
value, e.g.:
def my_colormap(value):  # scalar value defined in 'column'
    if value > 1:
        return "green"
    return "red"
color (str, array-like (default None)) – Named color or a list-like of colors (named or hex).
m (folium.Map (default None)) – Existing map instance on which to draw the plot.
Map tileset to use. Can choose from the list supported by folium, query a
xyzservices.TileProvider by a name from xyzservices.providers,
pass xyzservices.TileProvider object or pass custom XYZ URL.
The current list of built-in providers (when xyzservices is not available):
You can pass a custom tileset to Folium by passing a Leaflet-style URL
to the tiles parameter: http://{s}.yourtiles.com/{z}/{x}/{y}.png.
Be sure to check their terms and conditions and to provide attribution with
the attr keyword.
attr (str (default None)) – Map tile attribution; only required if passing custom tile URL.
tooltip (bool, str, int, list (default True)) – Display GeoDataFrame attributes when hovering over the object.
True includes all columns. False removes tooltip. Pass string or list of
strings to specify a column(s). Integer specifies first n columns to be
included. Defaults to True.
popup (bool, str, int, list (default False)) – Input GeoDataFrame attributes for object displayed when clicking.
True includes all columns. False removes popup. Pass string or list of
strings to specify a column(s). Integer specifies first n columns to be
included. Defaults to False.
highlight (bool (default True)) – Enable highlight functionality when hovering over a geometry.
categorical (bool (default False)) – If False, cmap will reflect numerical values of the
column being plotted. For non-numerical columns, this
will be set to True.
legend (bool (default True)) – Plot a legend in choropleth plots.
Ignored if no column is given.
scheme (str (default None)) – Name of a choropleth classification scheme (requires mapclassify >= 2.4.0).
A mapclassify.classify() will be used
under the hood. Supported are all schemes provided by mapclassify (e.g.
'BoxPlot', 'EqualInterval', 'FisherJenks', 'FisherJenksSampled',
'HeadTailBreaks', 'JenksCaspall', 'JenksCaspallForced',
'JenksCaspallSampled', 'MaxP', 'MaximumBreaks',
'NaturalBreaks', 'Quantiles', 'Percentiles', 'StdMean',
'UserDefined'). Arguments can be passed in classification_kwds.
k (int (default 5)) – Number of classes
vmin (None or float (default None)) – Minimum value of cmap. If None, the minimum data value
in the column to be plotted is used.
vmax (None or float (default None)) – Maximum value of cmap. If None, the maximum data value
in the column to be plotted is used.
width (pixel int or percentage string (default: '100%')) – Width of the folium Map. If the argument
m is given explicitly, width is ignored.
height (pixel int or percentage string (default: '100%')) – Height of the folium Map. If the argument
m is given explicitly, height is ignored.
categories (list-like) – Ordered list-like object of categories to be used for categorical plot.
classification_kwds (dict (default None)) – Keyword arguments to pass to mapclassify
control_scale (bool (default True)) – Whether to add a control scale on the map.
marker_type (str, folium.Circle, folium.CircleMarker, folium.Marker (default None)) – Allowed string options are (‘marker’, ‘circle’, ‘circle_marker’). Defaults to
folium.CircleMarker.
marker_kwds (dict (default {})) –
Additional keywords to be passed to the selected marker_type, e.g.:
radius : float (default 2 for circle_marker and 50 for circle)
Radius of the circle, in meters (for circle) or pixels
(for circle_marker).
fill : bool (default True)
Whether to fill the circle or circle_marker with color.
icon : folium.map.Icon
The folium.map.Icon object to use to render the marker.
draggable : bool (default False)
Set to True to be able to drag the marker around the map.
style_kwds (dict (default {})) –
Additional style to be passed to folium style_function:
stroke : bool (default True)
Whether to draw stroke along the path. Set it to False to
disable borders on polygons or circles.
color : str
Stroke color
weight : int
Stroke width in pixels
opacity : float (default 1.0)
Stroke opacity
fill : boolean (default True)
Whether to fill the path with color. Set it to False to
disable filling on polygons or circles.
fillColor : str
Fill color. Defaults to the value of the color option
fillOpacity : float (default 0.5)
Fill opacity.
style_function : callable
Function mapping a GeoJson Feature to a style dict.
Plus all supported by folium.vector_layers.path_options(). See the
documentation of folium.features.GeoJson for details.
highlight_kwds (dict (default {})) – Style to be passed to folium highlight_function. Uses the same keywords
as style_kwds. When empty, defaults to {"fillOpacity":0.75}.
missing_kwds (dict (default {})) –
Additional style for missing values:
color : str
Color of missing values. Defaults to None, which uses Folium’s default.
label : str (default “NaN”)
Legend entry for missing values.
tooltip_kwds (dict (default {})) – Additional keywords to be passed to folium.features.GeoJsonTooltip,
e.g. aliases, labels, or sticky.
popup_kwds (dict (default {})) – Additional keywords to be passed to folium.features.GeoJsonPopup,
e.g. aliases or labels.
legend_kwds (dict (default {})) –
Additional keywords to be passed to the legend.
Currently supported customisation:
caption : string
Custom caption of the legend. Defaults to the column name.
Additional accepted keywords when scheme is specified:
colorbar : bool (default True)
An option to control the style of the legend. If True, continuous
colorbar will be used. If False, categorical legend will be used for bins.
scale : bool (default True)
Scale bins along the colorbar axis according to the bin edges (True)
or use the equal length for each bin (False)
fmt : string (default “{:.2f}”)
A formatting specification for the bin edges of the classes in the
legend. For example, to have no decimals: {"fmt": "{:.0f}"}. Applies
if colorbar=False.
labels : list-like
A list of legend labels to override the auto-generated labels.
Needs to have the same number of elements as the number of
classes (k). Applies if colorbar=False.
interval : boolean (default False)
An option to control brackets from mapclassify legend.
If True, open/closed interval brackets are shown in the legend.
Applies if colorbar=False.
max_labels : int, default 10
Maximum number of colorbar tick labels (requires branca>=0.5.0)
map_kwds (dict (default {})) – Additional keywords to be passed to folium Map,
e.g. dragging, or scrollWheelZoom.
Fill NA/NaN values by propagating the last valid observation to next valid.
Parameters:
axis ({0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame) – Axis along which to fill missing values. For Series
this parameter is unused and defaults to 0.
inplace (bool, default False) – If True, fill in-place. Note: this will modify any
other views on this object (e.g., a no-copy slice for a column in a
DataFrame).
limit (int, default None) – If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
A dict of item->dtype of what to downcast if possible,
or the string ‘infer’ which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
Deprecated since version 2.2.0.
Returns:
Object with missing values filled or None if inplace=True.
>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, np.nan],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list("ABCD"))
>>> df
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  NaN NaN  NaN
3  NaN  3.0 NaN  4.0
>>> df.ffill()
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  3.0  4.0 NaN  1.0
3  3.0  3.0 NaN  4.0
value (scalar, dict, Series, or DataFrame) – Value to use to fill holes (e.g. 0), alternately a
dict/Series/DataFrame of values specifying which value to use for
each index (for a Series) or column (for a DataFrame). Values not
in the dict/Series/DataFrame will not be filled. This value cannot
be a list.
Method to use for filling holes in reindexed Series:
ffill: propagate last valid observation forward to next valid.
backfill / bfill: use next valid observation to fill gap.
Deprecated since version 2.1.0: Use ffill or bfill instead.
axis ({0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame) – Axis along which to fill missing values. For Series
this parameter is unused and defaults to 0.
inplace (bool, default False) – If True, fill in-place. Note: this will modify any
other views on this object (e.g., a no-copy slice for a column in a
DataFrame).
limit (int, default None) – If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
A dict of item->dtype of what to downcast if possible,
or the string ‘infer’ which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
Deprecated since version 2.2.0.
Returns:
Object with missing values filled or None if inplace=True.
>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, np.nan],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list("ABCD"))
>>> df
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  NaN NaN  NaN
3  NaN  3.0 NaN  4.0
Replace all NaN elements with 0s.
>>> df.fillna(0)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  0.0
3  0.0  3.0  0.0  4.0
Replace all NaN elements in column ‘A’, ‘B’, ‘C’, and ‘D’, with 0, 1,
2, and 3 respectively.
>>> values = {"A": 0, "B": 1, "C": 2, "D": 3}
>>> df.fillna(value=values)
     A    B    C    D
0  0.0  2.0  2.0  0.0
1  3.0  4.0  2.0  1.0
2  0.0  1.0  2.0  3.0
3  0.0  3.0  2.0  4.0
Only replace the first NaN element.
>>> df.fillna(value=values, limit=1)
     A    B    C    D
0  0.0  2.0  2.0  0.0
1  3.0  4.0  NaN  1.0
2  NaN  1.0  NaN  3.0
3  NaN  3.0  NaN  4.0
When filling using a DataFrame, replacement happens along
the same column names and same indices
>>> df2 = pd.DataFrame(np.zeros((4, 4)), columns=list("ABCE"))
>>> df.fillna(df2)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  NaN
3  0.0  3.0  0.0  4.0
Note that column D is not affected since it is not present in df2.
Subset the dataframe rows or columns according to the specified index labels.
Note that this routine does not filter a dataframe on its
contents. The filter is applied to the labels of the index.
Parameters:
items (list-like) – Keep labels from axis which are in items.
like (str) – Keep labels from axis for which “like in label == True”.
regex (str (regular expression)) – Keep labels from axis for which re.search(regex, label) == True.
axis ({0 or 'index', 1 or 'columns', None}, default None) – The axis to filter on, expressed either as an index (int)
or axis name (str). By default this is the info axis, ‘columns’ for
DataFrame. For Series this parameter is unused and defaults to None.
Return type:
same type as input object
See also
DataFrame.loc
Access a group of rows and columns by label(s) or a boolean array.
Notes
The items, like, and regex parameters are
enforced to be mutually exclusive.
axis defaults to the info axis that is used when indexing
with [].
Examples
>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> df
        one  two  three
mouse     1    2      3
rabbit    4    5      6
>>> # select columns by name
>>> df.filter(items=['one', 'three'])
         one  three
mouse     1      3
rabbit    4      6
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
         one  three
mouse     1      3
rabbit    4      6
>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
         one  two  three
rabbit    4    5      6
Select initial periods of time series data based on a date offset.
Deprecated since version 2.1: first() is deprecated and will be removed in a future version.
Please create a mask and filter using .loc instead.
For a DataFrame with a sorted DatetimeIndex, this function can
select the first few rows based on a date offset.
Parameters:
offset (str, DateOffset or dateutil.relativedelta) – The offset length of the data that will be selected. For instance,
‘1ME’ will display all the rows having their index within the first month.
Notice the data for 3 first calendar days were returned, not the first
3 days observed in the dataset, and therefore data for 2018-04-13 was
not returned.
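Since first() is deprecated, the same selection can be written as a mask with .loc, as the deprecation note suggests. This is a minimal sketch for a fixed-length offset of three days. The data is illustrative, and calendar-based offsets such as months would need a different cutoff calculation.

```python
import pandas as pd

# Illustrative frame with one observation every two days.
i = pd.date_range("2018-04-09", periods=4, freq="2D")
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=i)

# Rows whose timestamp falls within 3 calendar days of the first
# index value; the cutoff itself is excluded.
cutoff = ts.index[0] + pd.Timedelta(days=3)
first_3_days = ts.loc[ts.index < cutoff]

print(first_3_days)
```

The cutoff is 2018-04-12, so the observations on 2018-04-09 and 2018-04-11 are kept and 2018-04-13 is not.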
Get the properties associated with this pandas object.
The available flags are
Flags.allows_duplicate_labels
See also
Flags
Flags that apply to pandas objects.
DataFrame.attrs
Global metadata applying to this dataset.
Notes
“Flags” differ from “metadata”. Flags reflect properties of the
pandas object (the Series or DataFrame). Metadata refer to properties
of the dataset, and should be stored in DataFrame.attrs.
Get Integer division of dataframe and other, element-wise (binary operator floordiv).
Equivalent to dataframe//other, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, rfloordiv.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
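The effect of fill_value can be seen in a small sketch with made-up frames a and b: a value missing on one side only is replaced before dividing, while a location missing on both sides stays missing.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [7.0, np.nan], "y": [9.0, np.nan]})
b = pd.DataFrame({"x": [2.0, 4.0], "y": [np.nan, np.nan]})

plain = a // b                         # NaN wherever either input is missing
filled = a.floordiv(b, fill_value=1)   # one-sided gaps are filled with 1 first
print(filled)
```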
Return a Series containing the Fréchet distance to aligned other.
The Fréchet distance is a measure of similarity: it is the greatest distance
between any point in A and the closest point in B. The discrete distance is an
approximation of this metric: only vertices are considered. The parameter
densify makes this approximation less coarse by splitting the line segments
between vertices before computing the distance.
The Fréchet distance sweeps continuously along both curves, and the
direction of the curves is significant. This makes it a better measure of similarity
than the Hausdorff distance for curve or surface matching.
The operation works in a 1-to-1 row-wise manner.
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the
distance to.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
densify (float (default None)) – A value between 0 and 1 that splits each subsegment of a line string
into equal length segments, making the approximation less coarse.
A densify value of 0.5 will add a point halfway between each pair of
points. A densify value of 0.25 will add a point every quarter of the way
between each pair of points.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and use elements with the same index using
align=True or ignore index and use elements based on their matching
order using align=False:
We can also set a densify value, which is a float between 0 and 1 and
signifies the fraction of the distance between each pair of points that will
be used as the distance between the points when densifying.
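The effect of densify on a line can be pictured with a pure-Python sketch. The helper densify_coords is hypothetical and only illustrates the idea of inserting evenly spaced points; it is not the routine the underlying geometry engine uses.

```python
def densify_coords(coords, fraction):
    """Insert evenly spaced points so each piece spans `fraction` of a segment."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    pieces = max(1, round(1 / fraction))
    dense = [coords[0]]
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        for k in range(1, pieces + 1):
            t = k / pieces
            dense.append((x0 + (x1 - x0) * t, y0 + (y1 - y0) * t))
    return dense


line = [(0.0, 0.0), (4.0, 0.0)]
halved = densify_coords(line, 0.5)      # adds the midpoint
quartered = densify_coords(line, 0.25)  # adds a point every quarter of the way
```

Smaller fractions give more vertices, so a vertex-based distance approximates the continuous distance more closely, at a higher computational cost.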
This function accepts any tabular Arrow object implementing
the Arrow PyCapsule Protocol (i.e. having an __arrow_c_array__
or __arrow_c_stream__ method).
Added in version 1.0.
Parameters:
table (pyarrow.Table or Arrow-compatible table) – Any tabular object implementing the Arrow PyCapsule Protocol
(i.e. has an __arrow_c_array__ or __arrow_c_stream__
method). This table should have at least one column with a
geoarrow geometry type.
geometry (str, default None) – The name of the geometry column to set as the active geometry
column. If None, the first geometry column found will be used.
to_pandas_kwargs (dict, optional) – Arguments passed to the pa.Table.to_pandas method for non-geometry
columns. This can be used to control the behavior of the conversion of the
non-geometry columns to a pandas DataFrame. For example, you can use this
to control the dtype conversion of the columns. By default, the to_pandas
method is called with no additional arguments.
Alternate constructor to create GeoDataFrame from an iterable of
features or a feature collection.
Parameters:
features –
Iterable of features, where each element must be a feature
dictionary or implement the __geo_interface__.
Feature collection, where the ‘features’ key contains an
iterable of features.
Object holding a feature collection that implements the
__geo_interface__.
crs (str or dict (optional)) – Coordinate reference system to set on the resulting frame.
columns (list of column names, optional) – Optionally specify the column names to include in the output frame.
This does not overwrite the property names of the input, but can
ensure a consistent output format.
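A minimal sketch of the constructor, assuming geopandas is installed. The two features are made up and follow the GeoJSON layout; both an iterable of features and a feature collection are passed.

```python
import geopandas

# Hypothetical features following the GeoJSON layout.
features = [
    {
        "type": "Feature",
        "properties": {"name": "a"},
        "geometry": {"type": "Point", "coordinates": [1.0, 2.0]},
    },
    {
        "type": "Feature",
        "properties": {"name": "b"},
        "geometry": {"type": "Point", "coordinates": [2.0, 1.0]},
    },
]

# An iterable of feature dictionaries.
gdf = geopandas.GeoDataFrame.from_features(features, crs="EPSG:4326")

# A feature collection: a dict whose 'features' key holds the features.
collection = {"type": "FeatureCollection", "features": features}
gdf2 = geopandas.GeoDataFrame.from_features(collection, crs="EPSG:4326")
```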
Alternate constructor to create a GeoDataFrame from a file.
It is recommended to use geopandas.read_file() instead.
Can load a GeoDataFrame from a file in any format recognized by
pyogrio. See http://pyogrio.readthedocs.io/ for details.
Parameters:
filename (str) – File path or file handle to read from. Depending on which kwargs
are included, the content of filename may vary. See
pyogrio.read_dataframe() for usage details.
kwargs (keyword arguments) – These arguments are passed to pyogrio.read_dataframe(), and can be
used to access multi-layer data, data stored within archives (zip files),
etc.
Alternate constructor to create a GeoDataFrame from a sql query
containing a geometry column in WKB representation.
Parameters:
sql (string)
con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine)
geom_col (string, default 'geom') – Column name to convert to shapely geometries.
crs (optional) – Coordinate reference system to use for the returned GeoDataFrame.
index_col (string or list of strings, optional, default: None) – Column(s) to set as index (MultiIndex).
coerce_float (boolean, default True) – Attempt to convert values of non-string, non-numeric objects (like
decimal.Decimal) to floating point, useful for SQL result sets.
parse_dates (list or dict, default None) –
List of column names to parse as dates.
Dict of {column_name: format string} where format string is
strftime compatible in case of parsing string times, or is one of
(D, s, ns, ms, us) in case of parsing integer timestamps.
Dict of {column_name: arg dict}, where the arg dict
corresponds to the keyword arguments of
pandas.to_datetime(). Especially useful with databases
without native Datetime support, such as SQLite.
params (list, tuple or dict, optional, default None) – List of parameters to pass to execute method.
chunksize (int, default None) – If specified, return an iterator where chunksize is the number
of rows to include in each chunk.
Return type:
GeoDataFrame
Examples
PostGIS
>>> from sqlalchemy import create_engine
>>> db_connection_url = "postgresql://myusername:mypassword@myhost:5432/mydb"
>>> con = create_engine(db_connection_url)
>>> sql = "SELECT geom, highway FROM roads"
>>> df = geopandas.GeoDataFrame.from_postgis(sql, con)
SpatiaLite
>>> sql = "SELECT ST_Binary(geom) AS geom, highway FROM roads"
>>> df = geopandas.GeoDataFrame.from_postgis(sql, con)
The recommended method of reading from PostGIS is
geopandas.read_postgis().
Convert structured or record ndarray to DataFrame.
Creates a DataFrame object from a structured ndarray, sequence of
tuples or dicts, or DataFrame.
Parameters:
data (structured ndarray, sequence of tuples or dicts, or DataFrame) –
Structured input data.
Deprecated since version 2.1.0: Passing a DataFrame is deprecated.
index (str, list of fields, array-like) – Field of array to use as the index, alternately a specific set of
input labels to use.
exclude (sequence, default None) – Columns or fields to exclude.
columns (sequence, default None) – Column names to use. If the passed data do not have names
associated with them, this argument provides names for the
columns. Otherwise this argument indicates the order of the columns
in the result (any names not found in the data will become all-NA
columns).
coerce_float (bool, default False) – Attempt to convert values of non-string, non-numeric objects (like
decimal.Decimal) to floating point, useful for SQL result sets.
nrows (int, default None) – Number of rows to read if data is an iterator.
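The following sketch shows how the data, columns and index parameters interact. The field names col_1 and col_2 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Structured ndarray: the field names become the column names.
data = np.array(
    [(3, "a"), (2, "b"), (1, "c")],
    dtype=[("col_1", "i4"), ("col_2", "U1")],
)
df = pd.DataFrame.from_records(data)

# Plain tuples carry no names, so `columns` supplies them.
df2 = pd.DataFrame.from_records([(3, "a"), (2, "b")], columns=["col_1", "col_2"])

# `index` can name a field to use as the row labels.
df3 = pd.DataFrame.from_records(data, index="col_2")
```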
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
Return a Series of dtype('bool') with value True for
each aligned geometry equal to other.
An object is said to be equal to other if its set-theoretic
boundary, interior, and exterior coincide with those of the
other.
The operation works in a 1-to-1 row-wise manner.
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test for
equality.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
Return True for all geometries that equal aligned other to a given
tolerance, else False.
The operation works in a 1-to-1 row-wise manner.
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to compare to.
tolerance (float) – Decimal place precision used when testing for approximate equality.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
Return type:
Series(bool)
Examples
>>> from shapely.geometry import Point
>>> s = geopandas.GeoSeries(
...     [
...         Point(0, 1.1),
...         Point(0, 1.0),
...         Point(0, 1.2),
...     ]
... )
>>> s
0    POINT (0 1.1)
1      POINT (0 1)
2    POINT (0 1.2)
dtype: geometry
Return True for all geometries that are identical aligned other, else
False.
This function verifies whether geometries are pointwise equivalent by checking
that the structure, ordering, and values of all vertices are identical in all
dimensions.
Similarly to geom_equals_exact(), this function uses exact coordinate
equality and requires coordinates to be in the same order for all components
(vertices, rings, or parts) of a geometry. However, in contrast to
geom_equals_exact(), this function does not allow specifying
a tolerance, and additionally requires all dimensions to be the same
(geom_equals_exact() ignores the Z and M dimensions), where NaN values
are considered to be equal to other NaN values.
This function is the vectorized equivalent of scalar equality of geometry
objects (a==b, i.e. __eq__).
The operation works on a 1-to-1 row-wise manner:
Requires Shapely >= 2.1.
Added in version 1.1.0.
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to compare to.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
Return type:
Series(bool)
Examples
>>> from shapely.geometry import Point
>>> s = geopandas.GeoSeries(
...     [
...         Point(0, 1.1),
...         Point(0, 1.0),
...         Point(0, 1.2),
...     ]
... )
>>> s
0    POINT (0 1.1)
1      POINT (0 1)
2    POINT (0 1.2)
dtype: geometry
Get coordinates from a GeoSeries as a DataFrame of
floats.
The shape of the returned DataFrame is (N, 2), with N being the
number of coordinate pairs. With the default of include_z=False,
three-dimensional data is ignored. When specifying include_z=True, the shape
of the returned DataFrame is (N, 3).
Parameters:
include_z (bool, default False) – Include Z coordinates.
ignore_index (bool, default False) – If True, the resulting index will be labelled 0, 1, …, n - 1, ignoring
index_parts.
index_parts (bool, default False) – If True, the resulting index will be a MultiIndex (original
index with an additional level indicating the ordering of the coordinate
pairs: a new zero-based index for each geometry in the original GeoSeries).
include_m (bool, default False) – Include M coordinates. Requires shapely >= 2.1.
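The three index behaviours can be compared in a short sketch, assuming geopandas and shapely are installed. The two geometries are made up for illustration.

```python
import geopandas
from shapely.geometry import LineString, Point

s = geopandas.GeoSeries([Point(1, 1), LineString([(1, -1), (1, 0)])])

coords = s.get_coordinates()                 # original label repeated per coordinate
flat = s.get_coordinates(ignore_index=True)  # labels 0, 1, ..., n - 1
parts = s.get_coordinates(index_parts=True)  # MultiIndex: (label, position in geometry)
print(parts)
```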
Return a Series of the precision of each geometry.
If a precision has not been previously set, it will be 0, indicating regular
double precision coordinates are in use. Otherwise, it will return the precision
grid size that was set on a geometry.
Returns NaN for not-a-geometry values.
Examples
>>> from shapely.geometry import Point
>>> s = geopandas.GeoSeries(
...     [
...         Point(0, 1),
...         Point(0, 1, 2),
...         Point(0, 1.5, 2),
...     ]
... )
>>> s
0          POINT (0 1)
1      POINT Z (0 1 2)
2    POINT Z (0 1.5 2)
dtype: geometry
Group DataFrame using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the
object, applying a function, and combining the results. This can be
used to group large amounts of data and compute operations on these
groups.
Parameters:
by (mapping, function, label, pd.Grouper or list of such) – Used to determine the groups for the groupby.
If by is a function, it’s called on each value of the object’s
index. If a dict or Series is passed, the Series or dict VALUES
will be used to determine the groups (the Series’ values are first
aligned; see .align() method). If a list or ndarray of length
equal to the selected axis is passed (see the groupby user guide),
the values are used as-is to determine the groups. A label or list
of labels may be passed to group by the columns in self.
Notice that a tuple is interpreted as a (single) key.
axis ({0 or 'index', 1 or 'columns'}, default 0) –
Split along rows (0) or columns (1). For Series this parameter
is unused and defaults to 0.
Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version.
For axis=1, do frame.T.groupby(...) instead.
level (int, level name, or sequence of such, default None) – If the axis is a MultiIndex (hierarchical), group by a particular
level or levels. Do not specify both by and level.
as_index (bool, default True) – Return object with group labels as the
index. Only relevant for DataFrame input. as_index=False is
effectively “SQL-style” grouped output. This argument has no effect
on filtrations (see the filtrations in the user guide),
such as head(), tail(), nth() and in transformations
(see the transformations in the user guide).
sort (bool, default True) – Sort group keys. Get better performance by turning this off.
Note this does not influence the order of observations within each
group. Groupby preserves the order of rows within each group. If False,
the groups will appear in the same order as they did in the original DataFrame.
This argument has no effect on filtrations (see the filtrations in the user guide),
such as head(), tail(), nth() and in transformations
(see the transformations in the user guide).
Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no
longer sort the values.
group_keys (bool, default True) – When calling apply and the by argument produces a like-indexed
(i.e. a transform) result, add group keys to
index to identify pieces. By default group keys are not included
when the result’s index (and column) labels match the inputs, and
are included otherwise.
Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the
result from apply is a like-indexed Series or DataFrame.
Specify group_keys explicitly to include the group keys or
not.
Changed in version 2.0.0: group_keys now defaults to True.
observed (bool, default False) – This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.
dropna (bool, default True) – If True, and if group keys contain NA values, NA values together
with row/column will be dropped.
If False, NA values will also be treated as the key in groups.
Returns:
Returns a groupby object that contains information about the groups.
Convenience method for frequency conversion and resampling of time series.
Notes
See the user guide for more
detailed usage and examples, including splitting an object into groups,
iterating through groups, selecting a group, aggregation, and more.
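The parameters as_index, dropna and sort described above can be compared side by side in a short sketch. The animal and speed data are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "animal": ["parrot", "falcon", "falcon", None],
        "speed": [24.0, 380.0, 370.0, 10.0],
    }
)

# Defaults: group labels form the index, NA keys are dropped, keys are sorted.
by_index = df.groupby("animal")["speed"].mean()

# as_index=False gives "SQL-style" output with the key as a regular column.
sql_style = df.groupby("animal", as_index=False)["speed"].mean()

# dropna=False keeps rows with a missing key as a group of their own.
with_na = df.groupby("animal", dropna=False)["speed"].mean()

# sort=False keeps the groups in order of first appearance.
unsorted = df.groupby("animal", sort=False)["speed"].sum()
```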
Return a Series of dtype('bool') with value True for
features that have an m-component.
Requires Shapely >= 2.1.
Added in version 1.1.0.
Examples
>>> from shapely.geometry import Point
>>> s = geopandas.GeoSeries.from_wkt(
...     [
...         "POINT M (2 3 5)",
...         "POINT Z (1 2 3)",
...         "POINT (0 0)",
...     ]
... )
>>> s
0    POINT M (2 3 5)
1    POINT Z (1 2 3)
2        POINT (0 0)
dtype: geometry
Check the existence of the spatial index without generating it.
Use the .sindex attribute on a GeoDataFrame or GeoSeries
to generate a spatial index if it does not yet exist,
which may take considerable time based on the underlying index
implementation.
Note that the underlying spatial index may not be fully
initialized until the first use.
Return a Series containing the Hausdorff distance to aligned other.
The Hausdorff distance is the largest of the distances from any point in self
to the nearest point in other.
The operation works in a 1-to-1 row-wise manner.
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the
distance to.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
densify (float (default None)) – A value between 0 and 1 that splits each subsegment of a line string
into equal length segments, making the approximation less coarse.
A densify value of 0.5 will add a point halfway between each pair of
points. A densify value of 0.25 will add a point every quarter of the way
between each pair of points.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and use elements with the same index using
align=True or ignore index and use elements based on their matching
order using align=False:
We can also set a densify value, which is a float between 0 and 1 and
signifies the fraction of the distance between each pair of points that will
be used as the distance between the points when densifying.
This function returns the first n rows for the object based
on position. It is useful for quickly testing if your object
has the right type of data in it.
For negative values of n, this function returns all rows except
the last |n| rows, equivalent to df[:n].
If n is larger than the number of rows, this function returns all rows.
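The three cases for n can be checked with a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"animal": ["alligator", "bee", "falcon", "lion", "monkey"]})

first_two = df.head(2)          # the first 2 rows
all_but_last_two = df.head(-2)  # the same rows as df[:-2]
everything = df.head(10)        # n exceeds the row count, so all rows are returned
```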
The distances are calculated for the midpoints of the geometries in the
GeoDataFrame, and using the total bounds of the GeoDataFrame.
The Hilbert distance can be used to spatially sort GeoPandas
objects, by mapping two dimensional geometries along the Hilbert curve.
Parameters:
total_bounds (4-element array, optional) – The spatial extent in which the curve is constructed (used to
rescale the geometry midpoints). By default, the total bounds
of the full GeoDataFrame or GeoSeries will be computed. If known,
you can pass the total bounds to avoid this extra computation.
level (int (1-16), default 16) – Determines the precision of the curve (points on the curve will
have coordinates in the range [0, 2^level - 1]).
Returns:
Series containing the distance along the curve for each geometry
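The idea of rescaling midpoints onto an integer grid and reading off their position along the curve can be sketched in pure Python. This uses the classic textbook mapping from grid cell to Hilbert index, not necessarily the encoding GeoPandas uses internally; hilbert_index, rescale and the sample midpoints are all hypothetical.

```python
def hilbert_index(level, x, y):
    """Distance of integer cell (x, y) along a Hilbert curve of the given level."""
    n = 1 << level  # the grid is n x n, coordinates lie in [0, 2**level - 1]
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has a canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def rescale(value, lo, hi, level):
    """Map a coordinate from [lo, hi] onto the integer range [0, 2**level - 1]."""
    cells = (1 << level) - 1
    return round((value - lo) / (hi - lo) * cells)


# Hypothetical midpoints and total bounds (minx, miny, maxx, maxy).
midpoints = [(0.0, 0.0), (10.0, 10.0), (10.0, 0.0), (0.0, 10.0)]
minx, miny, maxx, maxy = 0.0, 0.0, 10.0, 10.0
level = 2
distances = [
    hilbert_index(level, rescale(x, minx, maxx, level), rescale(y, miny, maxy, level))
    for x, y in midpoints
]
```

Sorting geometries by such a distance tends to place spatially close geometries near each other in the sorted order, which is what makes the Hilbert distance useful for spatial sorting.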
A histogram is a representation of the distribution of data.
This function calls matplotlib.pyplot.hist(), on each series in
the DataFrame, resulting in one histogram per column.
Parameters:
data (DataFrame) – The pandas object holding the data.
column (str or sequence, optional) – If passed, will be used to limit data to a subset of columns.
by (object, optional) – If passed, then used to form histograms for separate groups.
grid (bool, default True) – Whether to show axis grid lines.
xlabelsize (int, default None) – If specified changes the x-axis label size.
xrot (float, default None) – Rotation of x axis labels. For example, a value of 90 displays the
x labels rotated 90 degrees clockwise.
ylabelsize (int, default None) – If specified changes the y-axis label size.
yrot (float, default None) – Rotation of y axis labels. For example, a value of 90 displays the
y labels rotated 90 degrees clockwise.
ax (Matplotlib axes object, default None) – The axes to plot the histogram on.
sharex (bool, default True if ax is None else False) – In case subplots=True, share x axis and set some x axis labels to
invisible; defaults to True if ax is None otherwise False if an ax
is passed in.
Note that passing in both an ax and sharex=True will alter all x axis
labels for all subplots in a figure.
sharey (bool, default False) – In case subplots=True, share y axis and set some y axis labels to
invisible.
figsize (tuple, optional) – The size in inches of the figure to create. Uses the value in
matplotlib.rcParams by default.
layout (tuple, optional) – Tuple of (rows, columns) for the layout of the histograms.
bins (int or sequence, default 10) – Number of histogram bins to be used. If an integer is given, bins + 1
bin edges are calculated and returned. If bins is a sequence, gives
bin edges, including left edge of first bin and right edge of last
bin. In this case, bins is returned unmodified.
backend (str, default None) – Backend to use instead of the backend specified in the option
plotting.backend. For instance, ‘matplotlib’. Alternatively, to
specify the plotting.backend for the whole session, set
pd.options.plotting.backend.
legend (bool, default False) – Whether to show the legend.
**kwargs – All other plotting keyword arguments to be passed to
matplotlib.pyplot.hist().
Purely integer-location based indexing for selection by position.
Deprecated since version 2.2.0: Returning a tuple from a callable is deprecated.
.iloc[] is primarily integer position based (from 0 to
length-1 of the axis), but may also be used with a boolean
array.
Allowed inputs are:
An integer, e.g. 5.
A list or array of integers, e.g. [4,3,0].
A slice object with ints, e.g. 1:7.
A boolean array.
A callable function with one argument (the calling Series or
DataFrame) and that returns valid output for indexing (one of the above).
This is useful in method chains, when you don’t have a reference to the
calling object, but would like to base your selection on
some value.
A tuple of row and column indexes. The tuple elements consist of one of the
above inputs, e.g. (0,1).
.iloc will raise IndexError if a requested indexer is
out-of-bounds, except slice indexers which allow out-of-bounds
indexing (this conforms with python/numpy slice semantics).
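Each of the allowed inputs listed above can be tried on a small made-up frame; the last lines show the IndexError raised for an out-of-bounds integer, in contrast to the out-of-bounds slice.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 100, 1000], "b": [2, 200, 2000], "c": [3, 300, 3000]}
)

row = df.iloc[0]                             # integer -> Series
rows = df.iloc[[2, 0]]                       # list of integers -> DataFrame
sliced = df.iloc[1:7]                        # a slice may run past the end
masked = df.iloc[[True, False, True]]        # boolean array
even = df.iloc[lambda x: x.index % 2 == 0]   # callable returning a boolean array
cell = df.iloc[0, 1]                         # tuple of row and column positions

try:
    df.iloc[5]
except IndexError as exc:
    error = str(exc)
```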
The index of a DataFrame is a series of labels that identify each row.
The labels can be integers, strings, or any other hashable type. The index
is used for label-based access and alignment, and can be accessed or
modified using this attribute.
In this example, we create a DataFrame with 3 rows and 3 columns,
including Name, Age, and Location information. We set the index labels to
be the integers 10, 20, and 30. We then access the index attribute of the
DataFrame, which returns an Index object containing the index labels.
>>> df.index = [100, 200, 300]
>>> df
       Name  Age  Location
100   Alice   25   Seattle
200     Bob   30  New York
300  Aritra   35      Kona
In this example, we modify the index labels of the DataFrame by assigning
a new list of labels to the index attribute. The DataFrame is then
updated with the new labels, and the output shows the modified DataFrame.
Attempt to infer better dtypes for object columns.
Attempts soft conversion of object-dtyped
columns, leaving non-object and unconvertible
columns unchanged. The inference rules are the
same as during normal Series/DataFrame construction.
copy (bool, default True) – Whether to make a copy for non-object or non-inferable columns
or Series.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write: pd.options.mode.copy_on_write = True
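A minimal sketch of the soft conversion: after slicing away the only string, the column still has object dtype until infer_objects is called. The data are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", 1, 2, 3]})
df = df.iloc[1:]  # the remaining values are all integers, dtype is still object

before = df["A"].dtype
after = df.infer_objects()["A"].dtype
print(before, after)
```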
This method prints information about a DataFrame including
the index dtype and columns, non-null values and memory usage.
Parameters:
verbose (bool, optional) – Whether to print the full summary. By default, the setting in
pandas.options.display.max_info_columns is followed.
buf (writable buffer, defaults to sys.stdout) – Where to send the output. By default, the output is printed to
sys.stdout. Pass a writable buffer if you need to further process
the output.
max_cols (int, optional) – When to switch from the verbose to the truncated output. If the
DataFrame has more than max_cols columns, the truncated output
is used. By default, the setting in
pandas.options.display.max_info_columns is used.
memory_usage (bool, str, optional) – Specifies whether total memory usage of the DataFrame
elements (including the index) should be displayed. By default,
this follows the pandas.options.display.memory_usage setting.
True always shows memory usage. False never shows memory usage.
A value of ‘deep’ is equivalent to “True with deep introspection”.
Memory usage is shown in human-readable units (base-2
representation). Without deep introspection a memory estimation is
made based on column dtype and number of rows assuming values
consume the same memory amount for corresponding dtypes. With deep
memory introspection, a real memory usage calculation is performed
at the cost of computational resources. See the
Frequently Asked Questions for more
details.
show_counts (bool, optional) – Whether to show the non-null counts. By default, this is shown
only if the DataFrame is smaller than
pandas.options.display.max_info_rows and
pandas.options.display.max_info_columns. A value of True always
shows the counts, and False never shows the counts.
Returns:
This method prints a summary of a DataFrame and returns None.
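Because the method prints and returns None, capturing the summary for further processing requires a writable buffer. A minimal sketch with a made-up frame:

```python
import io

import pandas as pd

df = pd.DataFrame({"int_col": [1, 2, 3], "text_col": ["a", "b", None]})

buffer = io.StringIO()
result = df.info(buf=buffer, memory_usage="deep", show_counts=True)
summary = buffer.getvalue()  # the text that would otherwise go to sys.stdout
```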
Return a point at the specified distance along each geometry.
Parameters:
distance (float or Series of floats) – Distance(s) along the geometries at which a point should be
returned. If an np.array or pd.Series is used, it must have the
same length as the GeoSeries.
normalized (boolean) – If normalized is True, distance will be interpreted as a fraction
of the geometric object’s length.
Return a GeoSeries of the intersection of points in each
aligned geometry with other.
The operation works on a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the
intersection with.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
Return a Series of dtype('bool') with value True for
each aligned geometry that intersects other.
An object is said to intersect other if its boundary and interior
intersect in any way with those of the other.
The operation works in a 1-to-1 row-wise manner.
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test for
intersection.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
Return a GeoSeries containing edges causing invalid polygonal coverage.
This method returns (Multi)LineStrings showing the location of edges violating
polygonal coverage (if any) in each polygon in the input GeoSeries.
A GeoSeries of valid polygons is considered a coverage if the polygons are:
Non-overlapping - polygons do not overlap (their interiors do not
intersect)
Edge-Matched - vertices along shared edges are identical
A valid coverage may contain holes (regions of no coverage). However, sometimes
it might be desirable to detect narrow gaps as invalidities in the coverage. The
gap_width parameter lets you specify the maximum width of gaps to detect.
When gaps are detected, the is_valid_coverage() method will return
False and this method can be used to find the edges of those gaps.
Geometries that are not Polygon or MultiPolygon are ignored.
Requires Shapely >= 2.1.
Added in version 1.1.0.
Parameters:
gap_width (float, optional) – The maximum width of gaps to detect, by default 0.0
Return a Series of dtype('bool') with value True
if a LineString or LinearRing is counterclockwise.
Note that there are no checks on whether lines are actually
closed and not self-intersecting, while this is a requirement
for is_ccw. The recommended usage of this property for
LineStrings is GeoSeries.is_ccw & GeoSeries.is_simple and for
LinearRings GeoSeries.is_ccw & GeoSeries.is_valid.
This property will return False for non-linear geometries and for
lines with fewer than 4 points (including the closing point).
Return a Series of dtype('bool') with value True for
features that are closed.
When constructing a LinearRing, the sequence of coordinates may be
explicitly closed by passing identical values in the first and last indices.
Otherwise, the sequence will be implicitly closed by copying the first tuple
to the last index.
Return a bool indicating whether a GeoSeries forms a valid coverage.
A GeoSeries of valid polygons is considered a coverage if the polygons are:
Non-overlapping - polygons do not overlap (their interiors do not
intersect)
Edge-Matched - vertices along shared edges are identical
A valid coverage may contain holes (regions of no coverage). However, sometimes
it might be desirable to detect narrow gaps as invalidities in the coverage. The
gap_width parameter lets you specify the maximum width of gaps to detect.
When gaps are detected, this method will return False and the
coverage_invalid_edges() method can be used to find the edges of those
gaps.
Geometries that are not Polygon or MultiPolygon are ignored and an empty
LineString is returned.
Requires Shapely >= 2.1.
Added in version 1.1.0.
Parameters:
gap_width (float, optional) – The maximum width of gaps to detect, by default 0.0
frame.isetitem(loc, value) is an in-place method as it will
modify the DataFrame in place (not returning a new object). In contrast to
frame.iloc[:, i] = value, which will try to update the existing values in
place, frame.isetitem(loc, value) will not update the values of the column
itself in place; it will instead insert a new array.
In cases where frame.columns is unique, this is equivalent to
frame[frame.columns[i]] = value.
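A short sketch of the behaviour described above, using a made-up frame: the column at position 1 is replaced by a new array, so its dtype can change.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Replace the column at position 1 with a new array of floats.
df.isetitem(1, [5.5, 6.5])
print(df.dtypes)
```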
Whether each element in the DataFrame is contained in values.
Parameters:
values (iterable, Series, DataFrame or dict) – The result will only be true at a location if all the
labels match. If values is a Series, that’s the index. If
values is a dict, the keys must be the column names,
which must match. If values is a DataFrame,
then both the index and column labels must match.
Returns:
DataFrame of booleans showing whether each element in the DataFrame
is contained in values.
Return type:
DataFrame
See also
DataFrame.eq
Equality test for DataFrame.
Series.isin
Equivalent method on Series.
Series.str.contains
Test if pattern or regex is contained within a string of a Series or Index.
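How label matching depends on the type of values can be seen in a small sketch. The frames are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"num_legs": [2, 4], "num_wings": [2, 0]}, index=["falcon", "dog"]
)

# A list is checked against every element, regardless of labels.
in_list = df.isin([0, 2])

# A dict is checked per column name.
in_dict = df.isin({"num_wings": [0, 3]})

# A DataFrame requires both the index and the column labels to match.
other = pd.DataFrame(
    {"num_legs": [8, 3], "num_wings": [0, 2]}, index=["spider", "falcon"]
)
in_frame = df.isin(other)
```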
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or numpy.NaN, get mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings '' or numpy.inf are not considered NA values
(unless you set pandas.options.mode.use_inf_as_na=True).
Returns:
Mask of bool values for each element in DataFrame that
indicates whether an element is an NA value.
>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
Options are {‘null’, ‘drop’, ‘keep’}, default ‘null’.
Indicates how to output missing (NaN) values in the GeoDataFrame
null: output the missing entries as JSON null
drop: remove the property from the feature. This applies to each feature individually so that features may have different properties
keep: output the missing entries as NaN
show_bbox (bool, optional) – Include bbox (bounds) in the geojson. Default False.
drop_id (bool, default: False) – Whether to drop the index of the GeoDataFrame instead of writing it as the id property
in the generated GeoJSON. Default is False (the index is retained as id), but you may want True
if the index is just arbitrary row numbers.
Examples
>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
Iterate over DataFrame rows as namedtuples of the values.
DataFrame.items
Iterate over (column name, Series) pairs.
Notes
Because iterrows returns a Series for each row,
it does not preserve dtypes across the rows (dtypes are
preserved across columns for DataFrames).
To preserve dtypes while iterating over the rows, it is better
to use itertuples() which returns namedtuples of the values
and which is generally faster than iterrows.
You should never modify something you are iterating over.
This is not guaranteed to work in all cases. Depending on the
data types, the iterator returns a copy and not a view, and writing
to it will have no effect.
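The dtype note above can be checked with a short sketch. The one-row frame and its column names are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical one-row frame with an integer and a float column.
df = pd.DataFrame([[1, 1.5]], columns=["n", "x"])

# iterrows yields (index, Series); the row Series has a single common dtype,
# so the integer value is upcast to float.
_, row = next(df.iterrows())
print(df["n"].dtype)  # dtype of the column in the frame
print(row.dtype)      # dtype of the row Series

# itertuples keeps each value separate, so the integer stays an integer.
first = next(df.itertuples(index=False))
print(first.n, first.x)
```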
index (bool, default True) – If True, return the index as the first element of the tuple.
name (str or None, default "Pandas") – The name of the returned namedtuples or None to return regular
tuples.
Returns:
An object to iterate over namedtuples for each row in the
DataFrame with the first field possibly being the index and
following fields being the column values.
Return type:
iterator
See also
DataFrame.iterrows
Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.items
Iterate over (column name, Series) pairs.
Notes
The column names will be renamed to positional names if they are
invalid Python identifiers, repeated, or start with an underscore.
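A small sketch of the renaming rule in the note above, using a hypothetical frame in which one column name starts with an underscore:

```python
import pandas as pd

# Hypothetical frame; "_wings" starts with an underscore on purpose.
df = pd.DataFrame(
    {"num_legs": [4, 2], "_wings": [0, 2]},
    index=["dog", "hawk"],
)

# Default: namedtuples with the index as the first field.
first = next(df.itertuples())
print(first._fields)  # the underscore column gets a positional name

# name=None returns regular tuples; index=False drops the index.
plain = list(df.itertuples(index=False, name=None))
print(plain)
```

Accessing fields by position (for example `first[2]`) avoids depending on the renamed field names.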
Join columns with other DataFrame either on index or on a key
column. Efficiently join multiple DataFrame objects by index at once by
passing a list.
Parameters:
other (DataFrame, Series, or a list containing any combination of them) – Index should be similar to one of the columns in this one. If a
Series is passed, its name attribute must be set, and that will be
used as the column name in the resulting joined DataFrame.
on (str, list of str, or array-like, optional) – Column or index level name(s) in the caller to join on the index
in other, otherwise joins index-on-index. If multiple
values given, the other DataFrame must have a MultiIndex. Can
pass an array as the join key if it is not already contained in
the calling DataFrame. Like an Excel VLOOKUP operation.
how ({'left', 'right', 'outer', 'inner', 'cross'}, default 'left') –
How to handle the operation of the two objects.
left: use calling frame’s index (or column if on is specified)
right: use other’s index.
outer: form union of calling frame’s index (or column if on is
specified) with other’s index, and sort it lexicographically.
inner: form intersection of calling frame’s index (or column if
on is specified) with other’s index, preserving the order
of the calling’s one.
cross: creates the cartesian product from both frames, preserves the order
of the left keys.
lsuffix (str, default '') – Suffix to use from left frame’s overlapping columns.
rsuffix (str, default '') – Suffix to use from right frame’s overlapping columns.
sort (bool, default False) – Order result DataFrame lexicographically by the join key. If False,
the order of the join key depends on the join type (how keyword).
>>> df.join(other, lsuffix='_caller', rsuffix='_other')
  key_caller   A key_other    B
0         K0  A0        K0   B0
1         K1  A1        K1   B1
2         K2  A2        K2   B2
3         K3  A3       NaN  NaN
4         K4  A4       NaN  NaN
5         K5  A5       NaN  NaN
If we want to join using the key columns, we need to set key to be
the index in both df and other. The joined DataFrame will have
key as its index.
>>> df.set_index('key').join(other.set_index('key'))
      A    B
key
K0   A0   B0
K1   A1   B1
K2   A2   B2
K3   A3  NaN
K4   A4  NaN
K5   A5  NaN
Another option to join using the key columns is to use the on
parameter. DataFrame.join always uses other’s index but we can use
any column in df. This method preserves the original DataFrame’s
index in the result.
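The on-based join described above might look like the following sketch. The two frames are reconstructed to be consistent with the outputs shown earlier and should be treated as an assumption:

```python
import pandas as pd

# Frames reconstructed from the earlier outputs (an assumption).
df = pd.DataFrame({
    "key": ["K0", "K1", "K2", "K3", "K4", "K5"],
    "A": ["A0", "A1", "A2", "A3", "A4", "A5"],
})
other = pd.DataFrame({
    "key": ["K0", "K1", "K2"],
    "B": ["B0", "B1", "B2"],
})

# Match df's "key" column against other's index; df keeps its own index.
joined = df.join(other.set_index("key"), on="key")
print(joined)
```

Only `other` needs its index set here, because `on` names a column in the calling frame.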
Select final periods of time series data based on a date offset.
Deprecated since version 2.1: last() is deprecated and will be removed in a future version.
Please create a mask and filter using .loc instead.
For a DataFrame with a sorted DatetimeIndex, this function
selects the last few rows based on a date offset.
Parameters:
offset (str, DateOffset, dateutil.relativedelta) – The offset length of the data that will be selected. For instance,
‘3D’ will display all the rows having their index within the last 3 days.
Notice the data for 3 last calendar days were returned, not the last
3 observed days in the dataset, and therefore data for 2018-04-11 was
not returned.
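Since last() is deprecated, the same selection can be sketched with a mask and .loc, as the deprecation note suggests. The sample data below is hypothetical, and the strict comparison against the cutoff is an assumption chosen to reproduce the behaviour described above:

```python
import pandas as pd

# Hypothetical time series sampled every two days.
idx = pd.date_range("2018-04-09", periods=4, freq="2D")
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=idx)

# Keep rows whose timestamp falls within the last 3 calendar days.
cutoff = ts.index.max() - pd.Timedelta(days=3)
recent = ts.loc[ts.index > cutoff]
print(recent)
```

The row for 2018-04-11 is excluded because the window is measured in calendar days from the final timestamp, not in observed rows.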
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
Length may be invalid for a geographic CRS using degrees as units;
use GeoSeries.to_crs() to project geometries to a planar
CRS before using this function.
Every operation in GeoPandas is planar, i.e. the potential third
dimension is not taken into account.
Return (Multi)LineStrings formed by combining the lines in a
MultiLineString.
Lines are joined at their endpoints where exactly two lines meet.
Lines are not joined where 3 or more lines meet at the same endpoint.
Line elements that cannot be joined are kept as is in the resulting
MultiLineString.
The direction of each merged LineString will be that of the majority of the
LineStrings from which it was derived. Except if directed=True is specified,
then the operation will not change the order of points within lines and so only
lines which can be joined with no change in direction are merged.
Non-linear geometries result in an empty GeometryCollection.
Parameters:
directed (bool, default False) – Only combine lines if possible without changing point order.
Requires GEOS >= 3.11.0
Slice with integer labels for rows. As mentioned above, note that both
the start and stop of the slice are included.
>>> df.loc[7:9]
   max_speed  shield
7          1       2
8          4       5
9          7       8
Getting values with a MultiIndex
A number of examples using a DataFrame with a MultiIndex
>>> tuples = [
...     ('cobra', 'mark i'), ('cobra', 'mark ii'),
...     ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
...     ('viper', 'mark ii'), ('viper', 'mark iii')
... ]
>>> index = pd.MultiIndex.from_tuples(tuples)
>>> values = [[12, 2], [0, 4], [10, 20],
...           [1, 4], [7, 1], [16, 36]]
>>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
>>> df
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36
Single label. Note this returns a DataFrame with a single index.
>>> df.loc['cobra']
         max_speed  shield
mark i          12       2
mark ii          0       4
Single index tuple. Note this returns a Series.
>>> df.loc[('cobra', 'mark ii')]
max_speed    0
shield       4
Name: (cobra, mark ii), dtype: int64
Single label for row and column. Similar to passing in a tuple, this
returns a Series.
>>> df.loc['cobra', 'mark i']
max_speed    12
shield        2
Name: (cobra, mark i), dtype: int64
Single tuple. Note using [[]] returns a DataFrame.
>>> df.loc[[('cobra', 'mark ii')]]
               max_speed  shield
cobra mark ii          0       4
Single tuple for the index with a single label for the column
>>> df.loc[('cobra', 'mark i'), 'shield']
2
Slice from index tuple to single label
>>> df.loc[('cobra', 'mark i'):'viper']
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36
Slice from index tuple to index tuple
>>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
                    max_speed  shield
cobra      mark i          12       2
           mark ii          0       4
sidewinder mark i          10      20
           mark ii          1       4
viper      mark ii          7       1
Please see the user guide
for more details and explanations of advanced indexing.
If the input geometry is already valid, then it will be preserved.
In many cases, in order to create a valid geometry, the input
geometry must be split into multiple parts or multiple geometries.
If the geometry must be split into multiple parts of the same type
to be made valid, then a multi-part geometry will be returned
(e.g. a MultiPolygon).
If the geometry must be split into multiple parts of different types
to be made valid, then a GeometryCollection will be returned.
Two methods are available:
the ‘linework’ algorithm tries to preserve every edge and vertex in
the input. It combines all rings into a set of noded lines and then
extracts valid polygons from that linework. An alternating even-odd
strategy is used to assign areas as interior or exterior. A
disadvantage is that for some relatively simple invalid geometries
this produces rather complex results.
the ‘structure’ algorithm tries to reason from the structure of the
input to find the ‘correct’ repair: exterior rings bound area,
interior holes exclude area. It first makes all rings valid, then
shells are merged and holes are subtracted from the shells to
generate a valid result. It assumes that holes and shells are correctly
categorized in the input geometry.
For the ‘structure’ method, True will keep components that have
collapsed into a lower dimensionality. For example, a ring
collapsing to a line, or a line collapsing to a point. Must be True
for the ‘linework’ method.
cond (bool Series/DataFrame, array-like, or callable) – Where cond is False, keep the original value. Where
True, replace with corresponding value from other.
If cond is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn’t check it).
other (scalar, Series/DataFrame, or callable) – Entries where cond is True are replaced with
corresponding value from other.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn’t check it).
If not specified, entries will be filled with the corresponding
NULL value (np.nan for numpy dtypes, pd.NA for extension
dtypes).
inplace (bool, default False) – Whether to perform the operation in place on the data.
axis (int, default None) – Alignment axis if needed. For Series this parameter is
unused and defaults to 0.
level (int, default None) – Alignment level if needed.
Return type:
Same type as caller or None if inplace=True.
See also
DataFrame.where()
Return an object of same shape as self.
Notes
The mask method is an application of the if-then idiom. For each
element in the calling DataFrame, if cond is False the
element is used; otherwise the corresponding element from the DataFrame
other is used. If the axis of other does not align with axis of
cond Series/DataFrame, the misaligned index positions will be filled with
True.
The signature for DataFrame.where() differs from
numpy.where(). Roughly df1.where(m,df2) is equivalent to
np.where(m,df1,df2).
For further details and examples see the mask documentation in
indexing.
The dtype of the object takes precedence. The fill value is cast to
the object’s dtype, if this can be done losslessly.
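A brief sketch of the if-then behaviour described in the notes, using a hypothetical integer Series. It also shows that mask is the mirror image of where:

```python
import pandas as pd

# Hypothetical Series used only for illustration.
s = pd.Series(range(5))

# Where the condition is True, replace the value with 10.
masked = s.mask(s > 1, 10)

# The same result via where(), which keeps values where the condition is True.
via_where = s.where(s <= 1, 10)

# Without `other`, replaced entries become missing values.
blanked = s.mask(s > 1)

print(masked.tolist())
print(blanked.tolist())
```

Note that omitting other changes the dtype here, since an integer column cannot hold NaN.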
Return a GeoSeries of geometries representing the largest circle that
is fully contained within the input geometry.
Constructs the “maximum inscribed circle” (MIC) for a polygonal geometry, up to
a specified tolerance. The MIC is determined by a point in the interior of the
area which has the farthest distance from the area boundary, along with a
boundary point at that distance. In the context of geography the center of the
MIC is known as the “pole of inaccessibility”. A cartographic use case is to
determine a suitable point to place a map label within a polygon. The radius
length of the MIC is a measure of how “narrow” a polygon is. It is the distance
at which the negative buffer becomes empty.
The method supports polygons with holes and multipolygons but will raise an
error for any other geometry type.
Returns a GeoSeries whose rows are two-point LineStrings, with the first point at the
center of the inscribed circle and the second on the boundary of the inscribed
circle.
Requires Shapely >= 2.1.
Added in version 1.1.0.
Parameters:
tolerance (float, np.array, pd.Series) – Stop the algorithm when the search area is smaller than this tolerance.
When not specified, uses max(width,height)/1000 per geometry as the
default. If np.array or pd.Series are used then it must have same length as
the GeoSeries.
Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (id_vars), while all other
columns, considered measured variables (value_vars), are “unpivoted” to
the row axis, leaving just two non-identifier columns, ‘variable’ and
‘value’.
Parameters:
id_vars (scalar, tuple, list, or ndarray, optional) – Column(s) to use as identifier variables.
value_vars (scalar, tuple, list, or ndarray, optional) – Column(s) to unpivot. If not specified, uses all columns that
are not set as id_vars.
var_name (scalar, default None) – Name to use for the ‘variable’ column. If None it uses
frame.columns.name or ‘variable’.
value_name (scalar, default 'value') – Name to use for the ‘value’ column, can’t be an existing column label.
col_level (scalar, optional) – If columns are a MultiIndex then use this level to melt.
ignore_index (bool, default True) – If True, original index is ignored. If False, the original index is retained.
Index labels will be repeated as necessary.
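To make the wide-to-long reshaping concrete, here is a minimal sketch on a hypothetical frame; the column names and the custom var_name and value_name values are assumptions:

```python
import pandas as pd

# Hypothetical wide frame: one identifier column and two measured columns.
df = pd.DataFrame({"A": ["a", "b", "c"], "B": [1, 3, 5], "C": [2, 4, 6]})

# Unpivot B and C into (measure, reading) pairs, keeping A as the identifier.
long = df.melt(
    id_vars=["A"],
    value_vars=["B", "C"],
    var_name="measure",
    value_name="reading",
)
print(long)
```

Each identifier value appears once per unpivoted column, so three rows and two value columns give six rows.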
The memory usage can optionally include the contribution of
the index and elements of object dtype.
This value is displayed in DataFrame.info by default. This can be
suppressed by setting pandas.options.display.memory_usage to False.
Parameters:
index (bool, default True) – Specifies whether to include the memory usage of the DataFrame’s
index in returned Series. If index=True, the memory usage of
the index is the first item in the output.
deep (bool, default False) – If True, introspect the data deeply by interrogating
object dtypes for system-level memory consumption, and include
it in the returned values.
Returns:
A Series whose index is the original column names and whose values
are the memory usage of each column in bytes.
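The effect of the index and deep parameters can be sketched as follows. The frame is hypothetical, and the text column is forced to object dtype so that the deep measurement has something to introspect:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; "text" is explicitly object dtype for this illustration.
df = pd.DataFrame({
    "ints": np.arange(1000, dtype="int64"),
    "text": pd.Series(["x"] * 1000, dtype=object),
})

shallow = df.memory_usage()             # index first, then one entry per column
deep = df.memory_usage(deep=True)       # also counts the Python string objects
no_index = df.memory_usage(index=False)

print(shallow)
print(deep)
```

For the int64 column the figure is simply 8 bytes per row; the exact numbers for the index and object column depend on the platform and pandas version.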
Merge DataFrame or named Series objects with a database-style join.
A named Series object is treated as a DataFrame with a single named column.
The join is done on columns or indexes. If joining columns on
columns, the DataFrame indexes will be ignored. Otherwise if joining indexes
on indexes or indexes on a column or columns, the index will be passed on.
When performing a cross merge, no column specifications to merge on are
allowed.
Warning
If both key columns contain rows where the key is a null value, those
rows will be matched against each other. This is different from usual SQL
join behaviour and can lead to unexpected results.
Parameters:
right (DataFrame or named Series) – Object to merge with.
how ({'left', 'right', 'outer', 'inner', 'cross'}, default 'inner') –
Type of merge to be performed.
left: use only keys from left frame, similar to a SQL left outer join;
preserve key order.
right: use only keys from right frame, similar to a SQL right outer join;
preserve key order.
outer: use union of keys from both frames, similar to a SQL full outer
join; sort keys lexicographically.
inner: use intersection of keys from both frames, similar to a SQL inner
join; preserve the order of the left keys.
cross: creates the cartesian product from both frames, preserves the order
of the left keys.
on (label or list) – Column or index level names to join on. These must be found in both
DataFrames. If on is None and not merging on indexes then this defaults
to the intersection of the columns in both DataFrames.
left_on (label or list, or array-like) – Column or index level names to join on in the left DataFrame. Can also
be an array or list of arrays of the length of the left DataFrame.
These arrays are treated as if they are columns.
right_on (label or list, or array-like) – Column or index level names to join on in the right DataFrame. Can also
be an array or list of arrays of the length of the right DataFrame.
These arrays are treated as if they are columns.
left_index (bool, default False) – Use the index from the left DataFrame as the join key(s). If it is a
MultiIndex, the number of keys in the other DataFrame (either the index
or a number of columns) must match the number of levels.
right_index (bool, default False) – Use the index from the right DataFrame as the join key. Same caveats as
left_index.
sort (bool, default False) – Sort the join keys lexicographically in the result DataFrame. If False,
the order of the join keys depends on the join type (how keyword).
suffixes (list-like, default is ("_x", "_y")) – A length-2 sequence where each element is optionally a string
indicating the suffix to add to overlapping column names in
left and right respectively. Pass a value of None instead
of a string to indicate that the column name from left or
right should be left as-is, with no suffix. At least one of the
values must not be None.
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements by
enabling Copy-on-Write: pd.options.mode.copy_on_write = True
indicator (bool or str, default False) – If True, adds a column to the output DataFrame called “_merge” with
information on the source of each row. The column can be given a different
name by providing a string argument. The column will have a Categorical
type with the value of “left_only” for observations whose merge key only
appears in the left DataFrame, “right_only” for observations
whose merge key only appears in the right DataFrame, and “both”
if the observation’s merge key is found in both DataFrames.
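The how, suffixes and indicator parameters can be seen together in a short sketch. The two frames and the suffix strings are hypothetical:

```python
import pandas as pd

# Hypothetical frames sharing a key column and an overlapping "val" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "val": [20, 30, 40]})

merged = left.merge(
    right,
    on="key",
    how="outer",                    # union of keys from both frames
    suffixes=("_left", "_right"),   # disambiguate the overlapping column
    indicator=True,                 # adds the "_merge" source column
)
print(merged)
```

Filtering on the _merge column is a common way to find rows that failed to match on one side.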
Return a Series containing the minimum clearance distance,
which is the smallest distance by which a vertex of the geometry
could be moved to produce an invalid geometry.
If no minimum clearance exists for a geometry (for example,
a single point, or an empty geometry), infinity is returned.
Return a GeoSeries of the general minimum bounding rectangle
that contains the object.
Unlike envelope this rectangle is not constrained to be parallel
to the coordinate axes. If the convex hull of the object is a
degenerate (line or point) this degenerate is returned.
Get Modulo of dataframe and other, element-wise (binary operator mod).
Equivalent to dataframe%other, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, rmod.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
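A small sketch of how fill_value interacts with missing data, using two hypothetical frames. A value missing on one side is replaced before the computation, while a value missing on both sides stays missing:

```python
import numpy as np
import pandas as pd

# Hypothetical frames with missing values in different positions.
df = pd.DataFrame({"a": [7, 8, np.nan], "b": [np.nan, 5, np.nan]})
other = pd.DataFrame({"a": [4, np.nan, 3], "b": [np.nan, 2, 3]})

plain = df % other                    # any NaN operand gives NaN
filled = df.mod(other, fill_value=1)  # one-sided NaN is treated as 1

print(plain)
print(filled)
```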
Get the mode(s) of each element along the selected axis.
The mode of a set of values is the value that appears most often.
It can be multiple values.
Parameters:
axis ({0 or 'index', 1 or 'columns'}, default 0) –
The axis to iterate over while searching for the mode:
0 or ‘index’ : get mode of each column
1 or ‘columns’ : get mode of each row.
numeric_only (bool, default False) – If True, only apply to numeric columns.
dropna (bool, default True) – Don’t consider counts of NaN/NaT.
Returns:
The modes of each column or row.
Return type:
DataFrame
See also
Series.mode
Return the highest frequency value in a Series.
Series.value_counts
Return the counts of values in a Series.
Examples
>>> df = pd.DataFrame([('bird', 2, 2),
...                    ('mammal', 4, np.nan),
...                    ('arthropod', 8, 0),
...                    ('bird', 2, np.nan)],
...                   index=('falcon', 'horse', 'spider', 'ostrich'),
...                   columns=('species', 'legs', 'wings'))
>>> df
           species  legs  wings
falcon        bird     2    2.0
horse       mammal     4    NaN
spider   arthropod     8    0.0
ostrich       bird     2    NaN
By default, missing values are not considered, and the modes of wings
are both 0 and 2. Because the resulting DataFrame has two rows,
the second row of species and legs contains NaN.
>>> df.mode()
  species  legs  wings
0    bird   2.0    0.0
1     NaN   NaN    2.0
Setting dropna=False, NaN values are considered and they can be
the mode (like for wings).
>>> df.mode(dropna=False)
  species  legs  wings
0    bird     2    NaN
Setting numeric_only=True, only the mode of numeric columns is
computed, and columns of other types are ignored.
>>> df.mode(numeric_only=True)
   legs  wings
0   2.0    0.0
1   NaN    2.0
To compute the mode over columns and not rows, use the axis parameter:
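A sketch of the row-wise call, rebuilding the same example frame so the snippet runs on its own. Restricting to numeric columns is a choice made here so that values in each row are comparable:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [("bird", 2, 2), ("mammal", 4, np.nan), ("arthropod", 8, 0), ("bird", 2, np.nan)],
    index=("falcon", "horse", "spider", "ostrich"),
    columns=("species", "legs", "wings"),
)

# One row of output per input row; extra columns hold additional modes.
row_modes = df.mode(axis="columns", numeric_only=True)
print(row_modes)
```

Rows with a single mode are padded with NaN, while a row such as spider, whose two values each occur once, reports both.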
Get Multiplication of dataframe and other, element-wise (binary operator mul).
Equivalent to dataframe*other, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, rmul.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Return the first n rows ordered by columns in descending order.
Return the first n rows with the largest values in columns, in
descending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
df.sort_values(columns,ascending=False).head(n), but more
performant.
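The stated equivalence can be checked on a small hypothetical frame. The values are deliberately distinct, since the ordering of tied rows is not something this sketch tries to pin down:

```python
import pandas as pd

# Hypothetical data with no ties in the ordering column.
df = pd.DataFrame(
    {"population": [59000000, 65000000, 434000, 337000, 11300]},
    index=["Italy", "France", "Malta", "Iceland", "Nauru"],
)

top = df.nlargest(3, "population")
via_sort = df.sort_values("population", ascending=False).head(3)

print(top)
print(top.equals(via_sort))
```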
Return a GeoSeries of geometries converted
to normal form (or canonical form).
This method orders the coordinates, rings of a polygon and parts of
multi geometries consistently. Typically useful for testing purposes
(for example in combination with equals_exact).
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings '' or numpy.inf are not considered NA values
(unless you set pandas.options.mode.use_inf_as_na=True).
NA values, such as None or numpy.NaN, get mapped to False
values.
Returns:
Mask of bool values for each element in DataFrame that
indicates whether an element is not an NA value.
>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True
DataFrame.notnull is an alias for DataFrame.notna.
Detect existing (non-missing) values.
Return the first n rows ordered by columns in ascending order.
Return the first n rows with the smallest values in columns, in
ascending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
df.sort_values(columns,ascending=True).head(n), but more
performant.
Return a LineString or MultiLineString geometry at a
distance from the object on its right or its left side.
Parameters:
distance (float|array-like) – Specifies the offset distance from the input geometry. Negative
for right side offset, positive for left side offset.
quad_segs (int (optional, default 8)) – Specifies the number of linear segments in a quarter circle in the
approximation of circular arcs.
join_style ({'round', 'bevel', 'mitre'}, (optional, default 'round')) – Specifies the shape of outside corners. ‘round’ results in
rounded shapes. ‘bevel’ results in a beveled edge that touches the
original vertex. ‘mitre’ results in a single vertex that is beveled
depending on the mitre_limit parameter.
mitre_limit (float (optional, default 5.0)) – Crops off ‘mitre’-style joins if the point is displaced from the
buffered vertex by more than this limit.
Return a GeoSeries of geometries with enforced ring orientation.
Enforce a ring orientation on all polygonal elements in the GeoSeries.
Forces (Multi)Polygons to use a counter-clockwise orientation for their exterior
ring, and a clockwise orientation for their interior rings (or the opposite if
exterior_cw=True).
Also processes geometries inside a GeometryCollection in the same way. Other
geometries are returned unchanged.
Requires Shapely >= 2.1.
Added in version 1.1.0.
Parameters:
exterior_cw (bool) – If True, exterior rings will be clockwise and interior rings will be
counter-clockwise.
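For readers unfamiliar with ring orientation, the sketch below shows one common way to tell clockwise from counter-clockwise rings using the shoelace formula. It is a plain-Python illustration with hypothetical helper names, not the routine GeoPandas or Shapely uses internally:

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_ccw(ring):
    """Return True if the ring's vertices run counter-clockwise."""
    return signed_area(ring) > 0


# Hypothetical unit square, listed counter-clockwise and then reversed.
square_ccw = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_cw = list(reversed(square_ccw))

print(is_ccw(square_ccw), is_ccw(square_cw))
```

Under the default described above, an exterior ring would satisfy is_ccw and interior rings would not; with exterior_cw=True the roles are swapped.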
Return True for all aligned geometries that overlap other, else False.
Geometries overlap if they have more than one but not all
points in common, have the same dimension, and the intersection of the
interiors of the geometries has the same dimension as the geometries
themselves.
The operation works in a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test for
overlap.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
Currently only supports data GeoDataFrames with uniform geometry types,
i.e. containing only (Multi)Polygons, or only (Multi)Points, or a
combination of (Multi)LineString and LinearRing shapes.
Implements several methods that are all effectively subsets of the union.
See the User Guide page ../../user_guide/set_operations for details.
how (string) – Method of spatial overlay: ‘intersection’, ‘union’,
‘identity’, ‘symmetric_difference’ or ‘difference’.
keep_geom_type (bool) – If True, return only geometries of the same geometry type the GeoDataFrame
has, if False, return all resulting geometries. Default is None,
which will set keep_geom_type to True but warn upon dropping
geometries.
make_valid (bool, default True) – If True, any invalid input geometries are corrected with a call to
make_valid(), if False, a ValueError is raised if any input geometries
are invalid.
Returns:
df – GeoDataFrame with new set of polygons and attributes
resulting from the overlay
Fractional change between the current and a prior element.
Computes the fractional change from the immediately previous row by
default. This is useful in comparing the fraction of change in a time
series of elements.
Note
Despite the name of this method, it calculates fractional change
(also known as per unit change or relative change) and not
percentage change. If you need the percentage change, multiply
these values by 100.
Parameters:
periods (int, default 1) – Periods to shift for forming percent change.
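The distinction between fractional and percentage change can be sketched as follows. The price values are made up for illustration; only the pandas calls come from the library.

```python
import pandas as pd

# Hypothetical price series, used only for illustration.
prices = pd.Series([100.0, 110.0, 99.0, 99.0])

# Fractional change from the immediately previous row (periods=1).
frac = prices.pct_change()

# Fractional change relative to the value two rows earlier.
frac2 = prices.pct_change(periods=2)

# Multiply by 100 to express the change as a percentage.
pct = frac * 100
```

The first row has no prior element, so its change is NaN; with periods=2 the first two rows are NaN.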
Apply chainable functions that expect Series or DataFrames.
Parameters:
func (function) – Function to apply to the Series/DataFrame.
args and kwargs are passed into func.
Alternatively, a (callable, data_keyword) tuple, where
data_keyword is a string indicating the keyword of
callable that expects the Series/DataFrame.
*args (iterable, optional) – Positional arguments passed into func.
**kwargs (mapping, optional) – A dictionary of keyword arguments passed into func.
Return type:
The return type of func.
See also
DataFrame.apply
Apply a function along input axis of DataFrame.
DataFrame.map
Apply a function elementwise on a whole DataFrame.
If you have a function that takes the data as (say) the second
argument, pass a tuple indicating which keyword expects the
data. For example, suppose national_insurance takes its data as df
in the second argument:
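A minimal sketch of the tuple form. The signature of national_insurance (rate, df, rate_increase) and the salary data are assumptions invented for this illustration.

```python
import pandas as pd


def national_insurance(rate, df, rate_increase):
    """Hypothetical helper: the data is the second argument, named ``df``."""
    return df * (1 - (rate + rate_increase))


# Illustrative salary data.
salaries = pd.DataFrame({"salary": [1000.0, 2000.0]})

# The (callable, data_keyword) tuple tells pipe to pass the frame as ``df=``.
result = salaries.pipe(
    (national_insurance, "df"), rate=0.05, rate_increase=0.02
)
```

The remaining arguments are passed by keyword, so pipe only needs to know which keyword receives the data.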
Return reshaped DataFrame organized by given index / column values.
Reshape data (produce a “pivot” table) based on column values. Uses
unique values from specified index / columns to form axes of the
resulting DataFrame. This function does not support data
aggregation, multiple values will result in a MultiIndex in the
columns. See the User Guide for more on reshaping.
Parameters:
columns (str or object or a list of str) – Column to use to make new frame’s columns.
index (str or object or a list of str, optional) – Column to use to make new frame’s index. If not given, uses existing index.
values (str, object or a list of the previous, optional) – Column(s) to use for populating new frame’s values. If not
specified, all remaining columns will be used and the result will
have hierarchically indexed columns.
Returns:
Returns reshaped DataFrame.
Return type:
DataFrame
Raises:
ValueError – When there are any index, columns combinations with multiple
values. Use DataFrame.pivot_table when you need to aggregate.
See also
DataFrame.pivot_table
Generalization of pivot that can handle duplicate values for one index/column pair.
DataFrame.unstack
Pivot based on the index values instead of a column.
wide_to_long
Wide panel to long format. Less flexible but more user-friendly than melt.
Notes
For finer-tuned control, see hierarchical indexing documentation along
with the related stack/unstack methods.
>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
...                            'two'],
...                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
...                    'baz': [1, 2, 3, 4, 5, 6],
...                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
   foo bar  baz zoo
0  one   A    1   x
1  one   B    2   y
2  one   C    3   z
3  two   A    4   q
4  two   B    5   w
5  two   C    6   t
>>> df.pivot(index='foo', columns='bar', values='baz')
bar  A  B  C
foo
one  1  2  3
two  4  5  6
>>> df.pivot(index='foo', columns='bar')['baz']
bar  A  B  C
foo
one  1  2  3
two  4  5  6
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
    baz       zoo
bar   A  B  C   A  B  C
foo
one   1  2  3   x  y  z
two   4  5  6   q  w  t
You could also assign a list of column names or a list of index names.
>>> df.pivot(index=["lev1", "lev2"], columns=["lev3"], values="values")
    lev3    1    2
lev1 lev2
   1    1  0.0  1.0
        2  2.0  NaN
   2    1  4.0  3.0
        2  NaN  5.0
A ValueError is raised if there are any duplicates.
>>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
...                    "bar": ['A', 'A', 'B', 'C'],
...                    "baz": [1, 2, 3, 4]})
>>> df
   foo bar  baz
0  one   A    1
1  one   A    2
2  two   B    3
3  two   C    4
Notice that the first two rows are the same for our index
and columns arguments.
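The failure and the aggregating alternative can be sketched as below. The choice of "sum" as the aggregation is an assumption made for illustration; any aggregation accepted by pivot_table would resolve the duplicates.

```python
import pandas as pd

df = pd.DataFrame({"foo": ["one", "one", "two", "two"],
                   "bar": ["A", "A", "B", "C"],
                   "baz": [1, 2, 3, 4]})

# pivot cannot decide which value belongs in the ("one", "A") cell.
try:
    df.pivot(index="foo", columns="bar", values="baz")
    raised = False
except ValueError:
    raised = True

# pivot_table resolves the duplicates by aggregating them (here: sum).
table = df.pivot_table(index="foo", columns="bar", values="baz",
                       aggfunc="sum")
```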
Create a spreadsheet-style pivot table as a DataFrame.
The levels in the pivot table will be stored in MultiIndex objects
(hierarchical indexes) on the index and columns of the result DataFrame.
Parameters:
values (list-like or scalar, optional) – Column or columns to aggregate.
index (column, Grouper, array, or list of the previous) – Keys to group by on the pivot table index. If a list is passed,
it can contain any of the other types (except list). If an array is
passed, it must be the same length as the data and will be used in
the same manner as column values.
columns (column, Grouper, array, or list of the previous) – Keys to group by on the pivot table column. If a list is passed,
it can contain any of the other types (except list). If an array is
passed, it must be the same length as the data and will be used in
the same manner as column values.
aggfunc (function, list of functions, dict, default "mean") – If a list of functions is passed, the resulting pivot table will have
hierarchical columns whose top level are the function names
(inferred from the function objects themselves).
If a dict is passed, the key is column to aggregate and the value is
function or list of functions. If margins=True, aggfunc will be
used to calculate the partial aggregates.
fill_value (scalar, default None) – Value to replace missing values with (in the resulting pivot table,
after aggregation).
margins (bool, default False) – If margins=True, special All columns and rows
will be added with partial group aggregates across the categories
on the rows and columns.
dropna (bool, default True) – Do not include columns whose entries are all NaN. If True,
rows with a NaN value in any column will be omitted before
computing margins.
margins_name (str, default 'All') – Name of the row / column that will contain the totals
when margins is True.
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
Deprecated since version 2.2.0: The default value of False is deprecated and will change to
True in a future version of pandas.
>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
...                          "bar", "bar", "bar", "bar"],
...                    "B": ["one", "one", "one", "two", "two",
...                          "one", "one", "two", "two"],
...                    "C": ["small", "large", "large", "small",
...                          "small", "large", "small", "small",
...                          "large"],
...                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
...                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> df
     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9
This first example aggregates values by taking the sum.
>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                        columns=['C'], aggfunc="sum")
>>> table
C        large  small
A   B
bar one    4.0    5.0
    two    7.0    6.0
foo one    4.0    1.0
    two    NaN    6.0
We can also fill missing values using the fill_value parameter.
>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                        columns=['C'], aggfunc="sum", fill_value=0)
>>> table
C        large  small
A   B
bar one      4      5
    two      7      6
foo one      4      1
    two      0      6
The next example aggregates by taking the mean across multiple columns.
>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                        aggfunc={'D': "mean", 'E': "mean"})
>>> table
                  D         E
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333
We can also calculate multiple types of aggregations for any given
value column.
>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                        aggfunc={'D': "mean",
...                                 'E': ["min", "max", "mean"]})
>>> table
                  D   E
               mean max      mean  min
A   C
bar large  5.500000   9  7.500000    6
    small  5.500000   9  8.500000    8
foo large  2.000000   5  4.500000    4
    small  2.333333   6  4.333333    2
Create polygons formed from the linework of a GeoSeries.
Polygonizes a GeoSeries containing linework that represents the
edges of a planar graph. Any geometry type may be provided as input; only the
constituent lines and rings will be used to create the output polygons.
Lines or rings that when combined do not completely close a polygon will be
ignored. Duplicate segments are ignored.
Unless you know that the input GeoSeries represents a planar graph with a clean
topology (e.g. there is a node on both lines where they intersect), it is
recommended to use node=True which performs noding prior to polygonization.
Using node=False will provide performance benefits but may result in
incorrect polygons if the input is not of the proper topology.
When full=True, the return value is a 4-tuple containing the output polygons
along with the lines that could not be converted to polygons. The tuple
consists of 4 GeoSeries of varying lengths:
GeoSeries of the valid polygons (same as with full=False)
GeoSeries of cut edges: edges connected on both ends but not part of
polygonal output
GeoSeries of dangles: edges connected on one end but not part of polygonal
output
GeoSeries of invalid rings: polygons that are formed but are not valid
(bowties, etc)
Parameters:
node (bool, default True) – Perform noding prior to polygonization, by default True.
full (bool, default False) – Return the full output composed of a tuple of GeoSeries, by default False.
Returns:
GeoSeries with the polygons or a tuple of four GeoSeries as
(polygons, cuts, dangles, invalid)
Get Exponential power of dataframe and other, element-wise (binary operator pow).
Equivalent to dataframe ** other, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, rpow.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Return the product of the values over the requested axis.
Parameters:
axis ({index (0), columns (1)}) –
Axis for the function to be applied on.
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.prod with axis=None is deprecated;
in a future version this will reduce over both axes and return a scalar.
To retain the old behavior, pass axis=0 (or do not pass axis).
Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
min_count (int, default 0) – The required number of valid values to perform the operation. If fewer than
min_count non-NA values are present the result will be NA.
**kwargs – Additional keyword arguments to be passed to the function.
Return type:
Series or scalar
See also
Series.sum
Return the sum.
Series.min
Return the minimum.
Series.max
Return the maximum.
Series.idxmin
Return the index of the minimum.
Series.idxmax
Return the index of the maximum.
DataFrame.sum
Return the sum over the requested axis.
DataFrame.min
Return the minimum over the requested axis.
DataFrame.max
Return the maximum over the requested axis.
DataFrame.idxmin
Return the index of the minimum over the requested axis.
DataFrame.idxmax
Return the index of the maximum over the requested axis.
Examples
By default, the product of an empty or all-NA Series is 1
>>> pd.Series([], dtype="float64").prod()
1.0
This can be controlled with the min_count parameter
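A short sketch of how min_count changes the result for empty or all-NA input. The variable names are illustrative only.

```python
import pandas as pd

empty = pd.Series([], dtype="float64")

# With the default min_count=0, the empty product is 1.0.
default_result = empty.prod()

# Requiring at least one valid value turns the result into NaN.
strict_result = empty.prod(min_count=1)

# The same applies to a Series that holds only missing values.
all_na = pd.Series([float("nan")])
all_na_default = all_na.prod()
all_na_strict = all_na.prod(min_count=1)
```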
Return the distance along each geometry nearest to other.
The operation works in a 1-to-1, row-wise manner:
The project method is the inverse of interpolate.
In shapely, this is equal to line_locate_point.
Parameters:
other (BaseGeometry or GeoSeries) – The other geometry to compute the projected point from.
normalized (boolean) – If normalized is True, return the distance normalized to
the length of the object.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and project elements with the same index using
align=True or ignore index and project elements based on their matching
order using align=False:
This optional parameter specifies the interpolation method to use,
when the desired quantile lies between two data points i and j:
linear: i + (j - i) * fraction, where fraction is the
fractional part of the index surrounded by i and j.
lower: i.
higher: j.
nearest: i or j whichever is nearest.
midpoint: (i + j) / 2.
method ({'single', 'table'}, default 'single') – Whether to compute quantiles per-column (‘single’) or over all columns
(‘table’). When ‘table’, the only allowed interpolation methods are
‘nearest’, ‘lower’, and ‘higher’.
Returns:
If q is an array, a DataFrame will be returned where the
index is q, the columns are the columns of self, and the
values are the quantiles.
If q is a float, a Series will be returned where the
index is the columns of self and the values are the quantiles.
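The interpolation options and the two return shapes can be compared on a small frame. This is a sketch with made-up data: for q=0.4 over four sorted values, the desired position is 0.4 * 3 = 1.2, which lies between i = 2.0 and j = 3.0.

```python
import pandas as pd

# Illustrative single-column frame.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

methods = ["linear", "lower", "higher", "nearest", "midpoint"]

# A float q returns a Series indexed by column name; take column "x".
results = {m: df.quantile(0.4, interpolation=m)["x"] for m in methods}

# An array-like q returns a DataFrame indexed by q.
table = df.quantile([0.25, 0.75])
```

Here linear gives 2.0 + (3.0 - 2.0) * 0.2, lower gives 2.0, higher gives 3.0, nearest gives 2.0 and midpoint gives 2.5.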
You can refer to variables
in the environment by prefixing them with an ‘@’ character like
@a + b.
You can refer to column names that are not valid Python variable names
by surrounding them in backticks. Thus, column names containing spaces
or punctuation (besides underscores) or starting with digits must be
surrounded by backticks. (For example, a column named “Area (cm^2)” would
be referenced as `Area (cm^2)`). Column names which are Python keywords
(like “list”, “for”, “import”, etc) cannot be used.
For example, if one of your columns is called a a and you want
to sum it with b, your query should be `a a` + b.
inplace (bool) – Whether to modify the DataFrame rather than creating a new one.
**kwargs – See the documentation for eval() for complete details
on the keyword arguments accepted by DataFrame.query().
Returns:
DataFrame resulting from the provided query expression or
None if inplace=True.
Evaluate a string describing operations on DataFrame columns.
DataFrame.eval
Evaluate a string describing operations on DataFrame columns.
Notes
The result of the evaluation of this expression is first passed to
DataFrame.loc and if that fails because of a
multidimensional key (e.g., a DataFrame) then the result will be passed
to DataFrame.__getitem__().
This method uses the top-level eval() function to
evaluate the passed query.
The query() method uses a slightly
modified Python syntax by default. For example, the & and |
(bitwise) operators have the precedence of their boolean cousins,
and and or. This is syntactically valid Python,
however the semantics are different.
You can change the semantics of the expression by passing the keyword
argument parser='python'. This enforces the same semantics as
evaluation in Python space. Likewise, you can pass engine='python'
to evaluate an expression using Python itself as a backend. This is not
recommended as it is inefficient compared to using numexpr as the
engine.
The DataFrame.index and
DataFrame.columns attributes of the
DataFrame instance are placed in the query namespace
by default, which allows you to treat both the index and columns of the
frame as a column in the frame.
The identifier index is used for the frame index; you can also
use the name of the index to identify it in a query. Please note that
Python keywords may not be used as identifiers.
For further details and examples see the query documentation in
indexing.
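The query features described in these notes can be sketched together. The frame and the threshold variable are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [3, 2, 1],
                   "Area (cm^2)": [10, 20, 30]})

# A local variable is referenced with the '@' prefix.
threshold = 2
above = df.query("a >= @threshold")

# A column name that is not a valid identifier is wrapped in backticks.
big = df.query("`Area (cm^2)` > 15 and b < 2")

# With the default parser, '&' binds like 'and', so no parentheses are
# needed around the comparisons; 'index' refers to the frame index.
by_index = df.query("index > 0 & a < 3")
```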
Backtick quoted variables
Backtick quoted variables are parsed as literal Python code and
are converted internally to a Python valid identifier.
This can lead to the following problems.
During parsing a number of disallowed characters inside the backtick
quoted string are replaced by strings that are allowed as a Python identifier.
These characters include all operators in Python, the space character, the
question mark, the exclamation mark, the dollar sign, and the euro sign.
For other characters that fall outside the ASCII range (U+0001..U+007F)
and those that are not further specified in PEP 3131,
the query parser will raise an error.
This excludes whitespace different than the space character,
but also the hashtag (as it is used for comments) and the backtick
itself (backtick can also not be escaped).
In a special case, quotes that make a pair around a backtick can
confuse the parser.
For example, `it's` > `that's` will raise an error,
as it forms a quoted string ('s > `that') with a backtick inside.
Get Addition of dataframe and other, element-wise (binary operator radd).
Equivalent to other + dataframe, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, add.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
The following example shows how the method behaves with the above
parameters:
default_rank: this is the default behaviour obtained without using
any parameter.
max_rank: setting method='max' the records that have the
same values are ranked using the highest rank (e.g.: since ‘cat’
and ‘dog’ are both in the 2nd and 3rd position, rank 3 is assigned.)
NA_bottom: choosing na_option='bottom', if there are records
with NaN values they are placed at the bottom of the ranking.
pct_rank: when setting pct=True, the ranking is expressed as
percentile rank.
>>> df['default_rank'] = df['Number_legs'].rank()
>>> df['max_rank'] = df['Number_legs'].rank(method='max')
>>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
>>> df['pct_rank'] = df['Number_legs'].rank(pct=True)
>>> df
    Animal  Number_legs  default_rank  max_rank  NA_bottom  pct_rank
0      cat          4.0           2.5       3.0        2.5     0.625
1  penguin          2.0           1.0       1.0        1.0     0.250
2      dog          4.0           2.5       3.0        2.5     0.625
3   spider          8.0           4.0       4.0        4.0     1.000
4    snake          NaN           NaN       NaN        5.0       NaN
Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
Equivalent to other / dataframe, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, truediv.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
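A small sketch of the reversed division with fill_value, using two made-up frames. It shows that a value missing on one side is replaced before the computation, while a location missing on both sides stays missing.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan, np.nan]})
b = pd.DataFrame({"x": [2.0, 4.0, np.nan]})

# Plain operator: any NaN operand yields NaN.
plain = b / a

# a.rtruediv(b) computes b / a; fill_value=1 replaces a NaN that is
# present on only one side of a location.
filled = a.rtruediv(b, fill_value=1)
```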
Conform DataFrame to new index with optional filling logic.
Places NA/NaN in locations having no value in the previous index. A new object
is produced unless the new index is equivalent to the current one and
copy=False.
Parameters:
labels (array-like, optional) – New labels / index to conform the axis specified by ‘axis’ to.
index (array-like, optional) – New labels for the index. Preferably an Index object to avoid
duplicating data.
columns (array-like, optional) – New labels for the columns. Preferably an Index object to avoid
duplicating data.
axis (int or str, optional) – Axis to target. Can be either the axis name (‘index’, ‘columns’)
or number (0, 1).
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
None (default): don’t fill gaps
pad / ffill: Propagate last valid observation forward to next
valid.
backfill / bfill: Use next valid observation to fill gap.
nearest: Use nearest valid observations to fill gap.
Return a new object, even if the passed indexes are the same.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling Copy-on-Write: pd.options.mode.copy_on_write = True
level (int or name) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (scalar, default np.nan) – Value to use for missing values. Defaults to NaN, but can be any
“compatible” value.
limit (int, default None) – Maximum number of consecutive elements to forward or backward fill.
tolerance (optional) –
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation abs(index[indexer] - target) <= tolerance.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index’s type.
Return type:
DataFrame with changed index.
See also
DataFrame.set_index
Set row labels.
DataFrame.reset_index
Remove row labels or move them to new columns.
DataFrame.reindex_like
Change to same indices as other DataFrame.
Examples
DataFrame.reindex supports two calling conventions
(index=index_labels, columns=column_labels, ...)
(labels, axis={'index', 'columns'}, ...)
We highly recommend using keyword arguments to clarify your
intent.
Create a new index and reindex the dataframe. By default
values in the new index that do not have corresponding
records in the dataframe are assigned NaN.
>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...              'Chrome']
>>> df.reindex(new_index)
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02
We can fill in the missing values by passing a value to
the keyword fill_value. Because the index is not monotonically
increasing or decreasing, we cannot use arguments to the keyword
method to fill the NaN values.
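A sketch of both points. The original frame is not shown in this section, so the one below is an assumption: its Safari, IE10 and Chrome rows are chosen to agree with the output above, and the Firefox and Konqueror rows are invented.

```python
import pandas as pd

# Assumed source frame; only Safari, IE10 and Chrome are constrained
# by the reindex output shown earlier.
index = ["Firefox", "Chrome", "Safari", "IE10", "Konqueror"]
df = pd.DataFrame({"http_status": [200, 200, 404, 404, 301],
                   "response_time": [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)

new_index = ["Safari", "Iceweasel", "Comodo Dragon", "IE10", "Chrome"]

# fill_value replaces the NaN that reindex would otherwise insert.
filled_zero = df.reindex(new_index, fill_value=0)
filled_text = df.reindex(new_index, fill_value="missing")

# The index is not monotonic, so a fill method is rejected.
try:
    df.reindex(new_index, method="ffill")
    method_rejected = False
except ValueError:
    method_rejected = True
```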
To further illustrate the filling functionality in
reindex, we will create a dataframe with a
monotonically increasing index (for example, a sequence
of dates).
The index entries that did not have a value in the original data frame
(for example, ‘2009-12-29’) are by default filled with NaN.
If desired, we can fill in the missing values using one of several
options.
For example, to back-propagate the last valid value to fill the NaN
values, pass bfill as an argument to the method keyword.
Please note that the NaN value present in the original dataframe
(at index value 2010-01-03) will not be filled by any of the
value propagation schemes. This is because filling while reindexing
does not look at dataframe values, but only compares the original and
desired indexes. If you do want to fill in the NaN values present
in the original dataframe, use the fillna() method.
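The date-based walkthrough above can be sketched as follows. The dates mirror those mentioned in the text; the price values and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Monotonically increasing daily index with one NaN in the data.
date_index = pd.date_range("2010-01-01", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
                   index=date_index)

# A wider range that starts before and ends after the original index.
date_index2 = pd.date_range("2009-12-29", periods=10, freq="D")

plain = df2.reindex(date_index2)                     # new dates are NaN
bfilled = df2.reindex(date_index2, method="bfill")   # fill from next valid

# The NaN already present in the data is untouched by reindex filling;
# fillna() handles it explicitly.
patched = bfilled.fillna(0)
```

Dates after the last original label have no later observation to propagate backward, so they remain NaN under bfill.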
Return an object with matching indices as other object.
Conform the object to the same index on all axes. Optional
filling logic, placing NaN in locations having no value
in the previous index. A new object is produced unless the
new index is equivalent to the current one and copy=False.
Parameters:
other (Object of the same data type) – Its row and column indices are used to define the new indices
of this object.
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
None (default): don’t fill gaps
pad / ffill: propagate last valid observation forward to next
valid
backfill / bfill: use next valid observation to fill gap
nearest: use nearest valid observations to fill gap.
Return a new object, even if the passed indexes are the same.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling Copy-on-Write: pd.options.mode.copy_on_write = True
limit (int, default None) – Maximum number of consecutive labels to fill for inexact matches.
tolerance (optional) –
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation abs(index[indexer] - target) <= tolerance.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index’s type.
Returns:
Same type as caller, but with changed indices on each axis.
Return type:
Series or DataFrame
See also
DataFrame.set_index
Set row labels.
DataFrame.reset_index
Remove row labels or move them to new columns.
DataFrame.reindex
Change to new indices or expand indices.
Notes
Same as calling
.reindex(index=other.index, columns=other.columns, ...).
>>> df2
            temp_celsius windspeed
2014-02-12          28.0       low
2014-02-13          30.0       low
2014-02-15          35.1    medium
>>> df2.reindex_like(df1)
            temp_celsius  temp_fahrenheit windspeed
2014-02-12          28.0              NaN       low
2014-02-13          30.0              NaN       low
2014-02-14           NaN              NaN       NaN
2014-02-15          35.1              NaN    medium
Return the DE-9IM intersection matrices for the geometries.
The operation works in a 1-to-1, row-wise manner:
Parameters:
other (BaseGeometry or GeoSeries) – The other geometry to compute
the DE-9IM intersection matrices from.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
Returns:
spatial_relations – The DE-9IM intersection matrices which describe
the spatial relations of the other geometry.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
Return True if the DE-9IM string code for the relationship between
the geometries satisfies the pattern, else False.
This function compares the DE-9IM code string for two geometries
against a specified pattern. If the string matches the pattern then
True is returned, otherwise False. The pattern specified can
be an exact match (0, 1 or 2), a boolean match
(uppercase T or F), or a wildcard (*). For example,
the pattern for the within predicate is 'T*F**F***'
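The matching rules described above can be sketched in plain Python. The helper matches_de9im is hypothetical and is not part of GeoPandas or Shapely; it only illustrates how a 9-character DE-9IM code is compared against a pattern.

```python
def matches_de9im(code: str, pattern: str) -> bool:
    """Hypothetical helper: test a DE-9IM code against a pattern."""
    if len(code) != 9 or len(pattern) != 9:
        raise ValueError("DE-9IM codes and patterns have 9 characters")
    for actual, expected in zip(code.upper(), pattern.upper()):
        if expected == "*":
            continue                  # wildcard: anything matches
        if expected == "T":
            if actual == "F":         # T needs a non-empty intersection
                return False
        elif actual != expected:      # F, 0, 1 and 2 must match exactly
            return False
    return True


# DE-9IM code of a point strictly inside a polygon.
inside = matches_de9im("0FFFFF212", "T*F**F***")

# DE-9IM code of a point outside the polygon.
outside = matches_de9im("FF0FFF212", "T*F**F***")
```

The first code satisfies the within pattern because the interiors intersect and nothing of the point falls in the exterior of the polygon; the second fails on the first position.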
The operation works in a 1-to-1, row-wise manner:
Parameters:
other (BaseGeometry or GeoSeries) – The other geometry to be tested against the pattern.
pattern (str) – The DE-9IM pattern to test against.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
mapper (dict-like or function) – Dict-like or function transformations to apply to
that axis’ values. Use either mapper and axis to
specify the axis to target with mapper, or index and
columns.
index (dict-like or function) – Alternative to specifying axis (mapper, axis=0
is equivalent to index=mapper).
columns (dict-like or function) – Alternative to specifying axis (mapper, axis=1
is equivalent to columns=mapper).
axis ({0 or 'index', 1 or 'columns'}, default 0) – Axis to target with mapper. Can be either the axis name
(‘index’, ‘columns’) or number (0, 1). The default is ‘index’.
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling Copy-on-Write: pd.options.mode.copy_on_write = True
inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.
If True then value of copy is ignored.
level (int or level name, default None) – In case of a MultiIndex, only rename labels in the specified
level.
errors ({'ignore', 'raise'}, default 'ignore') – If ‘raise’, raise a KeyError when a dict-like mapper, index,
or columns contains labels that are not present in the Index
being transformed.
If ‘ignore’, existing keys will be renamed and extra keys will be
ignored.
Returns:
DataFrame with the renamed axis labels or None if inplace=True.
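A short sketch of the parameters above, using a made-up frame. It shows the index and columns keywords, the mapper with axis form, and the effect of errors.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Rename columns or index labels with a dict-like mapping.
renamed_cols = df.rename(columns={"A": "a", "B": "c"})
renamed_idx = df.rename(index={0: "x", 1: "y", 2: "z"})

# A function combined with axis is equivalent to columns=str.lower.
lowered = df.rename(str.lower, axis="columns")

# With the default errors='ignore', the unknown key "C" is skipped.
ignored = df.rename(columns={"A": "a", "C": "c"})

# With errors='raise', the unknown key triggers a KeyError.
try:
    df.rename(columns={"A": "a", "C": "c"}, errors="raise")
    raised = False
except KeyError:
    raised = True
```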
Set the name of the axis for the index or columns.
Parameters:
mapper (scalar, list-like, optional) – Value to set the axis name attribute.
index (scalar, list-like, dict-like or function, optional) –
A scalar, list-like, dict-like or function transformation to
apply to that axis’ values.
Note that the columns parameter is not allowed if the
object is a Series. This parameter only applies to DataFrame
type objects.
Use either mapper and axis to
specify the axis to target with mapper, or index
and/or columns.
columns (scalar, list-like, dict-like or function, optional) –
A scalar, list-like, dict-like or function transformation to
apply to that axis’ values.
Note that the columns parameter is not allowed if the
object is a Series. This parameter only applies to DataFrame
type objects.
Use either mapper and axis to
specify the axis to target with mapper, or index
and/or columns.
axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to rename. For Series this parameter is unused and defaults to 0.
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements by
enabling Copy-on-Write: pd.options.mode.copy_on_write = True
inplace (bool, default False) – Modifies the object directly, instead of creating a new Series
or DataFrame.
Returns:
The same type as the caller or None if inplace=True.
DataFrame.rename_axis supports two calling conventions:
* (index=index_mapper, columns=columns_mapper, ...)
* (mapper, axis={'index', 'columns'}, ...)
The first calling convention will only modify the names of
the index and/or the names of the Index object that is the columns.
In this case, the parameter copy is ignored.
The second calling convention will modify the names of the
corresponding index if mapper is a list or a scalar.
However, if mapper is dict-like or a function, it will use the
deprecated behavior of modifying the axis labels.
We highly recommend using keyword arguments to clarify your
intent.
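A minimal sketch of both calling conventions, using an invented frame. Only the axis names change; the labels themselves are untouched.

```python
import pandas as pd

df = pd.DataFrame({"num_legs": [4, 4, 2]}, index=["dog", "cat", "monkey"])

# First convention: keyword arguments name each axis explicitly.
named = df.rename_axis(index="animal", columns="limbs")

# Second convention: a scalar mapper plus the axis it applies to.
via_mapper = df.rename_axis("animal", axis="index")

print(named.index.name, named.columns.name, via_mapper.index.name)
```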
Values of the Series/DataFrame are replaced with other values dynamically.
This differs from updating with .loc or .iloc, which require
you to specify a location to update with some value.
numeric: numeric values equal to to_replace will be
replaced with value
str: string exactly matching to_replace will be replaced
with value
regex: regexes matching to_replace will be replaced with
value
list of str, regex, or numeric:
First, if to_replace and value are both lists, they
must be the same length.
Second, if regex=True then all of the strings in both
lists will be interpreted as regexes; otherwise they will match
directly. This doesn’t matter much for value since there
are only a few possible substitution regexes you can use.
str, regex and numeric rules apply as above.
dict:
Dicts can be used to specify different replacement values
for different existing values. For example,
{'a':'b','y':'z'} replaces the value ‘a’ with ‘b’ and
‘y’ with ‘z’. To use a dict in this way, the optional value
parameter should not be given.
For a DataFrame a dict can specify that different values
should be replaced in different columns. For example,
{'a':1,'b':'z'} looks for the value 1 in column ‘a’
and the value ‘z’ in column ‘b’ and replaces these values
with whatever is specified in value. The value parameter
should not be None in this case. You can treat this as a
special case of passing two lists except that you are
specifying the column to search in.
For a DataFrame nested dictionaries, e.g.,
{'a':{'b':np.nan}}, are read as follows: look in column
‘a’ for the value ‘b’ and replace it with NaN. The optional value
parameter should not be specified to use a nested dict in this
way. You can nest regular expressions as well. Note that
column names (the top-level dictionary keys in a nested
dictionary) cannot be regular expressions.
None:
This means that the regex argument must be a string,
compiled regular expression, or list, dict, ndarray or
Series of such elements. If value is also None then
this must be a nested dictionary or Series.
See the examples section for examples of each of these.
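The three dict forms described above can be compared side by side. This is a small sketch on an invented numeric frame.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2], "B": [5, 6, 7]})

# Flat dict: keys are the values to find, dict values are the replacements.
flat = df.replace({0: 10, 1: 100})

# Column dict plus value: look for 0 in column A and 5 in column B,
# and replace both with the separately supplied value 100.
per_column = df.replace({"A": 0, "B": 5}, 100)

# Nested dict: in column A only, replace 0 with 100 and 2 with 400.
nested = df.replace({"A": {0: 100, 2: 400}})

print(flat, per_column, nested, sep="\n\n")
```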
value (scalar, dict, list, str, regex, default None) – Value to replace any values matching to_replace with.
For a DataFrame a dict of values can be used to specify which
value to use for each column (columns not in the dict will not be
filled). Regular expressions, strings and lists or dicts of such
objects are also allowed.
inplace (bool, default False) – If True, performs operation inplace and returns None.
regex (bool or same types as to_replace, default False) – Whether to interpret to_replace and/or value as regular
expressions. Alternatively, this could be a regular expression or a
list, dict, or array of regular expressions in which case
to_replace must be None.
method ({'pad', 'ffill', 'bfill'}) –
The method to use for replacement, when to_replace is a
scalar, list or tuple and value is None.
If to_replace is not a scalar, array-like, dict, or None
* If to_replace is a dict and value is not a list,
dict, ndarray, or Series
* If to_replace is None and regex is not compilable
into a regular expression or is a list, dict, ndarray, or
Series.
* When replacing multiple bool or datetime64 objects and
the arguments to to_replace do not match the type of the
value being replaced
If a list or an ndarray is passed to to_replace and
value but they are not the same length.
See also
Series.fillna
Fill NA values.
DataFrame.fillna
Fill NA values.
Series.where
Replace values based on boolean condition.
DataFrame.where
Replace values based on boolean condition.
DataFrame.map
Apply a function to a Dataframe elementwise.
Series.map
Map values of Series according to an input mapping or function.
Series.str.replace
Simple string replacement.
Notes
Regex substitution is performed under the hood with re.sub. The
rules for substitution for re.sub are the same.
Regular expressions will only substitute on strings, meaning you
cannot provide, for example, a regular expression matching floating
point numbers and expect the columns in your frame that have a
numeric dtype to be matched. However, if those floating point
numbers are strings, then you can do this.
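To illustrate the point about regular expressions and numeric columns, here is a short sketch. The column names are invented; one column holds floats and the other holds the same numbers as strings.

```python
import pandas as pd

df = pd.DataFrame({"num": [1.5, 2.5], "text": ["1.5", "2.5"]})

# The pattern can only match string cells, so the float column is untouched.
out = df.replace(to_replace=r"^1\.5$", value="X", regex=True)

print(out)
```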
This method has a lot of options. You are encouraged to experiment
and play with this method to gain intuition about how it works.
When dict is used as the to_replace value, it is like
key(s) in the dict are the to_replace part and
value(s) in the dict are the value parameter.
>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e
>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e
>>> df.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e
Regular expression `to_replace`
>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz
>>> df.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz
Compare the behavior of s.replace({'a':None}) and
s.replace('a',None) to understand the peculiarities
of the to_replace parameter:
>>> s=pd.Series([10,'a','a','b','a'])
When one uses a dict as the to_replace value, it is like the
value(s) in the dict are equal to the value parameter.
s.replace({'a':None}) is equivalent to
s.replace(to_replace={'a':None},value=None,method=None):
When value is not explicitly passed and to_replace is a scalar, list
or tuple, replace uses the method parameter (default ‘pad’) to do the
replacement. So this is why the ‘a’ values are being replaced by 10
in rows 1 and 2 and ‘b’ in row 4 in this case.
>>> s.replace('a')
0    10
1    10
2    10
3     b
4     b
dtype: object
Deprecated since version 2.1.0: The ‘method’ parameter and padding behavior are deprecated.
On the other hand, if None is explicitly passed for value, it will
be respected:
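One way to check this, assuming a pandas version in which an explicitly passed None is respected (as the sentence above states), is to compare the scalar and dict spellings on the same Series s used earlier:

```python
import pandas as pd

s = pd.Series([10, "a", "a", "b", "a"])

# Explicit None as the value: every 'a' becomes a missing value.
explicit_none = s.replace("a", None)

# The dict spelling maps 'a' to None in the same way.
dict_none = s.replace({"a": None})

print(explicit_none.isna().tolist())
```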
Convenience method for frequency conversion and resampling of time series.
The object must have a datetime-like index (DatetimeIndex, PeriodIndex,
or TimedeltaIndex), or the caller must pass the label of a datetime-like
series/index to the on/level keyword parameter.
Parameters:
rule (DateOffset, Timedelta or str) – The offset string or object representing target conversion.
axis ({0 or 'index', 1 or 'columns'}, default 0) –
Which axis to use for up- or down-sampling. For Series this parameter
is unused and defaults to 0. Must be
DatetimeIndex, TimedeltaIndex or PeriodIndex.
Deprecated since version 2.0.0: Use frame.T.resample(…) instead.
closed ({'right', 'left'}, default None) – Which side of bin interval is closed. The default is ‘left’
for all frequency offsets except for ‘ME’, ‘YE’, ‘QE’, ‘BME’,
‘BA’, ‘BQE’, and ‘W’ which all have a default of ‘right’.
label ({'right', 'left'}, default None) – Which bin edge label to label bucket with. The default is ‘left’
for all frequency offsets except for ‘ME’, ‘YE’, ‘QE’, ‘BME’,
‘BA’, ‘BQE’, and ‘W’ which all have a default of ‘right’.
Pass ‘timestamp’ to convert the resulting index to a
DatetimeIndex or ‘period’ to convert it to a PeriodIndex.
By default the input representation is retained.
Deprecated since version 2.2.0: Convert index to desired type explicitly instead.
on (str, optional) – For a DataFrame, column to use instead of index for resampling.
Column must be datetime-like.
level (str or int, optional) – For a MultiIndex, level (name or number) to use for
resampling. level must be datetime-like.
Whether to include the group keys in the result index when using
.apply() on the resampled object.
Added in version 1.5.0: Not specifying group_keys will retain values-dependent behavior
from pandas 1.4 and earlier (see pandas 1.5.0 Release notes for examples).
Changed in version 2.0.0: group_keys now defaults to False.
Downsample the series into 3 minute bins as above, but label each
bin using the right edge instead of the left. Please note that the
value in the bucket used as the label is not included in the bucket,
which it labels. For example, in the original series the
bucket 2000-01-01 00:03:00 contains the value 3, but the summed
value in the resampled bucket with the label 2000-01-01 00:03:00
does not include 3 (if it did, the summed value would be 6, not 3).
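The right-edge labelling described above can be reproduced with a nine-point, one-minute series holding the values 0 to 8. The series construction is an assumption made for this sketch.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Bins are still closed on the left; only the label moves to the right edge.
right_labelled = series.resample("3min", label="right").sum()

# The first bin holds 0 + 1 + 2 and is labelled 00:03:00,
# even though the value at 00:03:00 (which is 3) belongs to the next bin.
print(right_labelled)
```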
In contrast with the start_day, you can use end_day to take the ceiling
midnight of the largest Timestamp as the end of the bins and drop the bins
not containing data:
Reset the index of the DataFrame, and use the default one instead.
If the DataFrame has a MultiIndex, this method can remove one or more
levels.
Parameters:
level (int, str, tuple, or list, default None) – Only remove the given levels from the index. Removes all levels by
default.
drop (bool, default False) – Do not try to insert index into dataframe columns. This resets
the index to the default integer index.
inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.
col_level (int or str, default 0) – If the columns have multiple levels, determines which level the
labels are inserted into. By default it is inserted into the first
level.
col_fill (object, default '') – If the columns have multiple levels, determines how the other
levels are named. If None then the index name is repeated.
names (int, str or 1-dimensional list, default None) –
Using the given string, rename the DataFrame column which contains the
index data. If the DataFrame has a MultiIndex, this has to be a list or
tuple with length equal to the number of levels.
Added in version 1.5.0.
Returns:
DataFrame with the new index or None if inplace=True.
>>> df = pd.DataFrame([('bird', 389.0),
...                    ('bird', 24.0),
...                    ('mammal', 80.5),
...                    ('mammal', np.nan)],
...                   index=['falcon', 'parrot', 'lion', 'monkey'],
...                   columns=('class', 'max_speed'))
>>> df
         class  max_speed
falcon    bird      389.0
parrot    bird       24.0
lion    mammal       80.5
monkey  mammal        NaN
When we reset the index, the old index is added as a column, and a
new sequential index is used:
>>> df.reset_index()
    index   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN
We can use the drop parameter to avoid the old index being added as
a column:
>>> df.reset_index(drop=True)
    class  max_speed
0    bird      389.0
1    bird       24.0
2  mammal       80.5
3  mammal        NaN
You can also use reset_index with MultiIndex.
>>> index = pd.MultiIndex.from_tuples([('bird', 'falcon'),
...                                    ('bird', 'parrot'),
...                                    ('mammal', 'lion'),
...                                    ('mammal', 'monkey')],
...                                   names=['class', 'name'])
>>> columns = pd.MultiIndex.from_tuples([('speed', 'max'),
...                                      ('species', 'type')])
>>> df = pd.DataFrame([(389.0, 'fly'),
...                    (24.0, 'fly'),
...                    (80.5, 'run'),
...                    (np.nan, 'jump')],
...                   index=index,
...                   columns=columns)
>>> df
               speed species
                 max    type
class  name
bird   falcon  389.0     fly
       parrot   24.0     fly
mammal lion     80.5     run
       monkey    NaN    jump
Using the names parameter, choose a name for the index column:
>>> df.reset_index(names=['classes', 'names'])
  classes   names  speed species
                     max    type
0    bird  falcon  389.0     fly
1    bird  parrot   24.0     fly
2  mammal    lion   80.5     run
3  mammal  monkey    NaN    jump
If the index has multiple levels, we can reset a subset of them:
>>> df.reset_index(level='class')
         class  speed species
                  max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump
If we are not dropping the index, by default, it is placed in the top
level. We can place it in another level:
>>> df.reset_index(level='class', col_level=1)
                speed species
         class    max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump
When the index is inserted under another level, we can specify under
which one with the parameter col_fill:
>>> df.reset_index(level='class', col_level=1, col_fill='species')
              species  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump
If we specify a nonexistent level for col_fill, it is created:
>>> df.reset_index(level='class', col_level=1, col_fill='genus')
                genus  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump
Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).
Equivalent to other//dataframe, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, floordiv.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
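As a small sketch of how fill_value interacts with a reversed operator, the invented frame below contains one missing value. With a scalar other, the missing cell is filled before the division is carried out.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": [4.0, 5.0, 6.0]})

# Plain reversed floor division: the missing cell stays missing.
plain = df.rfloordiv(10)

# With fill_value=1 the NaN is treated as 1, so the cell becomes 10 // 1.
filled = df.rfloordiv(10, fill_value=1)

print(plain, filled, sep="\n\n")
```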
Get Modulo of dataframe and other, element-wise (binary operator rmod).
Equivalent to other%dataframe, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, mod.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Get Multiplication of dataframe and other, element-wise (binary operator rmul).
Equivalent to other*dataframe, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, mul.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
window (int, timedelta, str, offset, or BaseIndexer subclass) –
Size of the moving window.
If an integer, the fixed number of observations used for
each window.
If a timedelta, str, or offset, the time period of each window. Each
window will be variable-sized, based on the observations included in
the time period. This is only valid for datetime-like indexes.
To learn more about offsets and frequency strings, see the pandas user guide on offset aliases.
If a BaseIndexer subclass, the window boundaries are
based on the defined get_window_bounds method. Additional rolling
keyword arguments, namely min_periods, center, closed and
step will be passed to get_window_bounds.
Certain Scipy window types require additional parameters to be passed
in the aggregation function. The additional parameters must match
the keywords specified in the Scipy window type method signature.
Evaluate the window at every step result, equivalent to slicing as
[::step]. window must be an integer. Using a step argument other
than None or 1 will produce a result with a different shape than the input.
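The difference between an integer window, a time-based window and the step argument can be seen on a short daily series. The data are invented, and the sketch assumes a pandas version that supports step.

```python
import pandas as pd

s = pd.Series(
    [0, 1, 2, 3, 4],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Fixed window of two observations: the first result has too few points.
fixed = s.rolling(window=2).sum()

# Time-based window of two days: sized by the timestamps it covers.
timed = s.rolling(window="2D").sum()

# Integer window evaluated at every second row, like fixed[::2].
stepped = s.rolling(window=2, step=2).sum()

print(fixed, timed, stepped, sep="\n\n")
```

The stepped result has three rows instead of five, which is the change of shape mentioned above.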
angle (float) – The angle of rotation can be specified in either degrees (default)
or radians by setting use_radians=True. Positive angles are
counter-clockwise and negative are clockwise rotations.
origin (string, Point, or tuple(x, y)) – The point of origin can be a keyword ‘center’ for the bounding box
center (default), ‘centroid’ for the geometry’s centroid, a Point
object or a coordinate tuple (x, y).
use_radians (boolean) – Whether to interpret the angle of rotation as degrees or radians
Round a DataFrame to a variable number of decimal places.
Parameters:
decimals (int, dict, Series) – Number of decimal places to round each column to. If an int is
given, round each column to the same number of places.
Otherwise dict and Series round to variable numbers of places.
Column names should be in the keys if decimals is a
dict-like, or in the index if decimals is a Series. Any
columns not included in decimals will be left as is. Elements
of decimals which are not columns of the input will be
ignored.
*args – Additional keywords have no effect but might be accepted for
compatibility with numpy.
**kwargs – Additional keywords have no effect but might be accepted for
compatibility with numpy.
Returns:
A DataFrame with the affected columns rounded to the specified
number of decimal places.
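A brief sketch of the three accepted forms of decimals, on an invented frame. The 'birds' key is deliberately not a column, to show that it is ignored.

```python
import pandas as pd

df = pd.DataFrame({"dogs": [0.21, 0.01, 0.67], "cats": [0.32, 0.67, 0.03]})

# int: every column is rounded to the same number of places.
uniform = df.round(1)

# dict: per-column places; keys that are not columns are ignored.
by_dict = df.round({"dogs": 1, "cats": 0, "birds": 2})

# Series: column names in the index; columns not listed are left as is.
by_series = df.round(pd.Series({"cats": 1}))

print(uniform, by_dict, by_series, sep="\n\n")
```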
Get Exponential power of dataframe and other, element-wise (binary operator rpow).
Equivalent to other**dataframe, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, pow.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Get Subtraction of dataframe and other, element-wise (binary operator rsub).
Equivalent to other-dataframe, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, sub.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
Equivalent to other/dataframe, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, truediv.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
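The reversed methods differ from their forward versions only in operand order. A quick sketch on an invented single-column frame makes the equivalences concrete:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 4.0]})

# Each reversed method puts `other` on the left of the operator.
rsub = df.rsub(10)          # 10 - df
rtruediv = df.rtruediv(8)   # 8 / df
rmul = df.rmul(3)           # 3 * df
rmod = df.rmod(7)           # 7 % df
rpow = df.rpow(2)           # 2 ** df

print(rsub.equals(10 - df), rtruediv.equals(8 / df), rpow.equals(2 ** df))
```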
Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
Parameters:
n (int, optional) – Number of items from axis to return. Cannot be used with frac.
Default = 1 if frac = None.
frac (float, optional) – Fraction of axis items to return. Cannot be used with n.
replace (bool, default False) – Allow or disallow sampling of the same row more than once.
weights (str or ndarray-like, optional) – Default ‘None’ results in equal probability weighting.
If passed a Series, will align with target object on index. Index
values in weights not found in sampled object will be ignored and
index values in sampled object not in weights will be assigned
weights of zero.
If called on a DataFrame, will accept the name of a column
when axis = 0.
Unless weights are a Series, weights must be same length as axis
being sampled.
If weights do not sum to 1, they will be normalized to sum to 1.
Missing values in the weights column will be treated as zero.
Infinite values not allowed.
If int, array-like, or BitGenerator, seed for random number generator.
If np.random.RandomState or np.random.Generator, use as given.
Changed in version 1.4.0: np.random.Generator objects now accepted
axis ({0 or 'index', 1 or 'columns', None}, default None) – Axis to sample. Accepts axis number or name. Default is stat axis
for given data type. For Series this parameter is unused and defaults to None.
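A short sketch of reproducible and weighted sampling follows. The frame and its 'weight' column are invented; only one row has a non-zero weight, so the weighted draw is deterministic.

```python
import pandas as pd

df = pd.DataFrame(
    {"num_legs": [2, 4, 8, 0], "weight": [0, 0, 1, 0]},
    index=["falcon", "dog", "spider", "fish"],
)

# The same random_state gives the same rows on every call.
first = df.sample(n=2, random_state=1)
second = df.sample(n=2, random_state=1)

# frac is the alternative to n: half of four rows is two rows.
half = df.sample(frac=0.5, random_state=1)

# A column name can supply the weights when sampling rows (axis=0).
weighted = df.sample(n=1, weights="weight", random_state=0)

print(first.equals(second), len(half), weighted.index.tolist())
```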
Generate a MultiPoint per each geometry containing points sampled from the
geometry. You can either sample randomly from a uniform distribution or use an
advanced sampling algorithm from the pointpats package.
For polygons, this samples within the area of the polygon. For lines,
this samples along the length of the linestring. For multi-part
geometries, the weights of each part are selected according to their relevant
attribute (area for Polygons, length for LineStrings), and then points are
sampled from each part.
Any other geometry type (e.g. Point, GeometryCollection) is ignored, and an
empty MultiPoint geometry is returned.
Parameters:
size (int|array-like) – The size of the sample requested. Indicates the number of samples to draw
from each geometry. If an array of the same length as a GeoSeries is
passed, it denotes the size of a sample per geometry.
method (str, default "uniform") – The sampling method. uniform samples uniformly at random from a
geometry using numpy.random.uniform. Other allowed strings
(e.g. "cluster_poisson") denote sampling function name from the
pointpats.random module (see
http://pysal.org/pointpats/api.html#random-distributions). Pointpats methods
are implemented for (Multi)Polygons only and will return an empty MultiPoint
for other geometry types.
rng ({None,int,array_like[ints],SeedSequence,BitGenerator,Generator}, optional) – A random generator or seed to initialize the numpy BitGenerator. If None, then fresh,
unpredictable entropy will be pulled from the OS.
**kwargs (dict) – Options for the pointpats sampling algorithms.
xfact (float, float, float) – Scaling factors for the x, y, and z dimensions respectively.
yfact (float, float, float) – Scaling factors for the x, y, and z dimensions respectively.
zfact (float, float, float) – Scaling factors for the x, y, and z dimensions respectively.
origin (string, Point, or tuple) – The point of origin can be a keyword ‘center’ for the 2D bounding
box center (default), ‘centroid’ for the geometry’s 2D centroid, a
Point object or a coordinate tuple (x, y, z).
Return a GeoSeries with vertices added to line segments based on
maximum segment length.
Additional vertices will be added to every line segment in an input geometry so
that segments are no longer than the provided maximum segment length. New
vertices will evenly subdivide each segment. Only linear components of input
geometries are densified; other geometries are returned unmodified.
Parameters:
max_segment_length (float|array-like) – Additional vertices will be added so that all line segments are no longer
than this value. Must be greater than 0.
If both of include and exclude are empty
* If include and exclude have overlapping elements
* If any kind of string dtype is passed in.
See also
DataFrame.dtypes
Return Series with the data type of each column.
Notes
To select all numeric types, use np.number or 'number'
To select strings you must use the object dtype, but note that
this will return all object dtype columns. With
pd.options.future.infer_string enabled, using "str" will
work to select all string columns.
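The selection rules can be sketched on a small invented frame with one column per kind of dtype. The exception type caught here is ValueError, which is what pandas raises for the first two error conditions listed above.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2], "b": [True, False], "c": [1.0, 2.0], "d": ["x", "y"]}
)

# 'number' covers integer and float columns, but not booleans.
numeric = df.select_dtypes(include="number")

# exclude drops the matching columns and keeps everything else.
no_bool = df.select_dtypes(exclude="bool")

# Calling with neither include nor exclude is an error.
try:
    df.select_dtypes()
    empty_call_raised = False
except ValueError:
    empty_call_raised = True

# So is listing the same dtype in both arguments.
try:
    df.select_dtypes(include=["number"], exclude=["number"])
    overlap_raised = False
except ValueError:
    overlap_raised = True

print(numeric.columns.tolist(), no_bool.columns.tolist())
```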
Return unbiased standard error of the mean over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
Parameters:
axis ({index (0), columns (1)}) –
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.sem with axis=None is deprecated;
in a future version this will reduce over both axes and return a scalar.
To retain the old behavior, pass axis=0 (or do not pass axis).
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
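As a sanity check on the definition, the standard error can be recomputed by hand as the standard deviation divided by the square root of the number of non-missing values. The frame below is invented and includes one NaN to show the effect of skipna.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 6.0, np.nan]})

# Default: ddof=1, missing values skipped, one result per column.
default = df.sem()

# Manual recomputation from std and the per-column count.
manual = df.std(ddof=1) / np.sqrt(df.count())

# ddof=0 divides the variance by N instead of N - 1.
population = df.sem(ddof=0)

print(default, manual, population, sep="\n\n")
```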
Indexes for column or row labels can be changed by assigning
a list-like or Index.
Parameters:
labels (list-like, Index) – The values for the new index.
axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to update. The value 0 identifies the rows. For Series
this parameter is unused and defaults to 0.
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements by
enabling Copy-on-Write: pd.options.mode.copy_on_write = True
Returns:
An object of type DataFrame.
Return type:
DataFrame
See also
DataFrame.rename_axis
Alter the name of the index or columns.
Examples
>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
Change the row labels.
>>> df.set_axis(['a', 'b', 'c'], axis='index')
   A  B
a  1  4
b  2  5
c  3  6
Change the column labels.
>>> df.set_axis(['I', 'II'], axis='columns')
   I  II
0  1   4
1  2   5
2  3   6
Set the Coordinate Reference System (CRS) of the GeoDataFrame.
If there are multiple geometry columns within the GeoDataFrame, only
the CRS of the active geometry column is set.
Pass None to remove CRS from the active geometry column.
Notes
The underlying geometries are not transformed to this CRS. To
transform the geometries to a new CRS, use the to_crs method.
Parameters:
crs (pyproj.CRS | None, optional) – The value can be anything accepted
by pyproj.CRS.from_user_input(),
such as an authority string (e.g. “EPSG:4326”) or a WKT string.
epsg (int, optional) – EPSG code specifying the projection.
inplace (bool, default False) – If True, the CRS of the GeoDataFrame will be changed in place
(while still returning the result) instead of making a copy of
the GeoDataFrame.
allow_override (bool, default False) – If the GeoDataFrame already has a CRS, allow replacing the
existing CRS, even when both are not equal.
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements by
enabling Copy-on-Write: pd.options.mode.copy_on_write = True
allows_duplicate_labels (bool, optional) – Whether the returned object allows duplicate labels.
Returns:
The same type as the caller.
Return type:
Series or DataFrame
See also
DataFrame.attrs
Global metadata applying to this dataset.
DataFrame.flags
Global flags applying to this object.
Notes
This method returns a new object that’s a view on the same data
as the input. Mutating the input or the output values will be reflected
in the other.
This method is intended to be used in method chains.
“Flags” differ from “metadata”. Flags reflect properties of the
pandas object (the Series or DataFrame). Metadata refer to properties
of the dataset, and should be stored in DataFrame.attrs.
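A minimal sketch of set_flags in a method-chain style, on an invented frame. It also shows that asking for allows_duplicate_labels=False on an object that already has duplicate labels is rejected with pandas.errors.DuplicateLabelError.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2]})

# Returns a new object; the flag on the original is unchanged.
strict = df.set_flags(allows_duplicate_labels=False)

# Disallowing duplicates on an index that already has them fails.
try:
    pd.Series([1, 2], index=["a", "a"]).set_flags(allows_duplicate_labels=False)
    duplicate_rejected = False
except pd.errors.DuplicateLabelError:
    duplicate_rejected = True

print(df.flags.allows_duplicate_labels, strict.flags.allows_duplicate_labels)
```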
Set the GeoDataFrame geometry using either an existing column or
the specified input. By default yields a new object.
The original geometry column is replaced with the input.
Parameters:
col (column label or array-like) – An existing column name or values to set as the new geometry column.
If values (array-like, (Geo)Series) are passed, then if they are named
(Series) the new geometry column will have the corresponding name,
otherwise the existing geometry column will be replaced. If there is
no existing geometry column, the new geometry column will use the
default name “geometry”.
When specifying a named Series or an existing column name for col,
controls if the previous geometry column should be dropped from the
result. The default of False keeps both the old and new geometry column.
Deprecated since version 1.0.0.
inplace (boolean, default False) – Modify the GeoDataFrame in place (do not create a new object).
crs (pyproj.CRS, optional) – Coordinate system to use. The value can be anything accepted
by pyproj.CRS.from_user_input(),
such as an authority string (e.g. “EPSG:4326”) or a WKT string.
If passed, overrides both DataFrame and col’s crs.
Otherwise, tries to get crs from passed col values or DataFrame.
Set the DataFrame index (row labels) using one or more existing
columns or arrays (of the correct length). The index can replace the
existing index or expand on it.
Parameters:
keys (label or array-like or list of labels/arrays) – This parameter can be either a single column key, a single array of
the same length as the calling DataFrame, or a list containing an
arbitrary combination of column keys and arrays. Here, “array”
encompasses Series, Index, np.ndarray, and
instances of Iterator.
drop (bool, default True) – Delete columns to be used as the new index.
append (bool, default False) – Whether to append columns to existing index.
inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.
verify_integrity (bool, default False) – Check the new index for duplicates. Otherwise defer the check until
necessary. Setting to False will improve the performance of this
method.
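The effect of keys, drop and append can be sketched with a small frame. The column names and values below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"month": [1, 4, 7], "year": [2012, 2014, 2013], "sale": [55, 40, 84]}
)

# A single column becomes the index and is removed from the columns.
by_month = df.set_index("month")

# A list of columns produces a MultiIndex.
by_year_month = df.set_index(["year", "month"])

# drop=False keeps the column in the frame as well.
kept = df.set_index("month", drop=False)

# append=True adds the new level to the existing index instead of replacing it.
appended = df.set_index("month", append=True)
```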
Return a GeoSeries with the precision set to a precision grid size.
By default, geometries use double precision coordinates (grid_size=0).
Coordinates will be rounded if a precision grid is less precise than the input
geometry. Duplicated vertices will be dropped from lines and polygons for grid
sizes greater than 0. Line and polygon geometries may collapse to empty
geometries if all vertices are closer together than grid_size. Spikes or
sections in Polygons narrower than grid_size after rounding the vertices
will be removed, which can lead to MultiPolygons or empty geometries. Z values,
if present, will not be modified.
Parameters:
grid_size (float) – Precision grid size. If 0, will use double precision (will not modify
geometry if precision grid size was not previously set). If this value is
more precise than input geometry, the input geometry will not be modified.
mode ({'valid_output', 'pointwise', 'keep_collapsed'}, default 'valid_output') – This parameter determines the way a precision reduction is applied on the
geometry. There are three modes:
'valid_output' (default): The output is always valid. Collapsed
geometry elements (including both polygons and lines) are removed.
Duplicate vertices are removed.
'pointwise': Precision reduction is performed pointwise. Output
geometry may be invalid due to collapse or self-intersection. Duplicate
vertices are not removed.
'keep_collapsed': Like the default mode, except that collapsed linear
geometry elements are preserved. Collapsed polygonal input elements are
removed. Duplicate vertices are removed.
>>> s.set_precision(1)
0                   POINT (1 1)
1             POINT Z (1 1 0.9)
2    LINESTRING (0 0, 0 1, 1 1)
3              LINESTRING EMPTY
dtype: geometry
>>> s.set_precision(1, mode="pointwise")
0                        POINT (1 1)
1                  POINT Z (1 1 0.9)
2    LINESTRING (0 0, 0 0, 0 1, 1 1)
3         LINESTRING (0 0, 0 0, 0 0)
dtype: geometry
>>> s.set_precision(1, mode="keep_collapsed")
0                   POINT (1 1)
1             POINT Z (1 1 0.9)
2    LINESTRING (0 0, 0 1, 1 1)
3         LINESTRING (0 0, 0 0)
dtype: geometry
Notes
Subsequent operations will always be performed in the precision of the geometry
with higher precision (smaller grid_size). That same precision will be
attached to the operation outputs.
Input geometries should be geometrically valid; unexpected results may occur if
input geometries are not. You can check the validity with
is_valid() and fix invalid geometries with
make_valid() methods.
Geometries within the GeoSeries should be only (Multi)LineStrings or
LinearRings. A GeoSeries of GeometryCollections is returned with two elements
in each GeometryCollection. The first element is a MultiLineString containing
shared paths with the same direction for both inputs. The second element is a
MultiLineString containing shared paths with the opposite direction for the two
inputs.
You can extract individual geometries of the resulting GeometryCollection using
the GeoSeries.get_geometry() method.
The operation works in a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the shared paths
with. Has to contain only (Multi)LineString or LinearRing geometry types.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row. The GeoSeries
above have different indices than the one below. We can either align both
GeoSeries based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
Shift index by desired number of periods with an optional time freq.
When freq is not passed, shift the index without realigning the data.
If freq is passed (in this case, the index must be date or datetime,
or it will raise a NotImplementedError), the index will be
increased using the periods and the freq. freq can be inferred
when specified as “infer” as long as either freq or inferred_freq
attribute is set in the index.
Parameters:
periods (int or Sequence) – Number of periods to shift. Can be positive or negative.
If an iterable of ints, the data will be shifted once by each int.
This is equivalent to shifting by one value at a time and
concatenating all resulting frames. The resulting columns will have
the shift suffixed to their column names. For multiple periods,
axis must not be 1.
freq (DateOffset, tseries.offsets, timedelta, or str, optional) – Offset to use from the tseries module or time rule (e.g. ‘EOM’).
If freq is specified then the index values are shifted but the
data is not realigned. That is, use freq if you would like to
extend the index when shifting and preserve the original data.
If freq is specified as “infer” then it will be inferred from
the freq or inferred_freq attributes of the index. If neither of
those attributes exist, a ValueError is thrown.
axis ({0 or 'index', 1 or 'columns', None}, default None) – Shift direction. For Series this parameter is unused and defaults to 0.
fill_value (object, optional) – The scalar value to use for newly introduced missing values.
The default depends on the dtype of self.
For numeric data, np.nan is used.
For datetime, timedelta, or period data, etc. NaT is used.
For extension dtypes, self.dtype.na_value is used.
suffix (str, optional) – If str and periods is an iterable, this is added after the column
name and before the shift value for each shifted column name.
>>> df.shift(periods=3)
            Col1  Col2  Col3
2020-01-01   NaN   NaN   NaN
2020-01-02   NaN   NaN   NaN
2020-01-03   NaN   NaN   NaN
2020-01-04  10.0  13.0  17.0
2020-01-05  20.0  23.0  27.0
>>> df.shift(periods=1, axis="columns")
            Col1  Col2  Col3
2020-01-01   NaN    10    13
2020-01-02   NaN    20    23
2020-01-03   NaN    15    18
2020-01-04   NaN    30    33
2020-01-05   NaN    45    48
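The difference between shifting the data and shifting the index with freq can be sketched as follows. The single-column frame is a reduced, illustrative version of the one used in the examples.

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [10, 20, 15, 30, 45]},
    index=pd.date_range("2020-01-01", "2020-01-05"),
)

# Without freq the data moves down and the index stays where it is.
shifted = df.shift(periods=3)

# With freq the index moves forward and the data stays attached to its rows.
shifted_index = df.shift(periods=3, freq="D")

# fill_value replaces the missing values introduced by the shift.
filled = df.shift(periods=3, fill_value=0)
```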
Return the shortest two-point line between two geometries.
The resulting line consists of two points, representing the nearest points
between the geometry pair. The line always starts in the first geometry a
and ends in the second geometry b. The endpoints of the line will not
necessarily be existing vertices of the input geometries a and b, but
can also be a point along a line segment.
The operation works in a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the
shortest line with.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices than the one below. We can either
align both GeoSeries based on index values and compare elements with the same
index using align=True or ignore index and compare elements based on their
matching order using align=False:
Return a GeoSeries containing a simplified representation of
each geometry.
The algorithm (Douglas-Peucker) recursively splits the original line
into smaller parts and connects these parts’ endpoints
by a straight line. Then, it removes all points whose distance
to the straight line is smaller than tolerance. It does not
move any points and it always preserves endpoints of
the original line or polygon.
See https://shapely.readthedocs.io/en/latest/manual.html#object.simplify
for details.
Simplifies individual geometries independently, without considering
the topology of a potential polygonal coverage. If you would like to treat
the GeoSeries as a coverage and simplify its edges, while preserving the
coverage topology, see simplify_coverage().
Parameters:
tolerance (float) – All parts of a simplified geometry will be no more than
tolerance distance from the original. It has the same units
as the coordinate reference system of the GeoSeries.
For example, using tolerance=100 in a projected CRS with meters
as units means a distance of 100 meters in reality.
preserve_topology (bool (default True)) – False uses a quicker algorithm, but may produce self-intersecting
or otherwise invalid geometries.
Notes
Invalid geometric objects may result from simplification that does not
preserve topology and simplification may be sensitive to the order of
coordinates: two geometries differing only in order of coordinates may be
simplified differently.
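The role of tolerance can be sketched with a three-vertex line whose middle vertex lies 0.1 units off the straight line between the endpoints. The coordinates are illustrative.

```python
import geopandas as gpd
from shapely.geometry import LineString

s = gpd.GeoSeries([LineString([(0, 0), (1, 0.1), (2, 0)])])

# The middle vertex is within the tolerance, so it is removed.
coarse = s.simplify(0.5)

# With a tolerance smaller than 0.1 the vertex is kept.
fine = s.simplify(0.05)

print(len(coarse.iloc[0].coords), len(fine.iloc[0].coords))
```

The endpoints are preserved in both cases, in line with the description above.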
Return a GeoSeries containing a simplified representation of
polygonal coverage.
Assumes that the GeoSeries forms a polygonal coverage. Under this
assumption, the method simplifies the edges using the Visvalingam-Whyatt
algorithm, while preserving a valid coverage. In the most simplified case,
polygons are reduced to triangles.
A GeoSeries of valid polygons is considered a coverage if the polygons are:
Non-overlapping - polygons do not overlap (their interiors do not
intersect)
Edge-Matched - vertices along shared edges are identical
The method allows simplification of all edges including the outer boundaries of
the coverage or simplification of only the inner (shared) edges.
If there are other geometry types than Polygons or MultiPolygons present, the
method will raise an error.
If the geometry is polygonal but does not form a valid coverage due to overlaps,
it will be simplified but it may result in invalid coverage topology.
Requires Shapely >= 2.1.
Added in version 1.1.0.
Parameters:
tolerance (float) – The degree of simplification roughly equal to the square root of the area
of triangles that will be removed. It has the same units
as the coordinate reference system of the GeoSeries.
For example, using tolerance=100 in a projected CRS with meters
as units means a distance of 100 meters in reality.
simplify_boundary (bool (default True)) – By default (True), simplifies both internal edges of the coverage as well
as its boundary. If set to False, only simplifies internal edges.
Creates R-tree spatial index based on shapely.STRtree.
Note that the spatial index may not be fully
initialized until the first use.
Examples
>>> from shapely.geometry import box
>>> s = geopandas.GeoSeries(geopandas.points_from_xy(range(5), range(5)))
>>> s
0    POINT (0 0)
1    POINT (1 1)
2    POINT (2 2)
3    POINT (3 3)
4    POINT (4 4)
dtype: geometry
Query the spatial index with a single geometry based on the bounding box:
>>> s.sindex.query(box(1, 1, 3, 3))
array([1, 2, 3])
Query the spatial index with a single geometry based on the predicate:
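A predicate query might look like the sketch below, which rebuilds the same five points so that it runs on its own. The choice of the "contains" predicate is an assumption for illustration.

```python
import geopandas as gpd
from shapely.geometry import box

s = gpd.GeoSeries(gpd.points_from_xy(range(5), range(5)))

# The bounding-box query also returns points lying on the box boundary.
bbox_hits = s.sindex.query(box(1, 1, 3, 3))

# 'contains' keeps only points in the interior of the box.
contained = s.sindex.query(box(1, 1, 3, 3), predicate="contains")

print(sorted(bbox_hits.tolist()))  # [1, 2, 3]
print(contained.tolist())          # [2]
```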
’left’: use keys from left_df; retain only left_df geometry column
’right’: use keys from right_df; retain only right_df geometry column
’inner’: use intersection of keys from both dfs; retain only
left_df geometry column
predicate (string, default 'intersects') – Binary predicate. Valid values are determined by the spatial index used.
You can check the valid values in left_df or right_df as
left_df.sindex.valid_query_predicates or
right_df.sindex.valid_query_predicates
lsuffix (string, default 'left') – Suffix to apply to overlapping column names (left GeoDataFrame).
rsuffix (string, default 'right') – Suffix to apply to overlapping column names (right GeoDataFrame).
distance (number or array_like, optional) – Distance(s) around each input geometry within which to query the tree
for the ‘dwithin’ predicate. If array_like, must be
one-dimensional with length equal to length of left GeoDataFrame.
Required if predicate='dwithin'.
on_attribute (string, list or tuple) – Column name(s) to join on as an additional join restriction on top
of the spatial predicate. These must be found in both DataFrames.
If set, observations are joined only if the predicate applies
and values in specified columns match.
>>> groceries.head()
   OBJECTID     Ycoord  ...  Category                           geometry
0        16  41.973266  ...       NaN  MULTIPOINT ((-87.65661 41.97321))
1        18  41.696367  ...       NaN  MULTIPOINT ((-87.68136 41.69713))
2        22  41.868634  ...       NaN  MULTIPOINT ((-87.63918 41.86847))
3        23  41.877590  ...       new  MULTIPOINT ((-87.65495 41.87783))
4        27  41.737696  ...       NaN  MULTIPOINT ((-87.62715 41.73623))
[5 rows x 8 columns]
>>> groceries_w_communities = groceries.sjoin(chicago)
>>> groceries_w_communities[["OBJECTID", "community", "geometry"]].head()
   OBJECTID       community                           geometry
0        16          UPTOWN  MULTIPOINT ((-87.65661 41.97321))
1        18     MORGAN PARK  MULTIPOINT ((-87.68136 41.69713))
2        22  NEAR WEST SIDE  MULTIPOINT ((-87.63918 41.86847))
3        23  NEAR WEST SIDE  MULTIPOINT ((-87.65495 41.87783))
4        27         CHATHAM  MULTIPOINT ((-87.62715 41.73623))
Notes
Every operation in GeoPandas is planar, i.e. the potential third
dimension is not taken into account.
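The how and predicate parameters can be tried on a toy dataset that needs no external files. The shop and zone names below are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point, box

shops = gpd.GeoDataFrame(
    {"shop": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
)
zones = gpd.GeoDataFrame(
    {"zone": ["west", "east"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)

# Inner join: only shops that fall within a zone are kept.
inner = shops.sjoin(zones, how="inner", predicate="within")

# Left join: every shop is kept; unmatched rows get missing zone values.
left = shops.sjoin(zones, how="left", predicate="within")

print(inner[["shop", "zone"]])
print(left[["shop", "zone"]])
```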
’left’: use keys from left_df; retain only left_df geometry column
’right’: use keys from right_df; retain only right_df geometry column
’inner’: use intersection of keys from both dfs; retain only
left_df geometry column
max_distance (float, default None) – Maximum distance within which to query for nearest geometry.
Must be greater than 0.
The max_distance used to search for nearest items in the tree may have a
significant impact on performance by reducing the number of input
geometries that are evaluated for nearest items in the tree.
lsuffix (string, default 'left') – Suffix to apply to overlapping column names (left GeoDataFrame).
rsuffix (string, default 'right') – Suffix to apply to overlapping column names (right GeoDataFrame).
distance_col (string, default None) – If set, save the distances computed between matching geometries under a
column of this name in the joined GeoDataFrame.
exclusive (bool, optional, default False) – If True, the nearest geometries that are equal to the input geometry
will not be returned, default False.
>>> groceries.head()
   OBJECTID     Ycoord  ...  Category                           geometry
0        16  41.973266  ...       NaN  MULTIPOINT ((-87.65661 41.97321))
1        18  41.696367  ...       NaN  MULTIPOINT ((-87.68136 41.69713))
2        22  41.868634  ...       NaN  MULTIPOINT ((-87.63918 41.86847))
3        23  41.877590  ...       new  MULTIPOINT ((-87.65495 41.87783))
4        27  41.737696  ...       NaN  MULTIPOINT ((-87.62715 41.73623))
[5 rows x 8 columns]
>>> groceries_w_communities = groceries.sjoin_nearest(chicago)
>>> groceries_w_communities[["Chain", "community", "geometry"]].head(2)
               Chain    community                                geometry
0     VIET HOA PLAZA       UPTOWN   MULTIPOINT ((1168268.672 1933554.35))
1  COUNTY FAIR FOODS  MORGAN PARK  MULTIPOINT ((1162302.618 1832900.224))
To include the distances:
>>> groceries_w_communities = groceries.sjoin_nearest(chicago, distance_col="distances")
>>> groceries_w_communities[["Chain", "community", "distances"]].head(2)
               Chain    community  distances
0     VIET HOA PLAZA       UPTOWN        0.0
1  COUNTY FAIR FOODS  MORGAN PARK        0.0
In the following example, we get multiple groceries for Uptown because all
results are equidistant (in this case zero because they intersect).
In fact, we get 4 results in total:
>>> chicago_w_groceries = groceries.sjoin_nearest(chicago, distance_col="distances", how="right")
>>> uptown_results = chicago_w_groceries[chicago_w_groceries["community"] == "UPTOWN"]
>>> uptown_results[["Chain", "community"]]
             Chain community
30  VIET HOA PLAZA    UPTOWN
30      JEWEL OSCO    UPTOWN
30          TARGET    UPTOWN
30       Mariano's    UPTOWN
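The interaction of max_distance and distance_col can be sketched on synthetic points, so the example runs without the groceries and chicago datasets. The names and coordinates are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point

shops = gpd.GeoDataFrame({"shop": ["a", "b"]}, geometry=[Point(0, 0), Point(10, 0)])
stops = gpd.GeoDataFrame({"stop": ["x", "y"]}, geometry=[Point(1, 0), Point(4, 0)])

# Each shop is matched to its nearest stop; distances go into a new column.
nearest = shops.sjoin_nearest(stops, distance_col="distances")

# With max_distance=2, shop "b" (6 units from its nearest stop) has no match
# and is dropped by the default inner join.
capped = shops.sjoin_nearest(stops, max_distance=2, distance_col="distances")

print(nearest[["shop", "stop", "distances"]])
print(capped[["shop", "stop", "distances"]])
```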
xs (float, float) – The shear angle(s) for the x and y axes respectively. These can be
specified in either degrees (default) or radians by setting
use_radians=True.
ys (float, float) – The shear angle(s) for the x and y axes respectively. These can be
specified in either degrees (default) or radians by setting
use_radians=True.
origin (string, Point, or tuple(x, y)) – The point of origin can be a keyword ‘center’ for the bounding box
center (default), ‘centroid’ for the geometry’s centroid, a Point
object or a coordinate tuple (x, y).
use_radians (boolean) – Whether to interpret the shear angle(s) as degrees or radians
Snap the vertices and segments of the geometry to vertices of the reference.
Vertices and segments of the input geometry are snapped to vertices of the
reference geometry, returning a new geometry; the input geometries are not
modified. The result geometry is the input geometry with the vertices and
segments snapped. If no snapping occurs then the input geometry is returned
unchanged. The tolerance is used to control where snapping is performed.
Where possible, this operation tries to avoid creating invalid geometries;
however, it does not guarantee that output geometries will be valid. It is
the responsibility of the caller to check for and handle invalid geometries.
Because too much snapping can result in invalid geometries being created,
heuristics are used to determine the number and location of snapped
vertices that are likely safe to snap. These heuristics may omit
some potential snaps that are otherwise within the tolerance.
The operation works in a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to snap to.
tolerance (float or array-like) – Maximum distance between vertices that shall be snapped
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also snap two GeoSeries to each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and snap elements with the same index using
align=True or ignore index and snap elements based on their matching
order using align=False:
Returns a new DataFrame sorted by label if inplace argument is
False, otherwise updates the original DataFrame and returns None.
Parameters:
axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis along which to sort. The value 0 identifies the rows,
and 1 identifies the columns.
level (int or level name or list of ints or list of level names) – If not None, sort on values in specified index level(s).
ascending (bool or list-like of bools, default True) – Sort ascending vs. descending. When the index is a MultiIndex the
sort direction can be controlled for each level individually.
inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort') – Choice of sorting algorithm. See also numpy.sort() for more
information. mergesort and stable are the only stable algorithms. For
DataFrames, this option is only applied when sorting on a single
column or label.
na_position ({'first', 'last'}, default 'last') – Puts NaNs at the beginning if first; last puts NaNs at the end.
Not implemented for MultiIndex.
sort_remaining (bool, default True) – If True and sorting by level and index is multilevel, sort by other
levels too (in order) after sorting by specified level.
ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
key (callable, optional) – If not None, apply the key function to the index values
before sorting. This is similar to the key argument in the
builtin sorted() function, with the notable difference that
this key function should be vectorized. It should expect an
Index and return an Index of the same shape. For MultiIndex
inputs, the key is applied per level.
Returns:
The original DataFrame sorted by the labels or None if inplace=True.
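The ascending, key and ignore_index parameters can be sketched on a frame with mixed-case string labels. The labels and values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=["b", "C", "a", "D"])

# Default label order: uppercase letters sort before lowercase ones.
by_label = df.sort_index()

# Reverse the order.
descending = df.sort_index(ascending=False)

# A vectorized key receives the Index and must return an Index of equal shape.
case_insensitive = df.sort_index(key=lambda idx: idx.str.lower())

# ignore_index relabels the sorted result 0, 1, ..., n - 1.
renumbered = df.sort_index(ignore_index=True)
```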
if axis is 0 or ‘index’ then by may contain index
levels and/or column labels.
if axis is 1 or ‘columns’ then by may contain column
levels and/or index labels.
axis ({0 or 'index', 1 or 'columns'}, default 0) – Axis to be sorted.
ascending (bool or list of bool, default True) – Sort ascending vs. descending. Specify list for multiple sort
orders. If this is a list of bools, must match the length of
the by.
inplace (bool, default False) – If True, perform operation in-place.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort') – Choice of sorting algorithm. See also numpy.sort() for more
information. mergesort and stable are the only stable algorithms. For
DataFrames, this option is only applied when sorting on a single
column or label.
na_position ({'first', 'last'}, default 'last') – Puts NaNs at the beginning if first; last puts NaNs at the
end.
ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
key (callable, optional) – Apply the key function to the values
before sorting. This is similar to the key argument in the
builtin sorted() function, with the notable difference that
this key function should be vectorized. It should expect a
Series and return a Series with the same shape as the input.
It will be applied to each column in by independently.
Returns:
DataFrame with sorted values or None if inplace=True.
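A short sketch of na_position, key and ignore_index in combination. The column names and values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["b", "A", "c"], "score": [2.0, np.nan, 1.0]})

# NaNs are placed last by default.
by_score = df.sort_values(by="score")

# na_position="first" moves them to the top.
nan_first = df.sort_values(by="score", na_position="first")

# The key function is applied to the whole column (a Series) before sorting.
by_name = df.sort_values(
    by="name", key=lambda col: col.str.lower(), ignore_index=True
)
```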
Series or DataFrames with a single element are squeezed to a scalar.
DataFrames with a single column or a single row are squeezed to a
Series. Otherwise the object is unchanged.
This method is most useful when you don’t know if your
object is a Series or DataFrame, but you do know it has just a single
column. In that case you can safely call squeeze to ensure you have a
Series.
Parameters:
axis ({0 or 'index', 1 or 'columns', None}, default None) – A specific axis to squeeze. By default, all length-1 axes are
squeezed. For Series this parameter is unused and defaults to None.
Returns:
The projection after squeezing axis or all the axes.
Return type:
DataFrame, Series, or scalar
See also
Series.iloc
Integer-location based indexing for selecting scalars.
DataFrame.iloc
Integer-location based indexing for selecting Series.
Series.to_frame
Inverse of DataFrame.squeeze for a single-column DataFrame.
Examples
>>> primes=pd.Series([2,3,5,7])
Slicing might produce a Series with a single value:
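A sketch of how the squeezing rules play out, rebuilding the primes Series so that it runs on its own. The single-column frame is an added illustration.

```python
import pandas as pd

primes = pd.Series([2, 3, 5, 7])

# Filtering leaves a Series with one element; squeeze turns it into a scalar.
even_primes = primes[primes % 2 == 0]
scalar = even_primes.squeeze()

# A single-column DataFrame is squeezed to a Series along the column axis.
one_column = pd.DataFrame({"a": [1, 2]})
as_series = one_column.squeeze("columns")

# Objects with more than one element along every axis are returned unchanged.
unchanged = primes.squeeze()
```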
Stack the prescribed level(s) from columns to index.
Return a reshaped DataFrame or Series having a multi-level
index with one or more new inner-most levels compared to the current
DataFrame. The new inner-most levels are created by pivoting the
columns of the current dataframe:
if the columns have a single level, the output is a Series;
if the columns have multiple levels, the new index
level(s) is (are) taken from the prescribed level(s) and
the output is a DataFrame.
Parameters:
level (int, str, list, default -1) – Level(s) to stack from the column axis onto the index
axis, defined as one index or label, or a list of indices
or labels.
dropna (bool, default True) – Whether to drop rows in the resulting Frame/Series with
missing values. Stacking a column level onto the index
axis can create combinations of index and column values
that are missing from the original dataframe. See Examples
section.
sort (bool, default True) – Whether to sort the levels of the resulting MultiIndex.
future_stack (bool, default False) – Whether to use the new implementation that will replace the current
implementation in pandas 3.0. When True, dropna and sort have no impact
on the result and must remain unspecified. See pandas 2.1.0 Release
notes for more details.
Returns:
Stacked dataframe or series.
Return type:
DataFrame or Series
See also
DataFrame.unstack
Unstack prescribed level(s) from index axis onto column axis.
DataFrame.pivot
Reshape dataframe from long format to wide format.
DataFrame.pivot_table
Create a spreadsheet-style pivot table as a DataFrame.
Notes
The function is named by analogy with a collection of books
being reorganized from being side by side on a horizontal
position (the columns of the dataframe) to being stacked
vertically on top of each other (in the index of the
dataframe).
It is common to have missing values when stacking a dataframe
with multi-level columns, as the stacked dataframe typically
has more values than the original dataframe. Missing values
are filled with NaNs:
>>> df_multi_level_cols2
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0
>>> df_multi_level_cols2.stack(future_stack=True)
        weight  height
cat kg     1.0     NaN
    m      NaN     2.0
dog kg     3.0     NaN
    m      NaN     4.0
Prescribing the level(s) to be stacked
The first parameter controls which level or levels are stacked:
>>> df_multi_level_cols2.stack(0, future_stack=True)
             kg    m
cat weight  1.0  NaN
    height  NaN  2.0
dog weight  3.0  NaN
    height  NaN  4.0
>>> df_multi_level_cols2.stack([0, 1], future_stack=True)
cat  weight  kg    1.0
     height  m     2.0
dog  weight  kg    3.0
     height  m     4.0
dtype: float64
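For comparison, the single-level case can be sketched as below. The frame is illustrative, and the call leaves future_stack unspecified, so some pandas versions may emit a deprecation warning about the stack implementation.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1], [2, 3]], index=["cat", "dog"], columns=["weight", "height"]
)

# With single-level columns, the column labels become the innermost index
# level and the result is a Series.
stacked = df.stack()

print(type(stacked).__name__)           # Series
print(stacked.loc[("dog", "height")])   # 3
```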
Return sample standard deviation over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
Parameters:
axis ({index (0), columns (1)}) –
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.std with axis=None is deprecated;
in a future version this will reduce over both axes and return a scalar.
To retain the old behavior, pass axis=0 (or do not pass axis).
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
Return type:
Series or DataFrame (if level specified)
Notes
To have the same behaviour as numpy.std, use ddof=0 (instead of the
default ddof=1)
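The effect of ddof can be checked against NumPy with a small sketch. The column name and values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"height": [1.0, 2.0, 3.0, 4.0]})

# pandas default: sample standard deviation, divisor N - 1.
sample_std = df.std()

# ddof=0 gives the population standard deviation, which is numpy.std's default.
population_std = df.std(ddof=0)

values = df["height"].to_numpy()
print(np.isclose(population_std["height"], np.std(values)))      # True
print(np.isclose(sample_std["height"], np.std(values, ddof=1)))  # True
```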
Get Subtraction of dataframe and other, element-wise (binary operator sub).
Equivalent to dataframe-other, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, rsub.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
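How fill_value changes the result compared with the plain operator can be sketched as follows. The frames are illustrative.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan, 3.0, np.nan]})
b = pd.DataFrame({"x": [0.5, 2.0, np.nan, np.nan]})

# The plain operator propagates NaN from either side.
plain = a - b

# fill_value substitutes for a value missing on one side only;
# where both sides are missing, the result stays missing.
filled = a.sub(b, fill_value=0)

# rsub is the reversed version: it computes b - a.
reverse = a.rsub(b, fill_value=0)
```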
Get Subtraction of dataframe and other, element-wise (binary operator sub).
Equivalent to dataframe-other, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, rsub.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns.
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Return the sum of the values over the requested axis.
This is equivalent to the method numpy.sum.
Parameters:
axis ({index (0), columns (1)}) –
Axis for the function to be applied on.
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.sum with axis=None is deprecated;
in a future version this will reduce over both axes and return a scalar.
To retain the old behavior, pass axis=0 (or do not pass axis).
Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
min_count (int, default 0) – The required number of valid values to perform the operation. If fewer than
min_count non-NA values are present the result will be NA.
**kwargs – Additional keyword arguments to be passed to the function.
Return type:
Series or scalar
See also
Series.sum
Return the sum.
Series.min
Return the minimum.
Series.max
Return the maximum.
Series.idxmin
Return the index of the minimum.
Series.idxmax
Return the index of the maximum.
DataFrame.sum
Return the sum over the requested axis.
DataFrame.min
Return the minimum over the requested axis.
DataFrame.max
Return the maximum over the requested axis.
DataFrame.idxmin
Return the index of the minimum over the requested axis.
DataFrame.idxmax
Return the index of the maximum over the requested axis.
Examples
>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
>>> s.sum()
14
By default, the sum of an empty or all-NA Series is 0.
>>> pd.Series([], dtype="float64").sum()  # min_count=0 is the default
0.0
This can be controlled with the min_count parameter. For example, if
you’d like the sum of an empty series to be NaN, pass min_count=1.
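The min_count behaviour described above can be sketched like this. The small DataFrame is an added illustration of the axis parameter.

```python
import numpy as np
import pandas as pd

empty = pd.Series([], dtype="float64")

# Default min_count=0: the sum of an empty Series is 0.0.
default_sum = empty.sum()

# min_count=1 requires at least one valid value, so the result is NaN.
strict_sum = empty.sum(min_count=1)

# The same rule applies to an all-NA Series.
all_na_sum = pd.Series([np.nan, np.nan]).sum(min_count=1)

# For a DataFrame, axis selects the direction of the reduction.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
column_sums = df.sum()       # one value per column
row_sums = df.sum(axis=1)    # one value per row
```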
Default is to swap the two innermost levels of the index.
Parameters:
i (int or str) – Levels of the indices to be swapped. Can pass level name as string.
j (int or str) – Levels of the indices to be swapped. Can pass level name as string.
axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to swap levels on. 0 or ‘index’ for row-wise, 1 or
‘columns’ for column-wise.
Returns:
DataFrame with levels swapped in MultiIndex.
Return type:
DataFrame
Examples
>>> df = pd.DataFrame(
...     {"Grade": ["A", "B", "A", "C"]},
...     index=[
...         ["Final exam", "Final exam", "Coursework", "Coursework"],
...         ["History", "Geography", "History", "Geography"],
...         ["January", "February", "March", "April"],
...     ],
... )
>>> df
                                    Grade
Final exam  History     January         A
            Geography   February        B
Coursework  History     March           A
            Geography   April           C
In the following example, we will swap the levels of the indices.
Here, we swap the levels of the row index (axis=0, the default); the levels
of a column MultiIndex can be swapped in a similar manner with axis=1.
By not supplying any arguments for i and j, we swap the last and second to
last indices.
>>> df.swaplevel()
                                    Grade
Final exam  January     History         A
            February    Geography       B
Coursework  March       History         A
            April       Geography       C
By supplying one argument, we can choose which index to swap the last
index with. We can for example swap the first index with the last one as
follows.
>>> df.swaplevel(0)
                                    Grade
January     History     Final exam      A
February    Geography   Final exam      B
March       History     Coursework      A
April       Geography   Coursework      C
We can also define explicitly which indices we want to swap by supplying values
for both i and j. Here, we for example swap the first and second indices.
>>> df.swaplevel(0, 1)
                                    Grade
History     Final exam  January         A
Geography   Final exam  February        B
History     Coursework  March           A
Geography   Coursework  April           C
Return a GeoSeries of the symmetric difference of points in
each aligned geometry with other.
For each geometry, the symmetric difference consists of points in the
geometry not in other, and points in other not in the geometry.
The operation works in a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the
symmetric difference to.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
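Both alignment modes can be sketched with two small GeoSeries whose indices only partly overlap. The boxes and index labels are illustrative.

```python
import geopandas as gpd
from shapely.geometry import box

s1 = gpd.GeoSeries([box(0, 0, 2, 2), box(0, 0, 1, 1)], index=[0, 1])
s2 = gpd.GeoSeries([box(1, 0, 3, 2), box(0, 0, 1, 1)], index=[1, 2])

# align=False pairs rows by position and keeps the index of s1.
by_position = s1.symmetric_difference(s2, align=False)

# align=True pairs rows by index label; only label 1 exists in both series.
by_index = s1.symmetric_difference(s2, align=True)

print(by_position.area.tolist())  # overlapping boxes, then identical boxes
print(by_index.loc[1].area)       # two boxes that only share an edge
```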
This function returns the last n rows from the object based on
position. It is useful for quickly verifying data, for example,
after sorting or appending rows.
For negative values of n, this function returns all rows except
the first |n| rows, equivalent to df[|n|:].
If n is larger than the number of rows, this function returns all rows.
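The three cases described above (positive n, negative n, and n larger than the number of rows) can be checked with a small sketch. The sample frame is an assumption made for illustration only.

```python
import pandas as pd

# Hypothetical five-row frame used only to illustrate tail().
df = pd.DataFrame({"animal": ["alligator", "bee", "falcon", "lion", "monkey"]})

last_two = df.tail(2)            # the last two rows
all_but_first_two = df.tail(-2)  # everything except the first two rows
everything = df.tail(100)        # n exceeds the row count, so all rows are returned

print(last_two)
print(all_but_first_two)
```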
Return the elements in the given positional indices along an axis.
This means that we are not indexing according to actual values in
the index attribute of the object. We are indexing according to the
actual position of the element in the object.
Parameters:
indices (array-like) – An array of ints indicating which positions to take.
axis ({0 or 'index', 1 or 'columns', None}, default 0) – The axis on which to select elements. 0 means that we are
selecting rows, 1 means that we are selecting columns.
For Series this parameter is unused and defaults to 0.
**kwargs – For compatibility with numpy.take(). Has no effect on the
output.
Returns:
An array-like containing the elements taken from the object.
>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
...                    ('parrot', 'bird', 24.0),
...                    ('lion', 'mammal', 80.5),
...                    ('monkey', 'mammal', np.nan)],
...                   columns=['name', 'class', 'max_speed'],
...                   index=[0, 2, 3, 1])
>>> df
     name   class  max_speed
0  falcon    bird      389.0
2  parrot    bird       24.0
3    lion  mammal       80.5
1  monkey  mammal        NaN
Take elements at positions 0 and 3 along the axis 0 (default).
Note how the actual indices selected (0 and 1) do not correspond to
our selected indices 0 and 3. That’s because we are selecting the 0th
and 3rd rows, not rows whose indices equal 0 and 3.
>>> df.take([0, 3])
     name   class  max_speed
0  falcon    bird      389.0
1  monkey  mammal        NaN
Take elements at indices 1 and 2 along the axis 1 (column selection).
>>> df.take([1, 2], axis=1)
    class  max_speed
0    bird      389.0
2    bird       24.0
3  mammal       80.5
1  mammal        NaN
We may also take elements using negative integers, which count positions
from the end of the object, just like with Python lists.
>>> df.take([-1, -2])
     name   class  max_speed
1  monkey  mammal        NaN
3    lion  mammal       80.5
This function returns a generic Arrow data object implementing
the Arrow PyCapsule Protocol (i.e. having an __arrow_c_stream__
method). This object can then be consumed by your Arrow implementation
of choice that supports this protocol.
Added in version 1.0.
Parameters:
index (bool, default None) – If True, always include the dataframe’s index(es) as columns
in the file output.
If False, the index(es) will not be written to the file.
If None, the index(es) will be included as columns in the file
output except RangeIndex which is stored as metadata only.
geometry_encoding ({'WKB', 'geoarrow'}, default 'WKB') – The GeoArrow encoding to use for the data conversion.
interleaved (bool, default True) – Only relevant for ‘geoarrow’ encoding. If True, the geometries’
coordinates are interleaved in a single fixed size list array.
If False, the coordinates are stored as separate arrays in a
struct type.
include_z (bool, default None) – Only relevant for ‘geoarrow’ encoding (for WKB, the dimensionality
of the individual geometries is preserved).
If False, return 2D geometries. If True, include the third dimension
in the output (if a geometry has no third dimension, the z-coordinates
will be NaN). By default, will infer the dimensionality from the
input geometries. Note that this inference can be unreliable with
empty geometries (for a guaranteed result, it is recommended to
specify the keyword).
Returns:
A generic Arrow table object with geometry columns encoded to
GeoArrow.
Return type:
ArrowTable
Examples
>>> from shapely.geometry import Point
>>> data = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(data)
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
>>> arrow_table = gdf.to_arrow()
>>> arrow_table
<geopandas.io._geoarrow.ArrowTable object at ...>
The returned data object needs to be consumed by a library implementing
the Arrow PyCapsule Protocol. For example, wrapping the data as a
pyarrow.Table (requires pyarrow >= 14.0):
Transform geometries to a new coordinate reference system.
Transform all geometries in an active geometry column to a different coordinate
reference system. The crs attribute on the current GeoSeries must
be set. Either crs or epsg may be specified for output.
This method will transform all points in all objects. It has no notion
of projecting entire geometries. All segments joining points are
assumed to be lines in the current projection, not geodesics. Objects
crossing the dateline (or other projection boundary) will have
undesirable behavior.
Parameters:
crs (pyproj.CRS, optional if epsg is specified) – The value can be anything accepted by
pyproj.CRS.from_user_input(),
such as an authority string (e.g. “EPSG:4326”) or a WKT string.
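As a rough illustration of the point-by-point transformation described above, the sketch below reprojects two points from WGS84 to Web Mercator. It assumes geopandas (with pyproj) is installed; the sample points and column name are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point

# The source CRS must be set before the data can be transformed.
gdf = gpd.GeoDataFrame(
    {"name": ["origin", "one_degree_east"]},
    geometry=[Point(0, 0), Point(1, 0)],
    crs="EPSG:4326",
)

# Either crs or epsg may be used to name the target CRS.
projected = gdf.to_crs("EPSG:3857")
same_via_epsg = gdf.to_crs(epsg=3857)

print(projected.geometry)
```

to_crs returns a new object by default; the original gdf keeps its EPSG:4326 coordinates.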
Write object to a comma-separated values (csv) file.
Parameters:
path_or_buf (str, path object, file-like object, or None, default None) – String, path object (implementing os.PathLike[str]), or file-like
object implementing a write() function. If None, the result is
returned as a string. If a non-binary file object is passed, it should
be opened with newline=’’, disabling universal newlines. If a binary
file object is passed, mode might need to contain a ‘b’.
sep (str, default ',') – String of length 1. Field delimiter for the output file.
na_rep (str, default '') – Missing data representation.
float_format (str, Callable, default None) – Format string for floating point numbers. If a Callable is given, it takes
precedence over other numeric formatting parameters, like decimal.
columns (sequence, optional) – Columns to write.
header (bool or list of str, default True) – Write out the column names. If a list of strings is given it is
assumed to be aliases for the column names.
index (bool, default True) – Write row names (index).
index_label (str or sequence, or False, default None) – Column label for index column(s) if desired. If None is given, and
header and index are True, then the index names are used. A
sequence should be given if the object uses MultiIndex. If
False do not print fields for index names. Use index_label=False
for easier importing in R.
mode ({'w', 'x', 'a'}, default 'w') –
Forwarded to either open(mode=) or fsspec.open(mode=) to control
the file opening. Typical values include:
’w’, truncate the file first.
’x’, exclusive creation, failing if the file already exists.
’a’, append to the end of file if it exists.
encoding (str, optional) – A string representing the encoding to use in the output file,
defaults to ‘utf-8’. encoding is not supported if path_or_buf
is a non-binary file object.
compression (str or dict, default 'infer') – For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buf’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
May be a dict with key ‘method’ as compression mode
and other entries as additional compression options if
compression mode is ‘zip’.
Passing compression options as keys in dict is
supported for compression modes ‘gzip’, ‘bz2’, ‘zstd’, and ‘zip’.
quoting (optional constant from csv module) – Defaults to csv.QUOTE_MINIMAL. If you have set a float_format
then floats are converted to strings and thus csv.QUOTE_NONNUMERIC
will treat them as non-numeric.
quotechar (str, default '"') – String of length 1. Character used to quote fields.
lineterminator (str, optional) – The newline character or character sequence to use in the output
file. Defaults to os.linesep, which depends on the OS in which
this method is called (e.g. ‘\n’ on Linux, ‘\r\n’ on Windows).
Changed in version 1.5.0: Previously was line_terminator, changed for consistency with
read_csv and the standard library ‘csv’ module.
chunksize (int or None) – Rows to write at a time.
date_format (str, default None) – Format string for datetime objects.
doublequote (bool, default True) – Control quoting of quotechar inside a field.
escapechar (str, default None) – String of length 1. Character used to escape sep and quotechar
when appropriate.
decimal (str, default '.') – Character recognized as decimal separator. E.g. use ‘,’ for
European data.
errors (str, default 'strict') – Specifies how encoding and decoding errors are to be handled.
See the errors argument for open() for a full list
of options.
storage_options (dict, optional) – Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
Returns:
If path_or_buf is None, returns the resulting csv format as a
string. Otherwise returns None.
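The two return modes and the dict form of compression can be sketched as follows. The sample frame and file name are assumptions for illustration; the compression dict is the one quoted in the parameter description above.

```python
import gzip
import os
import tempfile

import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"name": ["Raphael", "Donatello"], "score": [1.5, None]})

# With path_or_buf left as None, the CSV text is returned as a string.
text = df.to_csv(index=False, na_rep="NA", float_format="%.2f")

# With a path, the method writes the file and returns None. The extra keys
# in the compression dict are forwarded to gzip.GzipFile.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scores.csv.gz")
    result = df.to_csv(
        path,
        index=False,
        compression={"method": "gzip", "compresslevel": 1, "mtime": 1},
    )
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        round_trip = fh.read()

print(text)
```

Fixing mtime keeps the gzip header identical between runs, which is what makes the archive reproducible.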
’records’ : list like
[{column -> value}, … , {column -> value}]
’index’ : dict like {index -> {column -> value}}
Added in version 1.4.0: ‘tight’ as an allowed value for the orient argument
into (class, default dict) – The collections.abc.MutableMapping subclass used for all Mappings
in the return value. Can be the actual class or an empty
instance of the mapping type you want. If you want a
collections.defaultdict, you must pass it initialized.
index (bool, default True) – Whether to include the index item (and index_names item if orient
is ‘tight’) in the returned dictionary. Can only be False
when orient is ‘split’ or ‘tight’.
Added in version 2.0.0.
Returns:
Return a collections.abc.MutableMapping object representing the
DataFrame. The resulting transformation depends on the orient
parameter.
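A short sketch of how orient and into shape the result. The sample frame is hypothetical; the orientations shown are the 'records' and 'index' layouts listed above.

```python
from collections import OrderedDict, defaultdict

import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [0.5, 0.75]}, index=["row1", "row2"])

# 'records': one {column -> value} mapping per row.
records = df.to_dict(orient="records")

# 'index': {index -> {column -> value}}.
by_index = df.to_dict(orient="index")

# into accepts a mapping class...
ordered = df.to_dict(into=OrderedDict)

# ...or an instance; a defaultdict must be passed already initialized.
dd = df.to_dict(orient="records", into=defaultdict(list))

print(records)
print(by_index)
```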
To write a single object to an Excel .xlsx file it is only necessary to
specify a target file name. To write to multiple sheets it is necessary to
create an ExcelWriter object with a target file name, and specify a sheet
in the file to write to.
Multiple sheets may be written to by specifying unique sheet_name.
With all data written to the file it is necessary to save the changes.
Note that creating an ExcelWriter object with a file name that already
exists will result in the contents of the existing file being erased.
Parameters:
excel_writer (path-like, file-like, or ExcelWriter object) – File path or existing ExcelWriter.
sheet_name (str, default 'Sheet1') – Name of sheet which will contain DataFrame.
na_rep (str, default '') – Missing data representation.
float_format (str, optional) – Format string for floating point numbers. For example
float_format="%.2f" will format 0.1234 to 0.12.
columns (sequence or list of str, optional) – Columns to write.
header (bool or list of str, default True) – Write out the column names. If a list of strings is given it is
assumed to be aliases for the column names.
index (bool, default True) – Write row names (index).
index_label (str or sequence, optional) – Column label for index column(s) if desired. If not specified, and
header and index are True, then the index names are used. A
sequence should be given if the DataFrame uses MultiIndex.
startrow (int, default 0) – Upper left cell row to dump data frame.
startcol (int, default 0) – Upper left cell column to dump data frame.
engine (str, optional) – Write engine to use, ‘openpyxl’ or ‘xlsxwriter’. You can also set this
via the options io.excel.xlsx.writer or
io.excel.xlsm.writer.
merge_cells (bool, default True) – Write MultiIndex and Hierarchical Rows as merged cells.
inf_rep (str, default 'inf') – Representation for infinity (there is no native representation for
infinity in Excel).
freeze_panes (tuple of int (length 2), optional) – Specifies the one-based bottommost row and rightmost column that
are to be frozen.
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
To set the library that is used to write the Excel file,
you can pass the engine keyword (the default engine is
automatically chosen depending on the file extension):
index (bool, default None) – If True, always include the dataframe’s index(es) as columns
in the file output.
If False, the index(es) will not be written to the file.
If None, the index(es) will be included as columns in the file
output except RangeIndex which is stored as metadata only.
compression ({'zstd', 'lz4', 'uncompressed'}, optional) – Name of the compression to use. Use "uncompressed" for no
compression. By default uses LZ4 if available, otherwise uncompressed.
schema_version ({'0.1.0', '0.4.0', '1.0.0', None}) – GeoParquet specification version; if not provided will default to
latest supported version.
kwargs – Additional keyword arguments passed to
pyarrow.feather.write_feather().
By default, an ESRI shapefile is written, but any OGR data source
supported by Pyogrio or Fiona can be written. A dictionary of supported OGR
providers is available via:
>>> import pyogrio
>>> pyogrio.list_drivers()
Parameters:
filename (string) – File path or file handle to write to. The path may specify a
GDAL VSI scheme.
driver (string, default None) – The OGR format driver used to write the vector file.
If not specified, it attempts to infer it from the file extension.
If no extension is specified, it saves ESRI Shapefile to a folder.
schema (dict, default None) – If specified, the schema dictionary is passed to Fiona to
better control how the file is written. If None, GeoPandas
will determine the schema based on each column’s dtype.
Not supported for the “pyogrio” engine.
index (bool, default None) – If True, write index into one or more columns (for MultiIndex).
Default None writes the index into one or more columns only if
the index is named, is a MultiIndex, or has a non-integer data
type. If False, no index is written.
Added in version 0.7: Previously the index was not written.
mode (string, default 'w') – The write mode, ‘w’ to overwrite the existing file and ‘a’ to append.
Not all drivers support appending. The drivers that support appending
are listed in fiona.supported_drivers or
https://github.com/Toblerity/Fiona/blob/master/fiona/drvsupport.py
crs (pyproj.CRS, default None) – If specified, the CRS is passed to Fiona to
better control how the file is written. If None, GeoPandas
will determine the CRS from the crs attribute of the GeoDataFrame.
The value can be anything accepted
by pyproj.CRS.from_user_input(),
such as an authority string (e.g. “EPSG:4326”) or a WKT string. The keyword
is not supported for the “pyogrio” engine.
engine (str, "pyogrio" or "fiona") – The underlying library that is used to write the file. Currently, the
supported options are “pyogrio” and “fiona”. Defaults to “pyogrio” if
installed, otherwise tries “fiona”.
metadata (dict[str, str], default None) – Optional metadata to be stored in the file. Keys and values must be
strings. Supported only for “GPKG” driver.
**kwargs – Keyword args to be passed to the engine, and can be used to write
to multi-layer data, store data within archives (zip files), etc.
In case of the “pyogrio” engine, the keyword arguments are passed to
pyogrio.write_dataframe. In case of the “fiona” engine, the keyword
arguments are passed to fiona.open. For more information on possible
keywords, type: import pyogrio; help(pyogrio.write_dataframe).
Notes
The format drivers will attempt to detect the encoding of your data, but
may fail. In this case, the proper encoding can be specified explicitly
by using the encoding keyword parameter, e.g. encoding='utf-8'.
List of BigQuery table fields to which the corresponding DataFrame
columns conform, e.g. [{'name': 'col1', 'type': 'STRING'}, ...]. If schema is not provided, it will be
generated according to dtypes of DataFrame columns. See
BigQuery API documentation on available names of a field.
Location where the load job should run. See the BigQuery locations
documentation for a
list of available locations. The location must match that of the
target dataset.
Credentials for accessing Google APIs. Use this parameter to
override default credentials, such as to use Compute Engine
google.auth.compute_engine.Credentials or Service
Account google.oauth2.service_account.Credentials
directly.
Return a python feature collection representation of the GeoDataFrame
as a dictionary with a list of features based on the __geo_interface__
GeoJSON-like specification.
na (str, optional) – Options are {‘null’, ‘drop’, ‘keep’}, default ‘null’.
Indicates how to output missing (NaN) values in the GeoDataFrame:
null: output the missing entries as JSON null.
drop: remove the property from the feature. This applies to each feature individually so that features may have different properties.
keep: output the missing entries as NaN.
show_bbox (bool, optional) – Include bbox (bounds) in the geojson. Default False.
drop_id (bool, default: False) – Whether to retain the index of the GeoDataFrame as the id property
in the generated dictionary. Default is False, but may want True
if the index is just arbitrary row numbers.
Write the contained data to an HDF5 file using HDFStore.
Hierarchical Data Format (HDF) is self-describing, allowing an
application to interpret the structure and contents of a file with
no outside information. One HDF file can hold a mix of related objects
which can be accessed as a group or as individual objects.
In order to add another DataFrame or Series to an existing HDF file
please use append mode and a different key.
Warning
One can store a subclass of DataFrame or Series to HDF5,
but the type of the subclass is lost upon storing.
path_or_buf (str or pandas.HDFStore) – File path or HDFStore object.
key (str) – Identifier for the group in the store.
mode ({'a', 'w', 'r+'}, default 'a') –
Mode to open file:
’w’: write, a new file is created (an existing file with
the same name would be deleted).
’a’: append, an existing file is opened for reading and
writing, and if the file does not exist it is created.
’r+’: similar to ‘a’, but the file must already exist.
complevel ({0-9}, default None) – Specifies a compression level for data.
A value of 0 or None disables compression.
complib ({'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib') – Specifies the compression library to be used.
These additional compressors for Blosc are supported
(default if no compressor specified: ‘blosc:blosclz’):
{‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’,
‘blosc:zlib’, ‘blosc:zstd’}.
Specifying a compression library which is not available issues
a ValueError.
append (bool, default False) – For Table formats, append the input data to the existing table.
format ({'fixed', 'table', None}, default 'fixed') –
Possible values:
’fixed’: Fixed format. Fast writing/reading. Not-appendable,
nor searchable.
’table’: Table format. Write as a PyTables Table structure
which may perform worse but allow more flexible operations
like searching / selecting subsets of the data.
If None, pd.get_option(‘io.hdf.default_format’) is checked,
followed by fallback to “fixed”.
index (bool, default True) – Write DataFrame index as a column.
min_itemsize (dict or int, optional) – Map column names to minimum string sizes for columns.
nan_rep (Any, optional) – How to represent null values as str.
Not allowed with append=True.
data_columns (list of columns or True, optional) – List of columns to create as indexed data columns for on-disk
queries, or True to use all columns. By default only the axes
of the object are indexed. See “Query via data columns” in the pandas
user guide for more information.
Applicable only to format=’table’.
errors (str, default 'strict') – Specifies how encoding and decoding errors are to be handled.
See the errors argument for open() for a full list
of options.
buf (str, Path or StringIO-like, optional, default None) – Buffer to write to. If None, the output is returned as a string.
columns (array-like, optional, default None) – The subset of columns to write. Writes all columns by default.
col_space (str or int, list or dict of int or str, optional) – The minimum width of each column in CSS length units. An int is assumed to be px units.
index (bool, optional, default True) – Whether to print index (row) labels.
na_rep (str, optional, default 'NaN') – String representation of NaN to use.
formatters (list, tuple or dict of one-param. functions, optional) – Formatter functions to apply to columns’ elements by position or
name.
The result of each function must be a unicode string.
List/tuple must be of length equal to the number of columns.
float_format (one-parameter function, optional, default None) – Formatter function to apply to columns’ elements if they are
floats. This function must return a unicode string and will be
applied only to the non-NaN elements, with NaN being
handled by na_rep.
sparsify (bool, optional, default True) – Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
index_names (bool, optional, default True) – Prints the names of the indexes.
justify (str, default None) – How to justify the column labels. If None uses the option from
the print configuration (controlled by set_option), ‘right’ out
of the box. Valid values are left, right, center, justify, justify-all,
start, end, inherit, match-parent, initial and unset.
max_rows (int, optional) – Maximum number of rows to display in the console.
max_cols (int, optional) – Maximum number of columns to display in the console.
show_dimensions (bool, default False) – Display DataFrame dimensions (number of rows by number of columns).
decimal (str, default '.') – Character recognized as decimal separator, e.g. ‘,’ in Europe.
bold_rows (bool, default True) – Make the row labels bold in the output.
classes (str or list or tuple, default None) – CSS class(es) to apply to the resulting html table.
escape (bool, default True) – Convert the characters <, >, and & to HTML-safe sequences.
notebook ({True, False}, default False) – Whether the generated HTML is for IPython Notebook.
border (int) – A border=border attribute is included in the opening
<table> tag. Default pd.options.display.html.border.
table_id (str, optional) – A css id is included in the opening <table> tag if specified.
render_links (bool, default False) – Convert URLs to HTML links.
encoding (str, default "utf-8") – Set character encoding.
Returns:
If buf is None, returns the result as a string. Otherwise returns
None.
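A brief sketch of several of these parameters working together. The sample frame, CSS class, and table id are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical frame with characters that need escaping and one NaN.
df = pd.DataFrame(
    {"label": ["a < b", "x & y"], "value": [1.23456, float("nan")]}
)

# buf is left as None, so the HTML is returned as a string.
html = df.to_html(
    index=False,                        # omit the row labels
    na_rep="-",                         # text used for NaN cells
    float_format=lambda x: f"{x:.2f}",  # applied to non-NaN floats only
    classes="summary",                  # extra CSS class on the <table> tag
    table_id="results",                 # id attribute on the <table> tag
)

print(html)
```

Because escape defaults to True, the < and & characters in the label column are emitted as HTML-safe sequences.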
Return a GeoJSON representation of the GeoDataFrame as a string.
Parameters:
na ({'null', 'drop', 'keep'}, default 'null') – Indicates how to output missing (NaN) values in the GeoDataFrame.
See below.
show_bbox (bool, optional, default: False) – Include bbox (bounds) in the geojson
drop_id (bool, default: False) – Whether to retain the index of the GeoDataFrame as the id property
in the generated GeoJSON. Default is False, but may want True
if the index is just arbitrary row numbers.
to_wgs84 (bool, optional, default: False) – If the CRS is set on the active geometry column it is exported as
WGS84 (EPSG:4326) to meet the 2016 GeoJSON specification.
Set to True to force re-projection and set to False to ignore CRS. False by
default.
Missing (NaN) values in the GeoDataFrame can be represented as follows:
null: output the missing entries as JSON null.
drop: remove the property from the feature. This applies to each
feature individually so that features may have different properties.
keep: output the missing entries as NaN.
If the GeoDataFrame has a defined CRS, its definition will be included
in the output unless it is equal to WGS84 (default GeoJSON CRS) or not
possible to represent in the URN OGC format, or unless to_wgs84=True
is specified.
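The effect of the na, drop_id and show_bbox options can be inspected by parsing the returned string. This is a minimal sketch, assuming geopandas and shapely are installed; the sample data (including the missing value) is hypothetical.

```python
import json

import geopandas as gpd
from shapely.geometry import Point

# Hypothetical frame with one missing attribute value.
gdf = gpd.GeoDataFrame(
    {"col1": ["name1", None]},
    geometry=[Point(1, 2), Point(2, 1)],
    crs="EPSG:3857",
)

# Default na='null': the missing entry is written as JSON null.
with_null = json.loads(gdf.to_json())

# na='drop' removes the missing property from that feature only;
# drop_id=True omits the index-based id; show_bbox=True adds bbox members.
dropped = json.loads(gdf.to_json(na="drop", drop_id=True, show_bbox=True))

print(with_null["features"][1]["properties"])
print(dropped["features"][1]["properties"])
```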
Examples
>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:3857")
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
Render object to a LaTeX tabular, longtable, or nested table.
Requires \usepackage{booktabs}. The output can be copy/pasted
into a main LaTeX document or read from an external file
with \input{table.tex}.
Changed in version 2.0.0: Refactored to use the Styler implementation via jinja2 templating.
Parameters:
buf (str, Path or StringIO-like, optional, default None) – Buffer to write to. If None, the output is returned as a string.
columns (list of label, optional) – The subset of columns to write. Writes all columns by default.
header (bool or list of str, default True) – Write out the column names. If a list of strings is given,
it is assumed to be aliases for the column names.
index (bool, default True) – Write row names (index).
na_rep (str, default 'NaN') – Missing data representation.
formatters (list of functions or dict of {str: function}, optional) – Formatter functions to apply to columns’ elements by position or
name. The result of each function must be a unicode string.
List must be of length equal to the number of columns.
float_format (one-parameter function or str, optional, default None) – Formatter for floating point numbers. For example
float_format="%.2f" and float_format="{:0.2f}".format will
both result in 0.1234 being formatted as 0.12.
sparsify (bool, optional) – Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row. By default, the value will be
read from the config module.
index_names (bool, default True) – Prints the names of the indexes.
bold_rows (bool, default False) – Make the row labels bold in the output.
column_format (str, optional) – The columns format as specified in LaTeX table format e.g. ‘rcl’ for 3
columns. By default, ‘l’ will be used for all columns except
columns of numbers, which default to ‘r’.
longtable (bool, optional) – Use a longtable environment instead of tabular. Requires
adding a \usepackage{longtable} to your LaTeX preamble.
By default, the value will be read from the pandas config
module, and set to True if the option styler.latex.environment is
“longtable”.
Changed in version 2.0.0: The pandas option affecting this argument has changed.
escape (bool, optional) – By default, the value will be read from the pandas config
module and set to True if the option styler.format.escape is
“latex”. When set to False prevents from escaping latex special
characters in column names.
Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the
default value to False.
encoding (str, optional) – A string representing the encoding to use in the output file,
defaults to ‘utf-8’.
decimal (str, default '.') – Character recognized as decimal separator, e.g. ‘,’ in Europe.
multicolumn_format (str, default 'r') – The alignment for multicolumns, similar to column_format.
The default will be read from the config module, and is set as the option
styler.latex.multicol_align.
Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the
default value to “r”.
multirow (bool, default True) – Use multirow to enhance MultiIndex rows. Requires adding a
\usepackage{multirow} to your LaTeX preamble. Will print
centered labels (instead of top-aligned) across the contained
rows, separating groups via clines. The default will be read
from the pandas config module, and is set as the option
styler.sparse.index.
Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the
default value to True.
caption (str or tuple, optional) – Tuple (full_caption, short_caption),
which results in \caption[short_caption]{full_caption};
if a single string is passed, no short caption will be set.
label (str, optional) – The LaTeX label to be placed inside \label{} in the output.
This is used with \ref{} in the main .tex file.
position (str, optional) – The LaTeX positional argument for tables, to be placed after
\begin{} in the output.
Returns:
If buf is None, returns the result as a string. Otherwise returns None.
Render a DataFrame to LaTeX with conditional formatting.
DataFrame.to_string
Render a DataFrame to a console-friendly tabular output.
DataFrame.to_html
Render a DataFrame as an HTML table.
Notes
As of v2.0.0 this method has changed to use the Styler implementation as
part of Styler.to_latex() via jinja2 templating. This means
that jinja2 is a requirement, and needs to be installed, for this method
to function. It is advised that users switch to using Styler, since that
implementation is more frequently updated and contains much more
flexibility with the output.
Examples
Convert a general DataFrame to LaTeX with formatting:
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
**kwargs – These parameters will be passed to tabulate.
By default, the dtype of the returned array will be the common NumPy
dtype of all types in the DataFrame. For example, if the dtypes are
float16 and float32, the results dtype will be float32.
This may require copying data and coercing values, which may be
expensive.
Parameters:
dtype (str or numpy.dtype, optional) – The dtype to pass to numpy.asarray().
copy (bool, default False) – Whether to ensure that the returned value is not a view on
another array. Note that copy=False does not ensure that
to_numpy() is no-copy. Rather, copy=True ensures that
a copy is made, even if not strictly necessary.
na_value (Any, optional) – The value to use for missing values. The default value depends
on dtype and the dtypes of the DataFrame columns.
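The dtype rules above can be verified with a small sketch; the sample frames are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# float16 and float32 columns: the common NumPy dtype is float32.
df = pd.DataFrame(
    {
        "a": np.array([1.0, 2.0], dtype="float16"),
        "b": np.array([3.0, 4.0], dtype="float32"),
    }
)
common = df.to_numpy()

# Numeric and string columns have no common numeric dtype, so object is used.
mixed = pd.DataFrame({"x": [1, 2], "y": ["u", "v"]}).to_numpy()

# na_value replaces missing entries in the returned array.
filled = pd.DataFrame({"v": [1.0, None]}).to_numpy(na_value=-1.0)

print(common.dtype, mixed.dtype)
print(filled)
```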
path (str, file-like object or None, default None) – If a string, it will be used as Root Directory path
when writing a partitioned dataset. By file-like object,
we refer to objects with a write() method, such as a file handle
(e.g. via builtin open function). If path is None,
a bytes object is returned.
engine ({'pyarrow'}, default 'pyarrow') – ORC library to use.
index (bool, optional) – If True, include the dataframe’s index(es) in the file output.
If False, they will not be written to the file.
If None, the dataframe’s index(es) will be saved, but a RangeIndex
is stored as a range in the metadata rather than as values, so it
doesn’t require much space and is faster. Other indexes will
be included as columns in the file output.
engine_kwargs (dict[str, Any] or None, default None) – Additional keyword arguments passed to pyarrow.orc.write_table().
Return type:
bytes if no path argument is provided else None
Raises:
NotImplementedError – Dtype of one or more columns is category, unsigned integers, interval,
period or sparse.
index (bool, default None) – If True, always include the dataframe’s index(es) as columns
in the file output.
If False, the index(es) will not be written to the file.
If None, the index(es) will be included as columns in the file
output except RangeIndex which is stored as metadata only.
compression ({'snappy', 'gzip', 'brotli', 'lz4', 'zstd', None}, default 'snappy') – Name of the compression to use. Use None for no compression.
geometry_encoding ({'WKB', 'geoarrow'}, default 'WKB') – The encoding to use for the geometry columns. Defaults to “WKB”
for maximum interoperability. Specify “geoarrow” to use one of the
native GeoArrow-based single-geometry type encodings.
Note: the “geoarrow” option is part of the newer GeoParquet 1.1
specification, should be considered as experimental, and may not
be supported by all readers.
write_covering_bbox (bool, default False) – Writes the bounding box column for each row entry with column
name ‘bbox’. Writing a bbox column can be computationally
expensive, but allows you to specify a bbox in
read_parquet() for filtered reading.
Note: this bbox column is part of the newer GeoParquet 1.1
specification and should be considered as experimental. While
writing the column is backwards compatible, using it for filtering
may not be supported by all readers.
schema_version ({'0.1.0', '0.4.0', '1.0.0', '1.1.0', None}) – GeoParquet specification version; if not provided, will default to
latest supported stable version (1.0.0).
kwargs – Additional keyword arguments passed to pyarrow.parquet.write_table().
If False then underlying input data is not copied.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write: pd.options.mode.copy_on_write = True
path (str, path object, or file-like object) – String, path object (implementing os.PathLike[str]), or file-like
object implementing a binary write() function. File path where
the pickled object will be stored.
For on-the-fly compression of the output data. If ‘infer’ and ‘path’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
Int which indicates which protocol should be used by the pickler,
default HIGHEST_PROTOCOL (see [1]_ paragraph 12.1.2). The possible
values are 0, 1, 2, 3, 4, 5. A negative value for the protocol
parameter is equivalent to setting its value to HIGHEST_PROTOCOL.
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
Return type:
None
See also
read_pickle
Load pickled pandas object (or any object) from file.
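The compression options described above can be sketched as a small round trip. This is only an illustration: the file name frame.pkl.gz and the sample data are made up, and the dict form simply reuses the gzip settings quoted in the parameter description.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; the ".gz" suffix would also let
    # compression="infer" select gzip on its own.
    pickle_path = os.path.join(tmp_dir, "frame.pkl.gz")

    # Dict form: 'method' picks the codec, the remaining keys are
    # forwarded to gzip.GzipFile.
    df.to_pickle(
        pickle_path,
        compression={"method": "gzip", "compresslevel": 1, "mtime": 1},
    )

    # read_pickle infers gzip from the file extension by default.
    restored = pd.read_pickle(pickle_path)

print(restored.equals(df))
```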
replace: Drop the table before inserting new values.
append: Insert new values to the existing table.
schema (string, optional) – Specify the schema. If None, use default schema: ‘public’.
index (bool, default False) – Write DataFrame index as a column.
Uses index_label as the column name in the table.
index_label (string or sequence, default None) – Column label for index column(s).
If None is given (default) and index is True,
then the index names are used.
chunksize (int, optional) – Rows will be written in batches of this size at a time.
By default, all rows will be written at once.
dtype (dict of column name to SQL type, default None) – Specifying the datatype for columns.
The keys should be the column names and the values
should be the SQLAlchemy types.
Index will be included as the first field of the record array if
requested.
Parameters:
index (bool, default True) – Include index in resulting record array, stored in ‘index’
field or using the index label, if set.
column_dtypes (str, type, dict, default None) – If a string or type, the data type to store all columns. If
a dictionary, a mapping of column names and indices (zero-indexed)
to specific data types.
If a string or type, the data type to store all index levels. If
a dictionary, a mapping of index level names and indices
(zero-indexed) to specific data types.
This mapping is applied only if index=True.
Returns:
NumPy ndarray with the DataFrame labels as fields and each row
of the DataFrame as entries.
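A minimal sketch of the index and column_dtypes parameters follows. The frame contents are invented for illustration; only the parameter names come from the description above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [0.5, 0.75]}, index=[10, 20])

# index=True (the default) stores the index in a leading field named 'index';
# column_dtypes overrides the stored type of column "A" only.
records = df.to_records(index=True, column_dtypes={"A": "int32"})

# index=False leaves the index out of the record array.
records_no_index = df.to_records(index=False)

print(records.dtype.names)
print(records_no_index.dtype.names)
```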
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable. See here.
If passing a sqlalchemy.engine.Connection which is already in a transaction,
the transaction will not be committed. If passing a sqlite3.Connection,
it will not be possible to roll back the record insertion.
schema (str, optional) – Specify the schema (if database flavor supports this). If None, use
default schema.
replace: Drop the table before inserting new values.
append: Insert new values to the existing table.
index (bool, default True) – Write DataFrame index as a column. Uses index_label as the column
name in the table. Creates a table index for this column.
index_label (str or sequence, default None) – Column label for index column(s). If None is given (default) and
index is True, then the index names are used.
A sequence should be given if the DataFrame uses MultiIndex.
chunksize (int, optional) – Specify the number of rows in each batch to be written at a time.
By default, all rows will be written at once.
dtype (dict or scalar, optional) – Specifying the datatype for columns. If a dictionary is used, the
keys should be the column names and the values should be the
SQLAlchemy types or strings for the sqlite3 legacy mode. If a
scalar is provided, it will be applied to all columns.
method ({None, 'multi', callable}, optional) –
Controls the SQL insertion clause used:
None : Uses standard SQL INSERT clause (one per row).
'multi': Pass multiple values in a single INSERT clause.
callable with signature (pd_table, conn, keys, data_iter).
Details and a sample callable implementation can be found in the
section insert method.
Returns:
Number of rows affected by to_sql. None is returned if the callable
passed into method does not return an integer number of rows.
The number of returned rows affected is the sum of the rowcount
attribute of sqlite3.Cursor or SQLAlchemy connectable which may not
reflect the exact number of written rows as stipulated in the
sqlite3 or SQLAlchemy documentation.
ValueError – When the table already exists and if_exists is ‘fail’ (the
default).
See also
read_sql
Read a DataFrame from a table.
Notes
Timezone aware datetime columns will be written as
Timestamp with timezone type with SQLAlchemy if supported by the
database. Otherwise, the datetimes will be stored as timezone unaware
timestamps local to the original timezone.
Not all datastores support method="multi". Oracle, for example,
does not support multi-value insert.
Use method to define a callable insertion method to do nothing
if there’s a primary key conflict on a table in a PostgreSQL database.
>>> from sqlalchemy.dialects.postgresql import insert
>>> def insert_on_conflict_nothing(table, conn, keys, data_iter):
...     # "a" is the primary key in "conflict_table"
...     data = [dict(zip(keys, row)) for row in data_iter]
...     stmt = insert(table.table).values(data).on_conflict_do_nothing(index_elements=["a"])
...     result = conn.execute(stmt)
...     return result.rowcount
>>> df_conflict.to_sql(name="conflict_table", con=conn, if_exists="append", method=insert_on_conflict_nothing)
0
For MySQL, a callable to update columns b and c if there’s a conflict
on a primary key.
Specify the dtype (especially useful for integers with missing values).
Notice that while pandas is forced to store the data as floating point,
the database supports nullable integers. When fetching the data with
Python, we get back integer scalars.
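The dtype note can be sketched with an in-memory SQLite database. This example assumes the legacy sqlite3.Connection mode mentioned above, in which dtype values are SQL type strings rather than SQLAlchemy types; the table name integers and the data are illustrative.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"A": [1, None, 2]})  # pandas stores this column as float64

conn = sqlite3.connect(":memory:")
try:
    # Legacy sqlite3 mode: dtype values are SQL type strings.
    df.to_sql(name="integers", con=conn, index=False, dtype={"A": "INTEGER"})
    rows = conn.execute("SELECT A FROM integers").fetchall()

    # if_exists="append" adds rows to the existing table; the default
    # "fail" would raise ValueError because the table already exists.
    df.to_sql(name="integers", con=conn, index=False, if_exists="append")
    total = conn.execute("SELECT COUNT(*) FROM integers").fetchone()[0]
finally:
    conn.close()

print(rows, total)
```

The missing value comes back as None, and the table holds six rows after the append.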
Writes the DataFrame to a Stata dataset file.
“dta” files contain a Stata dataset.
Parameters:
path (str, path object, or buffer) – String, path object (implementing os.PathLike[str]), or file-like
object implementing a binary write() function.
convert_dates (dict) – Dictionary mapping columns containing datetime types to stata
internal format to use when writing the dates. Options are ‘tc’,
‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either an integer
or a name. Datetime columns that do not have a conversion type
specified will be converted to ‘tc’. Raises NotImplementedError if
a datetime column has timezone information.
write_index (bool) – Write the index to Stata dataset.
byteorder (str) – Can be “>”, “<”, “little”, or “big”. Default is sys.byteorder.
time_stamp (datetime) – A datetime to use as file creation date. Default is the current
time.
data_label (str, optional) – A label for the data set. Must be 80 characters or smaller.
variable_labels (dict) – Dictionary containing columns as keys and variable labels as
values. Each label must be 80 characters or smaller.
version ({114, 117, 118, 119, None}, default 114) –
Version to use in the output dta file. Set to None to let pandas
decide between 118 or 119 formats depending on the number of
columns in the frame. Version 114 can be read by Stata 10 and
later. Version 117 can be read by Stata 13 or later. Version 118
is supported in Stata 14 and later. Version 119 is supported in
Stata 15 and later. Version 114 limits string variables to 244
characters or fewer while versions 117 and later allow strings
with lengths up to 2,000,000 characters. Versions 118 and 119
support Unicode characters, and version 119 supports more than
32,767 variables.
Version 119 should usually only be used when the number of
variables exceeds the capacity of dta format 118. Exporting
smaller datasets in format 119 may have unintended consequences,
and, as of November 2020, Stata SE cannot read version 119 files.
convert_strl (list, optional) – List of names of string columns to convert to Stata StrL
format. Only available if version is 117. Storing strings in the
StrL format can produce smaller dta files if strings have more than
8 characters and values are repeated.
For on-the-fly compression of the output data. If ‘infer’ and ‘path’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
Dictionary containing columns as keys and dictionaries of column value
to labels as values. Labels for a single variable must be 32,000
characters or smaller.
* Columns listed in convert_dates are neither datetime64[ns] nor datetime.datetime
* Column listed in convert_dates is not in DataFrame
* Categorical label contains more than 32,000 characters
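As a rough illustration of the writer parameters above, the following sketch writes a small frame and reads it back. The file name, labels, and data are hypothetical, and version=118 is chosen only as an example of a Unicode-capable format.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"speed": [350.0, 18.0, 361.0], "weight": [1.5, 2.5, 3.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    dta_path = os.path.join(tmp_dir, "animals.dta")  # hypothetical file name

    # Data and variable labels must each stay within 80 characters.
    df.to_stata(
        dta_path,
        write_index=False,
        version=118,
        data_label="Hypothetical animal measurements",
        variable_labels={"speed": "Top speed", "weight": "Body weight"},
    )

    restored = pd.read_stata(dta_path)

print(restored)
```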
Render a DataFrame to a console-friendly tabular output.
Parameters:
buf (str, Path or StringIO-like, optional, default None) – Buffer to write to. If None, the output is returned as a string.
columns (array-like, optional, default None) – The subset of columns to write. Writes all columns by default.
col_space (int, list or dict of int, optional) – The minimum width of each column. If a list of ints is given, each integer corresponds to one column. If a dict is given, the key references the column, while the value defines the space to use.
header (bool or list of str, optional) – Write out the column names. If a list of columns is given, it is assumed to be aliases for the column names.
index (bool, optional, default True) – Whether to print index (row) labels.
na_rep (str, optional, default 'NaN') – String representation of NaN to use.
formatters (list, tuple or dict of one-param. functions, optional) – Formatter functions to apply to columns’ elements by position or
name.
The result of each function must be a unicode string.
List/tuple must be of length equal to the number of columns.
float_format (one-parameter function, optional, default None) – Formatter function to apply to columns’ elements if they are
floats. This function must return a unicode string and will be
applied only to the non-NaN elements, with NaN being
handled by na_rep.
sparsify (bool, optional, default True) – Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
index_names (bool, optional, default True) – Prints the names of the indexes.
How to justify the column labels. If None uses the option from
the print configuration (controlled by set_option), ‘right’ out
of the box. Valid values are
left
right
center
justify
justify-all
start
end
inherit
match-parent
initial
unset.
max_rows (int, optional) – Maximum number of rows to display in the console.
max_cols (int, optional) – Maximum number of columns to display in the console.
show_dimensions (bool, default False) – Display DataFrame dimensions (number of rows by number of columns).
decimal (str, default '.') – Character recognized as decimal separator, e.g. ‘,’ in Europe.
line_width (int, optional) – Width to wrap a line in characters.
min_rows (int, optional) – The number of rows to display in the console in a truncated repr
(when number of rows is above max_rows).
max_colwidth (int, optional) – Max width to truncate each column in characters. By default, no limit.
encoding (str, default "utf-8") – Set character encoding.
Returns:
If buf is None, returns the result as a string. Otherwise returns
None.
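A short sketch of how index, na_rep, and float_format interact is shown below. The column names and values are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.5, np.nan, 3.25], "col2": [4, 5, 6]})

text = df.to_string(
    index=False,                   # omit the row labels
    na_rep="-",                    # printed in place of NaN
    float_format="{:.2f}".format,  # applied to non-NaN floats only
)

# buf was not given, so the rendered table is returned as a string.
print(text)
```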
If False then underlying input data is not copied.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write: pd.options.mode.copy_on_write = True
path_or_buffer (str, path object, file-like object, or None, default None) – String, path object (implementing os.PathLike[str]), or file-like
object implementing a write() function. If None, the result is returned
as a string.
index (bool, default True) – Whether to include index in XML document.
root_name (str, default 'data') – The name of root element in XML document.
row_name (str, default 'row') – The name of row element in XML document.
na_rep (str, optional) – Missing data representation.
attr_cols (list-like, optional) – List of columns to write as attributes in row element.
Hierarchical columns will be flattened with underscore
delimiting the different levels.
elem_cols (list-like, optional) – List of columns to write as children in row element. By default,
all columns output as children of row element. Hierarchical
columns will be flattened with underscore delimiting the
different levels.
All namespaces to be defined in root element. Keys of dict
should be prefix names and values of dict corresponding URIs.
Default namespaces should be given empty string key. For
example,
namespaces={"":"https://example.com"}
prefix (str, optional) – Namespace prefix to be used for every element and/or attribute
in document. This should be one of the keys in namespaces
dict.
encoding (str, default 'utf-8') – Encoding of the resulting document.
xml_declaration (bool, default True) – Whether to include the XML declaration at start of document.
pretty_print (bool, default True) – Whether output should be pretty printed with indentation and
line breaks.
parser ({'lxml', 'etree'}, default 'lxml') – Parser module to use for building of tree. Only ‘lxml’ and
‘etree’ are supported. With ‘lxml’, the ability to use XSLT
stylesheet is supported.
stylesheet (str, path object or file-like object, optional) – A URL, file-like object, or a raw string containing an XSLT
script used to transform the raw XML output. Script should use
layout of elements and attributes from original output. This
argument requires lxml to be installed. Only XSLT 1.0
scripts, not later versions, are currently supported.
For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buffer’ is
path-like, then detect compression from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’
(otherwise no compression).
Set to None for no compression.
Can also be a dict with key 'method' set
to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and
other key-value pairs are forwarded to
zipfile.ZipFile, gzip.GzipFile,
bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or
tarfile.TarFile, respectively.
As an example, the following could be passed for faster compression and to create
a reproducible gzip archive:
compression={'method':'gzip','compresslevel':1,'mtime':1}.
Added in version 1.5.0: Added support for .tar files.
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to urllib.request.Request as header options. For other
URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are
forwarded to fsspec.open. Please see fsspec and urllib for more
details, and for more examples on storage options refer here.
Returns:
If path_or_buffer is None, returns the resulting XML format as a
string. Otherwise returns None.
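The naming parameters above can be sketched as follows. This example assumes parser='etree' so that lxml is not required, and the element names shapes and item are invented for illustration.

```python
import xml.etree.ElementTree as ET

import pandas as pd

df = pd.DataFrame({"shape": ["square", "triangle"], "sides": [4, 3]})

# No path_or_buffer is passed, so the document is returned as a string.
xml_text = df.to_xml(
    index=False,
    root_name="shapes",
    row_name="item",
    parser="etree",
)

# Parse the result back to confirm the element layout.
root = ET.fromstring(xml_text.encode("utf-8"))
sides = [item.find("sides").text for item in root.findall("item")]
print(root.tag, sides)
```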
Return a Series of dtype('bool') with value True for
each aligned geometry that touches other.
An object is said to touch other if it has at least one point in
common with other and its interior does not intersect with any part
of the other. Overlapping features therefore do not touch.
The operation works in a 1-to-1 row-wise manner.
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if it is
touched.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
Return a GeoSeries with the transformation function
applied to the geometry coordinates.
Parameters:
transformation (Callable) – A function that transforms a (N, 2) or (N, 3) ndarray of float64
to another (N,2) or (N, 3) ndarray of float64
include_z (bool, default False) – If True include the third dimension in the coordinates array that
is passed to the transformation function. If a geometry has no third
dimension, the z-coordinates passed to the function will be NaN.
Return type:
GeoSeries
Examples
>>> from shapely import Point, Polygon
>>> s = geopandas.GeoSeries([Point(0, 0)])
>>> s.transform(lambda x: x + 1)
0    POINT (1 1)
dtype: geometry
xoff, yoff, zoff (float, float, float) – Amount of offset along each dimension.
xoff, yoff, and zoff for translation along the x, y, and z
dimensions respectively.
Whether to copy the data after transposing, even for DataFrames
with a single dtype.
Note that a copy is always required for mixed dtype DataFrames,
or for DataFrames with any extension types.
Note
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write: pd.options.mode.copy_on_write = True
Transposing a DataFrame with mixed dtypes will result in a homogeneous
DataFrame with the object dtype. In such a case, a copy of the data
is always made.
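The dtype behavior described above can be sketched with two small frames. The column names and values are illustrative.

```python
import pandas as pd

same_dtype = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
mixed = pd.DataFrame({"name": ["Alice", "Bob"], "score": [9.5, 8.0]})

# A single-dtype frame keeps its dtype after transposing.
print(same_dtype.T.dtypes)

# Mixed dtypes are coerced to object, which always requires a copy.
print(mixed.T.dtypes)
```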
Get floating division of dataframe and other, element-wise (binary operator truediv).
Equivalent to dataframe / other, but with support to substitute a fill_value
for missing data in one of the inputs. The reverse version is rtruediv.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Parameters:
other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
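A minimal sketch of the fill_value rule follows, using made-up data. A value missing on one side only is replaced before dividing, while a position missing on both sides stays missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [10.0, np.nan], "b": [4.0, np.nan]})
other = pd.DataFrame({"a": [2.0, 5.0], "b": [np.nan, np.nan]})

# Row 1 of "a": df is missing, so 1.0 / 5.0 is computed.
# Row 0 of "b": other is missing, so 4.0 / 1.0 is computed.
# Row 1 of "b": both sides are missing, so the result stays NaN.
result = df.truediv(other, fill_value=1.0)

# rtruediv swaps the operands: 20.0 / df.
reverse = df.rtruediv(20.0)

print(result)
print(reverse)
```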
Truncate a Series or DataFrame before and after some index value.
This is a useful shorthand for boolean indexing based on index
values above or below certain thresholds.
Parameters:
before (date, str, int) – Truncate all rows before this index value.
after (date, str, int) – Truncate all rows after this index value.
axis ({0 or 'index', 1 or 'columns'}, optional) – Axis to truncate. Truncates the index (rows) by default.
For Series this parameter is unused and defaults to 0.
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write: pd.options.mode.copy_on_write = True
If the index being truncated contains only datetime values,
before and after may be specified as strings instead of
Timestamps.
Examples
>>> df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
...                    'B': ['f', 'g', 'h', 'i', 'j'],
...                    'C': ['k', 'l', 'm', 'n', 'o']},
...                   index=[1, 2, 3, 4, 5])
>>> df
   A  B  C
1  a  f  k
2  b  g  l
3  c  h  m
4  d  i  n
5  e  j  o
>>> df.truncate(before=2, after=4)
   A  B  C
2  b  g  l
3  c  h  m
4  d  i  n
The columns of a DataFrame can be truncated.
>>> df.truncate(before="A", after="B", axis="columns")
   A  B
1  a  f
2  b  g
3  c  h
4  d  i
5  e  j
For Series, only rows can be truncated.
>>> df['A'].truncate(before=2, after=4)
2    b
3    c
4    d
Name: A, dtype: object
The index values in truncate can be datetimes or string
dates.
Because the index is a DatetimeIndex containing only dates, we can
specify before and after as strings. They will be coerced to
Timestamps before truncation.
Note that truncate assumes a 0 value for any unspecified time
component (midnight). This differs from partial string slicing, which
returns any partially matching dates.
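The difference between truncate and partial string slicing can be sketched with a twice-daily index. The data below is invented for illustration.

```python
import pandas as pd

# Two observations per day: midnight and noon.
index = pd.date_range("2016-01-01", periods=6, freq=pd.Timedelta(hours=12))
df = pd.DataFrame({"A": range(6)}, index=index)

# The string is coerced to Timestamp("2016-01-02 00:00:00"),
# so the noon row on 2016-01-02 is cut off.
truncated = df.truncate(after="2016-01-02")

# Partial string slicing keeps every row that falls on 2016-01-02.
sliced = df.loc[:"2016-01-02"]

print(len(truncated), len(sliced))
```

Here truncate keeps three rows, while the partial string slice keeps four.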
The copy keyword will change behavior in pandas 3.0.
Copy-on-Write
will be enabled by default, which means that all methods with a
copy keyword will use a lazy copy mechanism to defer the copy and
ignore the copy keyword. The copy keyword will be removed in a
future version of pandas.
You can already get the future behavior and improvements through
enabling copy on write: pd.options.mode.copy_on_write = True
When clocks are moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from
03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at
00:30:00 UTC and at 01:30:00 UTC. In such a situation, the
ambiguous parameter dictates how ambiguous times should be
handled.
‘infer’ will attempt to infer fall dst-transition hours based on
order
bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
‘NaT’ will return NaT where there are ambiguous times
‘raise’ will raise an AmbiguousTimeError if there are ambiguous
times.
If the DST transition causes nonexistent times, you can shift these
dates forward or backward with a timedelta object or ‘shift_forward’
or ‘shift_backward’.
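The ambiguous and nonexistent options can be sketched on a Series with a naive DatetimeIndex. The Europe/Warsaw zone and the dates are chosen only because they fall on DST transitions, and the example assumes time zone data is available to pandas.

```python
import numpy as np
import pandas as pd

# 02:30 does not exist on 2015-03-29 in Warsaw (clocks jump 02:00 -> 03:00).
spring = pd.Series(
    range(2),
    index=pd.DatetimeIndex(["2015-03-29 02:30:00", "2015-03-29 03:30:00"]),
)
shifted = spring.tz_localize("Europe/Warsaw", nonexistent="shift_forward")

# 02:30 occurs twice on 2018-10-28 in Warsaw (clocks fall back 03:00 -> 02:00).
autumn = pd.Series(
    range(2),
    index=pd.DatetimeIndex(["2018-10-28 02:30:00", "2018-10-28 02:30:00"]),
)
# Boolean array: True marks the DST occurrence, False the non-DST one.
by_flag = autumn.tz_localize("Europe/Warsaw", ambiguous=np.array([True, False]))
# 'NaT' replaces every ambiguous time with NaT instead of raising.
as_nat = autumn.tz_localize("Europe/Warsaw", ambiguous="NaT")

print(shifted.index[0])
print(by_flag.index.tz_convert("UTC"))
print(as_nat.index)
```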
Return a GeoSeries of the union of points in each aligned geometry with
other.
The operation works in a 1-to-1 row-wise manner.
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the union
with.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
Return a geometry containing the union of all geometries in the
GeoSeries.
By default, the unary union algorithm is used. If the geometries are
non-overlapping (forming a coverage), GeoPandas can use a significantly faster
algorithm to perform the union using the method="coverage" option.
Alternatively, for situations which can be divided into many disjoint subsets,
method="disjoint_subset" may be preferable.
Parameters:
method (str, default "unary") –
The method to use for the union. Options are:
"unary": use the unary union algorithm. This option is the most robust
but can be slow for large numbers of geometries (default).
"coverage": use the coverage union algorithm. This option is optimized
for non-overlapping polygons and can be significantly faster than the
unary union algorithm. However, it can produce invalid geometries if the
polygons overlap.
"disjoint_subset": use the disjoint subset union algorithm. This
option is optimized for inputs that can be divided into subsets that do
not intersect. If there is only one such subset, performance can be
expected to be worse than "unary". Requires Shapely >= 2.1.
When grid size is specified, a fixed-precision space is used to perform the
union operations. This can be useful when unioning geometries that are not
perfectly snapped or to avoid geometries not being unioned because of
robustness issues.
The inputs are first snapped to a grid of the given size. When a line
segment of a geometry is within tolerance of a vertex of another geometry,
this vertex will be inserted in the line segment. Finally, the result
vertices are computed on the same grid. Only supported for method "unary". If None, the highest precision of the inputs will be used.
Defaults to None.
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a    1.0
     b    2.0
two  a    3.0
     b    4.0
dtype: float64
>>> s.unstack(level=-1)
       a    b
one  1.0  2.0
two  3.0  4.0
>>> s.unstack(level=0)
   one  two
a  1.0  3.0
b  2.0  4.0
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a    1.0
     b    2.0
two  a    3.0
     b    4.0
dtype: float64
Modify in place using non-NA values from another DataFrame.
Aligns on indices. There is no return value.
Parameters:
other (DataFrame, or object coercible into a DataFrame) – Should have at least one matching index/column label
with the original DataFrame. If a Series is passed,
its name attribute must be set, and that will be
used as the column name to align with the original DataFrame.
join ({'left'}, default 'left') – Only left join is implemented, keeping the index and columns of the
original object.
True: overwrite original DataFrame’s values
with values from other.
False: only update values that are NA in
the original DataFrame.
filter_func (callable(1d-array) -> bool 1d-array, optional) – Can choose to replace values other than NA. Return True for values
that should be updated.
errors ({'raise', 'ignore'}, default 'ignore') – If ‘raise’, will raise a ValueError if the DataFrame and other
both contain non-NA data in the same place.
The DataFrame’s length does not increase as a result of the update,
only values at matching index/column labels are updated.
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  e
2  c  f
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'f']}, index=[0, 2])
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  y
2  c  f
For Series, its name attribute must be set.
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_column = pd.Series(['d', 'e', 'f'], name='B')
>>> df.update(new_column)
>>> df
   A  B
0  a  d
1  b  e
2  c  f
If other contains NaNs the corresponding values are not updated
in the original dataframe.
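The NaN rule can be sketched as follows. The values are illustrative, and column B is kept as float64 here so that the update does not change its dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [400.0, 500.0, 600.0]})
new_df = pd.DataFrame({"B": [4.0, np.nan, 6.0]})

# update() works in place and returns None; the NaN in row 1 is skipped,
# so the original 500.0 survives.
df.update(new_df)

print(df)
```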
Don’t include counts of rows that contain NA values.
Added in version 1.3.0.
Return type:
Series
See also
Series.value_counts
Equivalent method on Series.
Notes
The returned Series will have a MultiIndex with one level per input
column but an Index (non-multi) for a single label. By default, rows
that contain any NA values are omitted from the result. By default,
the resulting Series will be in descending order so that the first
element is the most frequently-occurring row.
With dropna set to False we can also count rows with NA values.
>>> df = pd.DataFrame({'first_name': ['John', 'Anne', 'John', 'Beth'],
...                    'middle_name': ['Smith', pd.NA, pd.NA, 'Louise']})
>>> df
  first_name middle_name
0       John       Smith
1       Anne        <NA>
2       John        <NA>
3       Beth      Louise
>>> df.value_counts()
first_name  middle_name
Beth        Louise         1
John        Smith          1
Name: count, dtype: int64
>>> df.value_counts(dropna=False)
first_name  middle_name
Anne        NaN            1
Beth        Louise         1
John        Smith          1
            NaN            1
Name: count, dtype: int64
The dtype will be a lower-common-denominator dtype (implicit
upcasting); that is to say if the dtypes (even of numeric types)
are mixed, the one that accommodates all will be chosen. Use this
with care if you are not dealing with the blocks.
e.g. If the dtypes are float16 and float32, dtype will be upcast to
float32. If dtypes are int32 and uint8, dtype will be upcast to
int32. By numpy.find_common_type() convention, mixing int64
and uint64 will result in a float64 dtype.
Examples
A DataFrame where all columns are the same type (e.g., int64) results
in an array of the same type.
A DataFrame with mixed type columns (e.g., str/object, int64, float32)
results in an ndarray of the broadest type that accommodates these
mixed types (e.g., object).
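The upcasting rules above can be sketched with three small frames. The column names and values are made up.

```python
import numpy as np
import pandas as pd

same = pd.DataFrame({"age": [3, 29], "height": [94, 170]})
numeric_mix = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5]})
with_text = pd.DataFrame({"name": ["parrot", "lion"], "max_speed": [24.0, 80.5]})

print(same.values.dtype)         # all int64 columns stay int64
print(numeric_mix.values.dtype)  # int64 + float64 upcasts to float64
print(with_text.values.dtype)    # strings + floats fall back to object
```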
Normalized by N-1 by default. This can be changed using the ddof argument.
Parameters:
axis ({index (0), columns (1)}) –
For Series this parameter is unused and defaults to 0.
Warning
The behavior of DataFrame.var with axis=None is deprecated;
in a future version this will reduce over both axes and return a scalar.
To retain the old behavior, pass axis=0 (or do not pass axis).
skipna (bool, defaultTrue) – Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
ddof (int, default1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_only (bool, defaultFalse) – Include only float, int, boolean columns. Not implemented for Series.
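The effect of ddof on the divisor can be illustrated with a pure-Python sketch. This is not the pandas implementation; the helper name `variance` and the choice to raise when N - ddof is not positive are assumptions made for the example.

```python
import statistics


def variance(values, ddof=1):
    """Variance with divisor N - ddof (illustrative helper, not pandas code)."""
    n = len(values)
    if n - ddof <= 0:
        # Assumption of this sketch: refuse to divide by zero or a negative number.
        raise ValueError("need more than ddof observations")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sample_var = variance(data)              # divisor N - 1 (the default)
population_var = variance(data, ddof=0)  # divisor N

# Cross-check against the standard library.
stdlib_sample = statistics.variance(data)
stdlib_population = statistics.pvariance(data)
```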
Return a GeoSeries consisting of objects representing
the computed Voronoi diagram around the vertices of an input geometry.
All geometries within the GeoSeries are considered together within a single
Voronoi diagram. The resulting geometries therefore do not necessarily map 1:1
to input geometries. Note that each vertex of a geometry is considered a site
for the Voronoi diagram, so the diagram will be constructed around the vertices
of each geometry.
Notes
The order of polygons in the output currently does not correspond to the order
of input vertices.
If you want to generate a Voronoi diagram for each geometry separately, use
shapely.voronoi_polygons() instead.
Parameters:
tolerance (float, default 0.0) – Snap input vertices together if their distance is less than this value.
extend_to (shapely.Geometry, default None) – If set, the Voronoi diagram will be extended to cover the
envelope of this geometry (unless this envelope is smaller than the input
geometry).
only_edges (bool, optional, default False) – If set to True, the diagram will return LineStrings instead
of Polygons.
Examples
The most common use case is to generate polygons representing the Voronoi
diagram around a set of points:
>>> from shapely import LineString, MultiPoint, Point, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Point(1, 1),
...         Point(2, 2),
...         Point(1, 3),
...         Point(0, 2),
...     ]
... )
>>> s
0    POINT (1 1)
1    POINT (2 2)
2    POINT (1 3)
3    POINT (0 2)
dtype: geometry
The method supports any geometry type but keep in mind that the underlying
algorithm is based on the vertices of the input geometries only and does not
consider edge segments between vertices.
cond (bool Series/DataFrame, array-like, or callable) – Where cond is True, keep the original value. Where
False, replace with corresponding value from other.
If cond is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn’t check it).
other (scalar, Series/DataFrame, or callable) – Entries where cond is False are replaced with
corresponding value from other.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn’t check it).
If not specified, entries will be filled with the corresponding
NULL value (np.nan for numpy dtypes, pd.NA for extension
dtypes).
inplace (bool, default False) – Whether to perform the operation in place on the data.
axis (int, default None) – Alignment axis if needed. For Series this parameter is
unused and defaults to 0.
level (int, default None) – Alignment level if needed.
Return type:
Same type as caller or None if inplace=True.
See also
DataFrame.mask()
Return an object of same shape as self.
Notes
The where method is an application of the if-then idiom. For each
element in the calling DataFrame, if cond is True the
element is used; otherwise the corresponding element from the DataFrame
other is used. If the axis of other does not align with axis of
cond Series/DataFrame, the misaligned index positions will be filled with
False.
The signature for DataFrame.where() differs from
numpy.where(). Roughly df1.where(m, df2) is equivalent to
np.where(m, df1, df2).
For further details and examples see the where documentation in
indexing.
The dtype of the object takes precedence. The fill value is cast to
the object’s dtype, if this can be done losslessly.
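The rough equivalence between where and numpy.where noted above can be seen in a small sketch. The frames `df1`, `df2` and the mask `m` are made-up sample data.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = -df1
m = df1 % 2 == 0  # True where the value in df1 is even

# Keep df1 where m is True, otherwise take the value from df2.
kept = df1.where(m, df2)

# numpy.where takes the condition first and returns a plain ndarray.
same_as_numpy = np.where(m, df1, df2)

# Without `other`, replaced entries become NaN, which upcasts the
# integer columns to float.
filled = df1.where(m)
```

Note the difference in return type: where keeps the index and column labels, while numpy.where drops them.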
Return a Series of dtype('bool') with value True for
each aligned geometry that is within other.
An object is said to be within other if at least one of its points is located
in the interior and no points are located in the exterior of the other.
If either object is empty, this operation returns False.
This is the inverse of contains() in the sense that the
expression a.within(b) == b.contains(a) always evaluates to
True.
The operation works in a 1-to-1 row-wise manner:
Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if each
geometry is within.
align (bool | None, default None) – If True, automatically aligns GeoSeries based on their indices.
If False, the order of elements is preserved. None defaults to True.
We can also check two GeoSeries against each other, row by row.
The GeoSeries above have different indices. We can either align both GeoSeries
based on index values and compare elements with the same index using
align=True or ignore index and compare elements based on their matching
order using align=False:
This method takes a key argument to select data at a particular
level of a MultiIndex.
Parameters:
key (label or tuple of label) – Label contained in the index, or partially in a MultiIndex.
axis ({0 or 'index', 1 or 'columns'}, default 0) – Axis to retrieve cross-section on.
level (object, defaults to first n levels (n=1 or len(key))) – In case of a key partially contained in a MultiIndex, indicate
which levels are used. Levels can be referred by label or position.
drop_level (bool, default True) – If False, returns object with same levels as self.
Returns:
Cross-section from the original Series or DataFrame
corresponding to the selected index levels.
Return type:
Series or DataFrame
See also
DataFrame.loc
Access a group of rows and columns by label(s) or a boolean array.
DataFrame.iloc
Purely integer-location based indexing for selection by position.
Notes
xs cannot be used to set values.
MultiIndex Slicers is a generic way to get/set values on
any level or levels.
It is a superset of xs functionality, see
MultiIndex Slicers.
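A short sketch of the key, level, and drop_level parameters described above. The index names and data are invented for the example.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bird", "penguin"), ("mammal", "cat"), ("mammal", "dog")],
    names=["class", "animal"],
)
df = pd.DataFrame({"num_legs": [2, 4, 4]}, index=index)

# Select on the first level; that level is dropped from the result.
mammals = df.xs("mammal")

# Select on a named inner level instead.
dogs = df.xs("dog", level="animal")

# Keep all index levels in the result.
dogs_kept = df.xs("dog", level="animal", drop_level=False)
```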
A geometry type representing an area that is enclosed by a linear ring.
A polygon is a two-dimensional feature and has a non-zero area. It may
have one or more negative-space “holes” which are also bounded by linear
rings. If any rings cross each other, the feature is invalid and
operations on it may fail.
Parameters:
shell (sequence) – A sequence of (x, y [,z]) numeric coordinate pairs or triples, or
an array-like with shape (N, 2) or (N, 3).
Also can be a sequence of Point objects.
holes (sequence) – A sequence of objects which satisfy the same requirements as the
shell parameter above.
Get a geometry that represents all points within a distance of this geometry.
A positive distance produces a dilation, a negative distance an
erosion. A very small or zero distance may sometimes be used to
“tidy” a polygon.
Parameters:
distance (float) – The distance to buffer around the object.
quad_segs (int, optional) – Sets the number of line segments used to approximate an
angle fillet.
cap_style (shapely.BufferCapStyle or {'round', 'square', 'flat'}, default 'round') – Specifies the shape of buffered line endings. BufferCapStyle.round
(‘round’) results in circular line endings (see quad_segs). Both
BufferCapStyle.square (‘square’) and BufferCapStyle.flat (‘flat’)
result in rectangular line endings, only BufferCapStyle.flat
(‘flat’) will end at the original vertex, while
BufferCapStyle.square (‘square’) involves adding the buffer width.
join_style (shapely.BufferJoinStyle or {'round', 'mitre', 'bevel'}, default 'round') – Specifies the shape of buffered line midpoints.
BufferJoinStyle.round (‘round’) results in rounded shapes.
BufferJoinStyle.bevel (‘bevel’) results in a beveled edge that
touches the original vertex. BufferJoinStyle.mitre (‘mitre’) results
in a single vertex that is beveled depending on the mitre_limit
parameter.
mitre_limit (float, optional) – The mitre limit ratio is used for very sharp corners. The
mitre ratio is the ratio of the distance from the corner to
the end of the mitred offset corner. When two line segments
meet at a sharp angle, a mitre join will extend the original
geometry. To prevent unreasonable geometry, the mitre limit
allows controlling the maximum length of the join corner.
Corners with a ratio which exceeds the limit will be beveled.
single_sided (bool, optional) – The side used is determined by the sign of the buffer
distance:
- a positive distance indicates the left-hand side
- a negative distance indicates the right-hand side
The single-sided buffer of point geometries is the same as
the regular buffer. The End Cap Style for single-sided
buffers is always ignored, and forced to the equivalent of
CAP_FLAT.
quadsegs (int, optional) – Deprecated alias for quad_segs.
resolution (int, optional) – Deprecated alias for quad_segs.
**kwargs (dict, optional) – For backwards compatibility of renamed parameters. If an unsupported
kwarg is passed, a ValueError will be raised.
Return type:
Geometry
Notes
The return value is a strictly two-dimensional geometry. All
Z coordinates of the original geometry will be ignored.
Deprecated since version 2.1.0: A deprecation warning is shown if quad_segs, cap_style,
join_style, mitre_limit or single_sided are
specified as positional arguments. In a future release, these will
need to be specified as keyword arguments.
Return a point at the specified distance along a linear geometry.
Negative length values are taken as measured in the reverse
direction from the end of the geometry. Out-of-range index
values are handled by clamping them to the valid range of values.
If the normalized arg is True, the distance will be interpreted as a
fraction of the geometry’s length.
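The handling of negative, out-of-range, and normalized distances can be sketched in pure Python. This is an illustration of the rules described above, not the library's implementation; the function name `interpolate` and the representation of a line as a list of (x, y) tuples are assumptions of the sketch.

```python
import math


def interpolate(coords, distance, normalized=False):
    """Return the point at `distance` along the polyline `coords`."""
    seg_lengths = [math.dist(a, b) for a, b in zip(coords, coords[1:])]
    total = sum(seg_lengths)
    if normalized:
        distance *= total  # fraction of the length -> absolute distance
    if distance < 0:
        distance += total  # measure backwards from the end
    distance = min(max(distance, 0.0), total)  # clamp to the valid range
    for (x0, y0), (x1, y1), length in zip(coords, coords[1:], seg_lengths):
        if length > 0 and distance <= length:
            t = distance / length
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        distance -= length
    return coords[-1]


line = [(0, 0), (4, 0), (4, 3)]  # total length 7
forward = interpolate(line, 2)
backward = interpolate(line, -1.5)
halfway = interpolate(line, 0.5, normalized=True)
past_end = interpolate(line, 100)
```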
Return the oriented envelope (minimum rotated rectangle) of the geometry.
The oriented envelope encloses an input geometry, such that the resulting
rectangle has minimum area.
Unlike envelope this rectangle is not constrained to be parallel to the
coordinate axes. If the convex hull of the object is a degenerate (line
or point) this degenerate is returned.
The starting point of the rectangle is not fixed. You can use
normalize() to reorganize the rectangle to
strict canonical form so the starting point is
always the lower left point.
Convert geometry to normal form (or canonical form).
This method orders the coordinates, rings of a polygon and parts of
multi geometries consistently. Typically useful for testing purposes
(for example in combination with equals_exact).
Return the oriented envelope (minimum rotated rectangle) of a geometry.
The oriented envelope encloses an input geometry, such that the resulting
rectangle has minimum area.
Unlike envelope this rectangle is not constrained to be parallel to the
coordinate axes. If the convex hull of the object is a degenerate (line
or point) this degenerate is returned.
The starting point of the rectangle is not fixed. You can use
normalize() to reorganize the rectangle to
strict canonical form so the starting point is
always the lower left point.
Add vertices to line segments based on maximum segment length.
Additional vertices will be added to every line segment in an input geometry
so that segments are no longer than the provided maximum segment length. New
vertices will evenly subdivide each segment.
Only linear components of input geometries are densified; other geometries
are returned unmodified.
Parameters:
max_segment_length (float or array_like) – Additional vertices will be added so that all line segments are no
longer than this value. Must be greater than 0.
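The even subdivision described above can be sketched in pure Python for a single line given as (x, y) tuples. This is a minimal illustration under that assumption, not the library's implementation, and the function name `segmentize` is chosen here only for the example.

```python
import math


def segmentize(coords, max_segment_length):
    """Evenly subdivide each segment so none is longer than the limit."""
    if max_segment_length <= 0:
        raise ValueError("max_segment_length must be greater than 0")
    out = [coords[0]]
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        length = math.dist((x0, y0), (x1, y1))
        # Number of equal pieces needed to stay within the limit.
        pieces = max(1, math.ceil(length / max_segment_length))
        for k in range(1, pieces + 1):
            t = k / pieces
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


# The 10-unit segment is split in two; the 3-unit segment is left alone.
dense = segmentize([(0, 0), (10, 0), (10, 3)], 5)
```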
Return a simplified geometry produced by the Douglas-Peucker algorithm.
Coordinates of the simplified geometry will be no more than the
tolerance distance from the original. Unless the topology preserving
option is used, the algorithm may produce self-intersecting or
otherwise invalid geometries.
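For readers unfamiliar with the Douglas-Peucker algorithm, the following is a minimal recursive sketch for a single line given as (x, y) tuples. It is not the library's implementation and makes no attempt to preserve topology; both function names are hypothetical.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = min(1.0, max(0.0, t))  # stay on the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(coords, tolerance):
    """Drop vertices that lie within `tolerance` of the simplified line."""
    if len(coords) < 3:
        return list(coords)
    distances = [
        _point_segment_distance(p, coords[0], coords[-1]) for p in coords[1:-1]
    ]
    worst = max(range(len(distances)), key=distances.__getitem__)
    if distances[worst] <= tolerance:
        # Every interior vertex is close enough: keep only the endpoints.
        return [coords[0], coords[-1]]
    split = worst + 1  # index of the farthest vertex in coords
    left = douglas_peucker(coords[: split + 1], tolerance)
    right = douglas_peucker(coords[split:], tolerance)
    return left[:-1] + right  # the split vertex appears in both halves


line = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(line, 0.1)
```

With a tolerance of 0.1 only the vertex at (1, 0.05) is dropped; a tolerance larger than 2 would collapse the line to its two endpoints.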
transform (Affine transformation object) – Transformation from pixel coordinates of source to the
coordinate system of the input shapes. See the transform
property of dataset objects.
all_touched (boolean, optional) – If True, all pixels touched by geometries will be burned in. If
False, only pixels whose center is within the polygon or that
are selected by Bresenham’s line algorithm will be burned in.
False by default.
invert (boolean, optional) – If True, mask will be True for pixels that overlap shapes.
False by default.
Return an image array with input geometries burned in.
Warnings will be raised for any invalid or empty geometries, and
an exception will be raised if there are no valid shapes
to rasterize.
Parameters:
shapes (iterable of (geometry, value) pairs or geometries) – The geometry can either be an object that implements the geo
interface or GeoJSON-like object. If no value is provided
the default_value will be used. If value is None the
fill value will be used.
fill (int or float, optional) – Used as fill value for all areas not covered by input
geometries.
nodata (float, optional) – nodata value to use in output file or masked array.
masked (bool, optional, default False) – If True, return a masked array. Note: nodata is always set in
the case of file output.
out (numpy.ndarray, optional) – Array in which to store results. If not provided, out_shape
and dtype are required.
transform (Affine transformation object, optional) – Transformation from pixel coordinates of source to the
coordinate system of the input shapes. See the transform
property of dataset objects.
all_touched (boolean, optional) – If True, all pixels touched by geometries will be burned in. If
False, only pixels whose center is within the polygon or that
are selected by Bresenham’s line algorithm will be burned in.
merge_alg (MergeAlg, optional) –
Merge algorithm to use. One of:
MergeAlg.replace (default):
the new value will overwrite the existing value.
MergeAlg.add:
the new value will be added to the existing raster.
default_value (int or float, optional) – Used as value for all geometries, if not provided in shapes.
dtype (rasterio or numpy.dtype, optional) – Used as data type for results, if out is not provided.
skip_invalid (bool, optional) – If True (default), invalid shapes will be skipped. If False,
ValueError will be raised.
dst_path (str or PathLike, optional) – Path of output dataset
dst_kwds (dict, optional) – Dictionary of creation options and other parameters that will be
overlaid on the profile of the output dataset.
Returns:
If out was not None then out is returned, it will have been
modified in-place. If out was None, this will be a new array.
Valid data types for fill, default_value, out, dtype and
shape values are “int16”, “int32”, “uint8”, “uint16”, “uint32”,
“float32”, and “float64”.
This function requires significant memory resources. The shapes
iterator will be materialized to a Python list and another C copy of
that list will be made. The out array will be copied and
additional temporary raster memory equal to 2x the smaller of out
data or GDAL’s max cache size (controlled by GDAL_CACHEMAX, default
is 5% of the computer’s physical memory) is required.
If GDAL max cache size is smaller than the output data, the array of
shapes will be iterated multiple times. Performance is thus a linear
function of buffer size. For maximum speed, ensure that
GDAL_CACHEMAX is larger than the size of out or out_shape.
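The memory rule above reduces to simple arithmetic, sketched here with two hypothetical helpers. The 256 MiB cache size is an arbitrary value chosen for the example, not a GDAL default.

```python
def temp_raster_bytes(out_nbytes, gdal_cachemax_bytes):
    """Extra temporary raster memory: 2x the smaller of the two sizes."""
    return 2 * min(out_nbytes, gdal_cachemax_bytes)


def needs_multiple_passes(out_nbytes, gdal_cachemax_bytes):
    """True when the cache is smaller than the output data."""
    return gdal_cachemax_bytes < out_nbytes


# A 10000 x 10000 float32 output array uses 4 bytes per pixel.
out_nbytes = 10_000 * 10_000 * 4
cache_bytes = 256 * 1024 * 1024  # hypothetical GDAL_CACHEMAX of 256 MiB

extra = temp_raster_bytes(out_nbytes, cache_bytes)
multi_pass = needs_multiple_passes(out_nbytes, cache_bytes)
```

In this example the cache is smaller than the output, so the shapes would be iterated more than once; raising GDAL_CACHEMAX above 400 MB would avoid that at the cost of more temporary memory.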
The dataset may be located in a local file, in a resource located by
a URL, or contained within a stream of bytes. This function accepts
different types of fp parameters. However, it is almost always best
to pass a string that has a dataset name as its value. These are
passed directly to GDAL protocol and format handlers. A path to
a zipfile is more efficiently used by GDAL than a Python ZipFile
object, for example.
In read (‘r’) or read/write (‘r+’) mode, no keyword arguments are
required: these attributes are supplied by the opened dataset.
In write (‘w’ or ‘w+’) mode, the driver, width, height, count, and
dtype keywords are strictly required.
Parameters:
fp (str, os.PathLike, file-like, or rasterio.io.MemoryFile) – A filename or URL, a file object opened in binary (‘rb’) mode,
a Path object, or one of the rasterio classes that provides the
dataset-opening interface (has an open method that returns
a dataset). Use a string when possible: GDAL can more
efficiently access a dataset if it opens it natively.
mode (str, optional) – ‘r’ (read, the default), ‘r+’ (read/write), ‘w’ (write), or
‘w+’ (write/read).
driver (str, optional) – A short format driver name (e.g. “GTiff” or “JPEG”) or a list of
such names (see GDAL docs at
https://gdal.org/drivers/raster/index.html). In ‘w’ or ‘w+’ modes
a single name is required. In ‘r’ or ‘r+’ modes the driver can
usually be omitted. Registered drivers will be tried
sequentially until a match is found. When multiple drivers are
available for a format such as JPEG2000, one of them can be
selected by using this keyword argument.
width (int, optional) – The number of columns of the raster dataset. Required in ‘w’ or
‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.
height (int, optional) – The number of rows of the raster dataset. Required in ‘w’ or
‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.
count (int, optional) – The count of dataset bands. Required in ‘w’ or ‘w+’ modes, it is
ignored in ‘r’ or ‘r+’ modes.
crs (str, dict, or CRS, optional) – The coordinate reference system. Required in ‘w’ or ‘w+’ modes,
it is ignored in ‘r’ or ‘r+’ modes.
transform (affine.Affine, optional) – Affine transformation mapping the pixel space to geographic
space. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or
‘r+’ modes.
dtype (str or numpy.dtype, optional) – The data type for bands. For example: ‘uint8’ or
rasterio.uint16. Required in ‘w’ or ‘w+’ modes, it is
ignored in ‘r’ or ‘r+’ modes.
nodata (int, float, or nan, optional) – Defines the pixel value to be interpreted as not valid data.
Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’
modes.
sharing (bool, optional) – To reduce overhead and prevent programs from running out of file
descriptors, rasterio maintains a pool of shared low level
dataset handles. If True this function will use a shared
handle if one is available. Multithreaded programs must avoid
sharing and should set sharing to False.
opener (callable, optional) – A custom dataset opener which can serve GDAL’s virtual
filesystem machinery via Python file-like objects. The
underlying file-like object is obtained by calling opener with
(fp, mode) or (fp, mode + “b”) depending on the format
driver’s native mode. opener must return a Python file-like
object that provides read, seek, tell, and close methods. Note:
only one opener at a time per fp, mode pair is allowed.
kwargs (optional) – These are passed to format drivers as directives for creating or
interpreting datasets. For example: in ‘w’ or ‘w+’ modes
a tiled=True keyword argument will direct the GeoTIFF format
driver to create a tiled, rather than striped, TIFF.
Returns:
rasterio.io.DatasetReader – If mode is “r”.
rasterio.io.DatasetWriter – If mode is “r+”, “w”, or “w+”.
Raises:
TypeError – If arguments are of the wrong Python type.
rasterio.errors.RasterioIOError – If the dataset cannot be opened, such as when there is no
dataset with the given name.
rasterio.errors.DriverCapabilityError – If the detected format driver does not support the requested
opening mode.
Examples
To open a local GeoTIFF dataset for reading using standard driver
discovery and no directives:
Raster data processing functionality for geospatial analysis.
This module provides:
1. Classes for handling and manipulating raster datasets
2. Rasterization tools for converting vector data to rasters
3. Cost surface generation capabilities
4. Utility functions for creating test data and processing rasters
Class for efficiently working with raster data while preserving
geographic transformation information. Can be initialized with either a file path
or directly with raster data, CRS, and transform.
Initialize a RasterHandler for working with raster data and coordinate
transformations.
Creates a window and buffer geometry based on source and target coordinates:
- If source and target are single coordinates: creates a line buffer
- If source and/or target are lists of coordinates: creates a polygon buffer
Parameters:
raster_source (RasterDataset) – Either:
- Path to the raster file (str), or
- Tuple of (data_array, crs, transform)