pyorps.raster package

Submodules

pyorps.raster.handler module

PYORPS: An Open-Source Tool for Automated Power Line Routing

Reference: [1] Hofmann, M., Stetz, T., Kammer, F., Repo, S.: ‘PYORPS: An Open-Source Tool for Automated Power Line Routing’, CIRED 2025 - 28th Conference and Exhibition on Electricity Distribution, 16 - 19 June 2025, Geneva, Switzerland

class pyorps.raster.handler.Affine(a, b, c, d, e, f, g=0.0, h=0.0, i=1.0)[source]

Bases: Affine

Two dimensional affine transform for 2D linear mapping.

Parameters:
  • a (float) – Element of the first row of the augmented affine transformation matrix.

  • b (float) – Element of the first row of the matrix.

  • c (float) – Element of the first row of the matrix.

  • d (float) – Element of the second row of the matrix.

  • e (float) – Element of the second row of the matrix.

  • f (float) – Element of the second row of the matrix.

  • g (float)

  • h (float)

  • i (float)

The coefficients form the augmented affine transformation matrix

| x' |   | a b c | | x |
| y' | = | d e f | | y |
| 1  |   | 0 0 1 | | 1 |

a, b, and c are the elements of the first row of the matrix. d, e, and f are the elements of the second row.

a, b, c, d, e, f, g, h, i

The coefficients of the 3x3 augmented affine transformation matrix

| x' |   | a b c | | x |
| y' | = | d e f | | y |
| 1  |   | g h i | | 1 |

g, h, and i are always 0, 0, and 1.

Type:

float

The Affine package is derived from Casey Duncan's Planar package. See the copyright statement below. Parallel lines are preserved by these transforms. Affine transforms can perform any combination of translations, scales/flips, shears, and rotations. Class methods are provided to conveniently compose transforms from these operations.

Internally the transform is stored as a 3x3 transformation matrix. The transform may be constructed directly by specifying the first two rows of matrix values as 6 floats. Since the matrix is an affine transform, the last row is always (0, 0, 1).

N.B.: multiplication of a transform and an (x, y) vector *always* returns the column vector that is the matrix multiplication product of the transform and (x, y) as a column vector, no matter which is on the left or right side. This is obviously not the case for matrices and vectors in general, but provides a convenience for users of this class.

__getnewargs__()[source]

Pickle protocol support

Notes

Normal unpickling creates a situation where __new__ receives all 9 elements rather than the 6 that are required for the constructor. This method ensures that only the 6 are provided.

__invert__()[source]

Return the inverse transform.

Raises:

TransformNotInvertible – If the transform is degenerate.

__mul__(other)[source]

Multiplication

Apply the transform using matrix multiplication, creating a resulting object of the same type. A transform may be applied to another transform, a vector, vector array, or shape.

Parameters:

other (Affine, Vec2, Vec2Array, Shape) – The object to transform.

Return type:

Same as other
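
As a rough, library-free illustration of what multiplying a transform by an (x, y) vector computes, the sketch below applies the six coefficients by hand. The helper name apply_affine is hypothetical and is not part of pyorps or the affine package.

```python
def apply_affine(coeffs, point):
    """Apply an augmented affine matrix to an (x, y) point.

    coeffs is (a, b, c, d, e, f); the last matrix row is the implicit (0, 0, 1).
    """
    a, b, c, d, e, f = coeffs
    x, y = point
    # First matrix row yields x', second row yields y'.
    return (a * x + b * y + c, d * x + e * y + f)


# Scale by 2 in x and 3 in y, then shift by (10, 20).
scale_then_shift = (2.0, 0.0, 10.0, 0.0, 3.0, 20.0)
moved = apply_affine(scale_then_shift, (1.0, 1.0))
```

Here moved holds the transformed point; the translation terms c and f only take effect because the vector is augmented with a trailing 1.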

static __new__(cls, a, b, c, d, e, f, g=0.0, h=0.0, i=1.0)[source]

Create a new object

Parameters:
  • a (float) – Elements of an augmented affine transformation matrix.

  • b (float) – Elements of an augmented affine transformation matrix.

  • c (float) – Elements of an augmented affine transformation matrix.

  • d (float) – Elements of an augmented affine transformation matrix.

  • e (float) – Elements of an augmented affine transformation matrix.

  • f (float) – Elements of an augmented affine transformation matrix.

  • g (float)

  • h (float)

  • i (float)

__repr__()[source]

Precise string representation.

Return type:

str

__rmul__(other)[source]

Right hand multiplication

Deprecated since version 2.3.0: Right multiplication will be prohibited in version 3.0. This method will raise AffineError.

Notes

We should not be called if other is an affine instance. This is just a guarantee, since we would potentially return the wrong answer in that case.

__str__()[source]

Concise string representation.

Return type:

str

a

Alias for field number 0

almost_equals(other, precision=1e-05)[source]

Compare transforms for approximate equality.

Parameters:
  • other (Affine) – Transform being compared.

  • precision (float)

Return type:

bool

Returns:

True if the absolute difference between each pair of corresponding transform matrix elements is less than precision.

b

Alias for field number 1

c

Alias for field number 2

property column_vectors

The values of the transform as three 2D column vectors

count(value, /)

Return number of occurrences of value.

d

Alias for field number 3

property determinant

The determinant of the transform matrix.

This value is equal to the area scaling factor when the transform is applied to a shape.
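
The area-scaling claim can be checked by hand. The following is a minimal sketch, assuming the determinant of the 2x2 linear part is a*e - b*d; the helper names are hypothetical and nothing here calls the affine package.

```python
def apply_affine(coeffs, point):
    """Map (x, y) through the coefficients (a, b, c, d, e, f)."""
    a, b, c, d, e, f = coeffs
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


def determinant(coeffs):
    """Determinant of the linear part; the translation (c, f) does not matter."""
    a, b, _c, d, e, _f = coeffs
    return a * e - b * d


def shoelace_area(points):
    """Unsigned polygon area from its vertex ring (shoelace formula)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


coeffs = (2.0, 1.0, 5.0, 0.0, 3.0, -1.0)
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image = [apply_affine(coeffs, corner) for corner in unit_square]
area_ratio = shoelace_area(image) / shoelace_area(unit_square)
```

For this transform the ratio of transformed to original area matches the absolute value of the determinant. A determinant of zero corresponds to the degenerate case, where the image has no area.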

e

Alias for field number 4

property eccentricity: float

The eccentricity of the affine transformation.

This value represents the eccentricity of an ellipse under this affine transformation.

Raises NotImplementedError for improper transformations.

f

Alias for field number 5

classmethod from_gdal(c, a, b, f, d, e)[source]

Use same coefficient order as GDAL’s GetGeoTransform().

Return type:

Affine
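
The GDAL ordering is easy to mix up, so here is a small library-free sketch of the reordering implied by the signature from_gdal(c, a, b, f, d, e). The helper names and the sample geotransform values are hypothetical; the code only shuffles tuples and does not call GDAL or affine.

```python
def gdal_to_affine_coeffs(geotransform):
    """Reorder a GDAL-style geotransform (c, a, b, f, d, e) to (a, b, c, d, e, f)."""
    c, a, b, f, d, e = geotransform
    return (a, b, c, d, e, f)


def affine_coeffs_to_gdal(coeffs):
    """Inverse reordering: (a, b, c, d, e, f) back to (c, a, b, f, d, e)."""
    a, b, c, d, e, f = coeffs
    return (c, a, b, f, d, e)


def pixel_to_world(coeffs, col, row):
    """Map a raster (col, row) position to world coordinates."""
    a, b, c, d, e, f = coeffs
    return (a * col + b * row + c, d * col + e * row + f)


# Hypothetical north-up raster: 10-unit pixels, upper-left corner at (500000, 5400000).
geotransform = (500000.0, 10.0, 0.0, 5400000.0, 0.0, -10.0)
coeffs = gdal_to_affine_coeffs(geotransform)
corner = pixel_to_world(coeffs, 3, 2)
```

With a negative e coefficient, increasing row numbers move south, which is the usual north-up raster layout.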

g

Alias for field number 6

h

Alias for field number 7

i

Alias for field number 8

classmethod identity()[source]

Return the identity transform.

Return type:

Affine

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

property is_conformal: bool

True if the transform is conformal.

i.e., if angles between points are preserved after applying the transform, within rounding limits. This implies that the transform has no effective shear.

property is_degenerate

True if this transform is degenerate.

Which means that it will collapse a shape to an effective area of zero. Degenerate transforms cannot be inverted.

property is_identity: bool

True if this transform equals the identity matrix, within rounding limits.

property is_orthonormal: bool

True if the transform is orthonormal.

Which means that the transform represents a rigid motion, which has no effective scaling or shear. Mathematically, this means that the axis vectors of the transform matrix are perpendicular and unit-length. Applying an orthonormal transform to a shape always results in a congruent shape.
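
The "perpendicular and unit-length axis vectors" condition can be written out directly. This is a sketch under the assumption that the axis vectors are the columns (a, d) and (b, e) of the matrix; the function names are hypothetical and the tolerance mirrors the class-level precision of 1e-05.

```python
import math


def axes_perpendicular(coeffs, precision=1e-5):
    """True if the column vectors (a, d) and (b, e) are perpendicular."""
    a, b, _c, d, e, _f = coeffs
    return abs(a * b + d * e) < precision


def looks_orthonormal(coeffs, precision=1e-5):
    """True if the axis vectors are perpendicular and both have unit length."""
    a, b, _c, d, e, _f = coeffs
    return (
        axes_perpendicular(coeffs, precision)
        and abs(1.0 - (a * a + d * d)) < precision
        and abs(1.0 - (b * b + e * e)) < precision
    )


angle = math.radians(30)
rotation = (math.cos(angle), -math.sin(angle), 0.0,
            math.sin(angle), math.cos(angle), 0.0)
uniform_scale = (2.0, 0.0, 0.0, 0.0, 2.0, 0.0)
shear = (1.0, 0.5, 0.0, 0.0, 1.0, 0.0)
```

A pure rotation passes both checks, a uniform scale keeps the axes perpendicular but fails the unit-length test, and a shear fails the perpendicularity test.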

property is_proper

True if this transform is proper.

Which means that it does not include reflection.

property is_rectilinear: bool

True if the transform is rectilinear.

i.e., whether a shape would remain axis-aligned, within rounding limits, after applying the transform.

itransform(seq)[source]

Transform a sequence of points or vectors in place.

Parameters:

seq – Mutable sequence of Vec2 to be transformed.

Return type:

None

Returns:

None, the input sequence is mutated in place.

classmethod permutation(*scaling)[source]

Create the permutation transform

For 2x2 matrices, there is only one permutation matrix that is not the identity.

Return type:

Affine

precision = 1e-05

classmethod rotation(angle, pivot=None)[source]

Create a rotation transform at the specified angle.

A pivot point other than the coordinate system origin may be optionally specified.

Parameters:
  • angle (float) – Rotation angle in degrees, counter-clockwise about the pivot point.

  • pivot (sequence) – Point to rotate about, if omitted the rotation is about the origin.

Return type:

Affine
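
To make the pivot behaviour concrete, the sketch below builds the rotation coefficients by hand, assuming a counter-clockwise rotation R about pivot p maps a point q to R(q - p) + p. The helper names are hypothetical; this is not the affine implementation itself.

```python
import math


def rotation_coeffs(angle, pivot=None):
    """Coefficients (a, b, c, d, e, f) of a counter-clockwise rotation in degrees."""
    ca = math.cos(math.radians(angle))
    sa = math.sin(math.radians(angle))
    if pivot is None:
        return (ca, -sa, 0.0, sa, ca, 0.0)
    px, py = pivot
    # Translation part is p - R p, so that the pivot itself stays fixed.
    return (ca, -sa, px - px * ca + py * sa,
            sa, ca, py - px * sa - py * ca)


def apply_affine(coeffs, point):
    a, b, c, d, e, f = coeffs
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


about_origin = apply_affine(rotation_coeffs(90), (1.0, 0.0))
about_pivot = apply_affine(rotation_coeffs(90, pivot=(1.0, 1.0)), (2.0, 1.0))
```

Up to floating-point rounding, a 90 degree rotation about the origin sends (1, 0) to (0, 1), and the same rotation about (1, 1) sends (2, 1) to (1, 2).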

property rotation_angle: float

The rotation angle in degrees of the affine transformation.

This is the rotation angle in degrees of the affine transformation, assuming it is in the form M = R S, where R is a rotation and S is a scaling.

Raises UndefinedRotationError for improper and degenerate transformations.

classmethod scale(*scaling)[source]

Create a scaling transform from a scalar or vector.

Parameters:

scaling (float or sequence) – The scaling factor. A scalar value will scale in both dimensions equally. A vector scaling value scales the dimensions independently.

Return type:

Affine

classmethod shear(x_angle=0, y_angle=0)[source]

Create a shear transform along one or both axes.

Parameters:
  • x_angle (float) – Shear angle in degrees parallel to the x-axis.

  • y_angle (float) – Shear angle in degrees parallel to the y-axis.

Return type:

Affine

to_gdal()[source]

Return same coefficient order as GDAL’s SetGeoTransform().

Return type:

tuple

to_shapely()[source]

Return an affine transformation matrix compatible with shapely

Shapely’s affinity module expects an affine transformation matrix in (a,b,d,e,xoff,yoff) order.

Return type:

tuple

classmethod translation(xoff, yoff)[source]

Create a translation transform from an offset vector.

Parameters:
  • xoff (float) – Translation x offset.

  • yoff (float) – Translation y offset.

Return type:

Affine

property xoff: float

Alias for ‘c’

property yoff: float

Alias for ‘f’

class pyorps.raster.handler.Any(*args, **kwargs)[source]

Bases: object

Special type indicating an unconstrained type.

  • Any is compatible with every type.

  • Any is assumed to have all methods.

  • All values are assumed to be instances of Any.

Note that all the above statements are true from the point of view of static type checkers. At runtime, Any should not be used with instance checks.

class pyorps.raster.handler.LineString(coordinates=None)[source]

Bases: BaseGeometry

A geometry type composed of one or more line segments.

A LineString is a one-dimensional feature and has a non-zero length but zero area. It may approximate a curve and need not be straight. A LineString may be closed.

Parameters:

coordinates (sequence) – A sequence of (x, y[, z]) numeric coordinate pairs or triples, or an array-like with shape (N, 2) or (N, 3). It can also be a sequence of Point objects, or a combination of both.

Examples

Create a LineString with two segments

>>> from shapely import LineString
>>> a = LineString([[0, 0], [1, 0], [1, 1]])
>>> a.length
2.0

__and__(other)

Return the intersection of the geometries.

__bool__()

Return True if the geometry is not empty, else False.

__format__(format_spec)

Format a geometry using a format specification.

static __new__(self, coordinates=None)[source]

Create a new LineString geometry.

__nonzero__()

Return True if the geometry is not empty, else False.

__or__(other)

Return the union of the geometries.

__reduce__()

Pickle support.

__repr__()

Return a string representation of the geometry.

__str__()

Return a string representation of the geometry.

__sub__(other)

Return the difference of the geometries.

__xor__(other)

Return the symmetric difference of the geometries.

property area

Unitless area of the geometry (float).

property boundary

Return a lower dimension geometry that bounds the object.

The boundary of a polygon is a line, the boundary of a line is a collection of points. The boundary of a point is an empty (null) collection.

property bounds

Return minimum bounding region (minx, miny, maxx, maxy).

buffer(distance, quad_segs=16, cap_style='round', join_style='round', mitre_limit=5.0, single_sided=False, **kwargs)

Get a geometry that represents all points within a distance of this geometry.

A positive distance produces a dilation, a negative distance an erosion. A very small or zero distance may sometimes be used to “tidy” a polygon.

Parameters:
  • distance (float) – The distance to buffer around the object.

  • quad_segs (int, optional) – Sets the number of line segments used to approximate an angle fillet.

  • cap_style (shapely.BufferCapStyle or {'round', 'square', 'flat'}, default 'round') – Specifies the shape of buffered line endings. BufferCapStyle.round (‘round’) results in circular line endings (see quad_segs). Both BufferCapStyle.square (‘square’) and BufferCapStyle.flat (‘flat’) result in rectangular line endings, only BufferCapStyle.flat (‘flat’) will end at the original vertex, while BufferCapStyle.square (‘square’) involves adding the buffer width.

  • join_style (shapely.BufferJoinStyle or {'round', 'mitre', 'bevel'}, default 'round') – Specifies the shape of buffered line midpoints. BufferJoinStyle.round (‘round’) results in rounded shapes. BufferJoinStyle.bevel (‘bevel’) results in a beveled edge that touches the original vertex. BufferJoinStyle.mitre (‘mitre’) results in a single vertex that is beveled depending on the mitre_limit parameter.

  • mitre_limit (float, optional) – The mitre limit ratio is used for very sharp corners. The mitre ratio is the ratio of the distance from the corner to the end of the mitred offset corner. When two line segments meet at a sharp angle, a mitre join will extend far beyond the original geometry. To prevent unreasonable geometry, the mitre limit allows controlling the maximum length of the join corner. Corners with a ratio which exceeds the limit will be beveled.

  • single_sided (bool, optional) –

    If True, only one side of the geometry is buffered. The side used is determined by the sign of the buffer distance: a positive distance indicates the left-hand side, a negative distance indicates the right-hand side.

    The single-sided buffer of point geometries is the same as the regular buffer. The End Cap Style for single-sided buffers is always ignored, and forced to the equivalent of CAP_FLAT.

  • quadsegs (int, optional) – Deprecated alias for quad_segs.

  • resolution (int, optional) – Deprecated alias for quad_segs.

  • **kwargs (dict, optional) – For backwards compatibility of renamed parameters. If an unsupported kwarg is passed, a ValueError will be raised.

Return type:

Geometry

Notes

The return value is a strictly two-dimensional geometry. All Z coordinates of the original geometry will be ignored.

Deprecated since version 2.1.0: A deprecation warning is shown if quad_segs, cap_style, join_style, mitre_limit or single_sided are specified as positional arguments. In a future release, these will need to be specified as keyword arguments.

Examples

>>> from shapely import BufferCapStyle
>>> from shapely.wkt import loads
>>> g = loads('POINT (0.0 0.0)')

With the default quad_segs=16, a 64-gon approximation of a unit-radius circle:

>>> g.buffer(1.0).area
3.1365484905459398

With quad_segs=128, a 512-gon approximation:

>>> g.buffer(1.0, quad_segs=128).area
3.1415138011443013

With quad_segs=3, a 12-gon approximation:

>>> g.buffer(1.0, quad_segs=3).area
3.0
>>> list(g.buffer(1.0, cap_style=BufferCapStyle.square).exterior.coords)
[(1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
>>> g.buffer(1.0, cap_style=BufferCapStyle.square).area
4.0
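
The round-buffer areas printed above are consistent with a point buffer being a regular polygon with 4 × quad_segs vertices on the circle. The sketch below checks that arithmetic without Shapely; the helper name point_buffer_area is hypothetical, and the vertex count is an inference from the printed areas rather than a documented guarantee.

```python
import math


def point_buffer_area(radius, quad_segs=16):
    """Area of a regular polygon with 4 * quad_segs vertices inscribed in a circle."""
    n = 4 * quad_segs  # quad_segs segments per quarter circle
    return 0.5 * n * radius ** 2 * math.sin(2.0 * math.pi / n)


default_area = point_buffer_area(1.0)
fine_area = point_buffer_area(1.0, quad_segs=128)
coarse_area = point_buffer_area(1.0, quad_segs=3)
```

As quad_segs grows, the area approaches pi times the squared radius from below, which is why a larger quad_segs gives a closer approximation of the circle.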
property centroid

Return the geometric center of the object.

contains(other)

Return True if the geometry contains the other, else False.

contains_properly(other)

Return True if the geometry completely contains the other.

There should be no common boundary points.

Refer to shapely.contains_properly for full documentation.

property convex_hull

Return the convex hull of the geometry.

Imagine an elastic band stretched around the geometry: that’s a convex hull, more or less.

The convex hull of a three member multipoint, for example, is a triangular polygon.

property coords

Access to geometry’s coordinates (CoordinateSequence).

covered_by(other)

Return True if the geometry is covered by the other, else False.

covers(other)

Return True if the geometry covers the other, else False.

crosses(other)

Return True if the geometries cross, else False.

difference(other, grid_size=None)

Return the difference of the geometries.

Refer to shapely.difference for full documentation.

disjoint(other)

Return True if geometries are disjoint, else False.

distance(other)

Unitless distance to other geometry (float).

dwithin(other, distance)

Return True if geometry is within a given distance from the other.

Refer to shapely.dwithin for full documentation.

property envelope

A figure that envelopes the geometry.

equals(other)

Return True if geometries are equal, else False.

This method considers point-set equality (or topological equality), and is equivalent to (self.within(other) & self.contains(other)).

Examples

>>> from shapely import LineString
>>> LineString(
...     [(0, 0), (2, 2)]
... ).equals(
...     LineString([(0, 0), (1, 1), (2, 2)])
... )
True

Return type:

bool

equals_exact(other, tolerance=0.0, *, normalize=False)

Return True if the geometries are equivalent within the tolerance.

Refer to shapely.equals_exact for full documentation.

Parameters:
  • other (BaseGeometry) – The other geometry object in this comparison.

  • tolerance (float, optional (default: 0.)) – Absolute tolerance in the same units as coordinates.

  • normalize (bool, optional (default: False)) –

    If True, normalize the two geometries so that the coordinates are in the same order.

    Added in version 2.1.0.

Examples

>>> from shapely import LineString
>>> LineString(
...     [(0, 0), (2, 2)]
... ).equals_exact(
...     LineString([(0, 0), (1, 1), (2, 2)]),
...     1e-6
... )
False

Return type:

bool

property geom_type

Name of the geometry’s type, such as ‘Point’.

geometryType()

Get the geometry type (deprecated).

Deprecated since version 2.0: Use the geom_type attribute instead.

property has_m

True if the geometry’s coordinate sequence(s) have m values.

property has_z

True if the geometry’s coordinate sequence(s) have z values.

hausdorff_distance(other)

Unitless hausdorff distance to other geometry (float).

interpolate(distance, normalized=False)

Return a point at the specified distance along a linear geometry.

Negative length values are taken as measured in the reverse direction from the end of the geometry. Out-of-range index values are handled by clamping them to the valid range of values. If the normalized arg is True, the distance will be interpreted as a fraction of the geometry’s length.

Alias of line_interpolate_point.
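
The rules for negative, out-of-range, and normalized distances can be sketched for a plain list of (x, y) vertices. This is an illustrative, library-free approximation of the behaviour described above, not Shapely's implementation; the name interpolate_along is hypothetical.

```python
import math


def interpolate_along(coords, distance, normalized=False):
    """Return the (x, y) point at `distance` along a polyline given as vertices."""
    seg_lengths = [math.dist(p, q) for p, q in zip(coords, coords[1:])]
    total = sum(seg_lengths)
    if normalized:
        distance *= total              # interpret distance as a fraction of the length
    if distance < 0:
        distance += total              # negative values are measured from the end
    distance = min(max(distance, 0.0), total)  # clamp out-of-range values
    for (x0, y0), (x1, y1), seg in zip(coords, coords[1:], seg_lengths):
        if seg > 0 and distance <= seg:
            t = distance / seg
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        distance -= seg
    return tuple(coords[-1])


line = [(0, 0), (1, 0), (1, 1)]
forward = interpolate_along(line, 1.5)
backward = interpolate_along(line, -0.5)
clamped = interpolate_along(line, 5.0)
halfway = interpolate_along(line, 0.5, normalized=True)
```

On this two-segment line of length 2, a distance of 1.5 from the start and a distance of 0.5 measured back from the end land on the same point.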

intersection(other, grid_size=None)

Return the intersection of the geometries.

Refer to shapely.intersection for full documentation.

intersects(other)

Return True if geometries intersect, else False.

property is_closed

True if the geometry is closed, else False.

Applicable only to linear geometries.

property is_empty

True if the set of points in this geometry is empty, else False.

property is_ring

True if the geometry is a closed ring, else False.

property is_simple

True if the geometry is simple.

Simple means that any self-intersections are only at boundary points.

property is_valid

True if the geometry is valid.

The definition depends on sub-class.

property length

Unitless length of the geometry (float).

line_interpolate_point(distance, normalized=False)

Return a point at the specified distance along a linear geometry.

Negative length values are taken as measured in the reverse direction from the end of the geometry. Out-of-range index values are handled by clamping them to the valid range of values. If the normalized arg is True, the distance will be interpreted as a fraction of the geometry’s length.

Alias of interpolate.

line_locate_point(other, normalized=False)

Return the distance of this geometry to a point nearest the specified point.

If the normalized arg is True, return the distance normalized to the length of the linear geometry.

Alias of project.
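
Locating a point along a line is the inverse of interpolation: find the nearest point on the line, then report how far along the line it lies. A minimal library-free sketch follows, assuming a polyline given as (x, y) vertices; locate_along is a hypothetical name and this is not Shapely's implementation.

```python
import math


def locate_along(coords, point, normalized=False):
    """Distance along the polyline to the location nearest to `point`."""
    px, py = point
    best_dist2 = math.inf
    best_along = 0.0
    travelled = 0.0
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len2 = dx * dx + dy * dy
        # Parameter of the orthogonal projection, clamped onto the segment.
        t = 0.0 if seg_len2 == 0 else ((px - x0) * dx + (py - y0) * dy) / seg_len2
        t = min(max(t, 0.0), 1.0)
        cx, cy = x0 + t * dx, y0 + t * dy
        dist2 = (px - cx) ** 2 + (py - cy) ** 2
        seg_len = math.sqrt(seg_len2)
        if dist2 < best_dist2:
            best_dist2 = dist2
            best_along = travelled + t * seg_len
        travelled += seg_len
    if normalized and travelled > 0:
        return best_along / travelled
    return best_along


line = [(0, 0), (1, 0), (1, 1)]
along = locate_along(line, (2.0, 0.5))
fraction = locate_along(line, (2.0, 0.5), normalized=True)
```

The point (2, 0.5) is closest to the second segment at (1, 0.5), which lies 1.5 units along a line of total length 2.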

property minimum_clearance

Unitless distance a node can be moved to produce an invalid geometry (float).

property minimum_rotated_rectangle

Return the oriented envelope (minimum rotated rectangle) of the geometry.

The oriented envelope encloses an input geometry, such that the resulting rectangle has minimum area.

Unlike envelope this rectangle is not constrained to be parallel to the coordinate axes. If the convex hull of the object is a degenerate (line or point) this degenerate is returned.

The starting point of the rectangle is not fixed. You can use normalize() to reorganize the rectangle to strict canonical form so the starting point is always the lower left point.

Alias of oriented_envelope.

normalize()

Convert geometry to normal form (or canonical form).

This method orders the coordinates, rings of a polygon and parts of multi geometries consistently. Typically useful for testing purposes (for example in combination with equals_exact).

Examples

>>> from shapely import MultiLineString
>>> line = MultiLineString([[(0, 0), (1, 1)], [(3, 3), (2, 2)]])
>>> line.normalize()
<MULTILINESTRING ((2 2, 3 3), (0 0, 1 1))>

offset_curve(distance, quad_segs=16, join_style=BufferJoinStyle.round, mitre_limit=5.0)[source]

Return a (Multi)LineString at a distance from the object.

The side, left or right, is determined by the sign of the distance parameter (negative for right side offset, positive for left side offset). The resolution of the buffer around each vertex of the object increases by increasing the quad_segs keyword parameter.

The join style is for outside corners between line segments. Accepted values are JOIN_STYLE.round (1), JOIN_STYLE.mitre (2), and JOIN_STYLE.bevel (3).

The mitre ratio limit is used for very sharp corners. It is the ratio of the distance from the corner to the end of the mitred offset corner. When two line segments meet at a sharp angle, a miter join will extend far beyond the original geometry. To prevent unreasonable geometry, the mitre limit allows controlling the maximum length of the join corner. Corners with a ratio which exceed the limit will be beveled.

Note: the behaviour regarding orientation of the resulting line depends on the GEOS version. With GEOS < 3.11, the line retains the same direction for a left offset (positive distance) or has reverse direction for a right offset (negative distance), and this behaviour was documented as such in previous Shapely versions. Starting with GEOS 3.11, the function tries to preserve the orientation of the original line.

property oriented_envelope

Return the oriented envelope (minimum rotated rectangle) of a geometry.

The oriented envelope encloses an input geometry, such that the resulting rectangle has minimum area.

Unlike envelope this rectangle is not constrained to be parallel to the coordinate axes. If the convex hull of the object is a degenerate (line or point) this degenerate is returned.

The starting point of the rectangle is not fixed. You can use normalize() to reorganize the rectangle to strict canonical form so the starting point is always the lower left point.

Alias of minimum_rotated_rectangle.

overlaps(other)

Return True if geometries overlap, else False.

parallel_offset(distance, side='right', resolution=16, join_style=BufferJoinStyle.round, mitre_limit=5.0)[source]

Alternative method to offset_curve() method.

Older alternative method to the offset_curve() method, but uses resolution instead of quad_segs and a side keyword (‘left’ or ‘right’) instead of sign of the distance. This method is kept for backwards compatibility for now, but it is recommended to use offset_curve() instead.

point_on_surface()

Return a point guaranteed to be within the object, cheaply.

Alias of representative_point.

project(other, normalized=False)

Return the distance of geometry to a point nearest the specified point.

If the normalized arg is True, return the distance normalized to the length of the linear geometry.

Alias of line_locate_point.

relate(other)

Return the DE-9IM intersection matrix for the two geometries (string).

relate_pattern(other, pattern)

Return True if the DE-9IM relationship code satisfies the pattern.

representative_point()

Return a point guaranteed to be within the object, cheaply.

Alias of point_on_surface.

reverse()

Return a copy of this geometry with the order of coordinates reversed.

If the geometry is a polygon with interior rings, the interior rings are also reversed.

Points are unchanged.

See also

is_ccw

Checks if a geometry is counterclockwise.

Examples

>>> from shapely import LineString, Polygon
>>> LineString([(0, 0), (1, 2)]).reverse()
<LINESTRING (1 2, 0 0)>
>>> Polygon([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]).reverse()
<POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))>

segmentize(max_segment_length)

Add vertices to line segments based on maximum segment length.

Additional vertices will be added to every line segment in an input geometry so that segments are no longer than the provided maximum segment length. New vertices will evenly subdivide each segment.

Only linear components of input geometries are densified; other geometries are returned unmodified.
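
The even-subdivision rule can be sketched for a single polyline as follows. This is a library-free illustration assuming each segment is split into the smallest number of equal pieces that respects the maximum length; the name segmentize_coords is hypothetical and the code does not call Shapely.

```python
import math


def segmentize_coords(coords, max_segment_length):
    """Insert evenly spaced vertices so no segment exceeds max_segment_length."""
    if max_segment_length <= 0:
        raise ValueError("max_segment_length must be greater than 0")
    out = [tuple(coords[0])]
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        length = math.dist((x0, y0), (x1, y1))
        # Smallest number of equal pieces that keeps each piece short enough.
        pieces = max(1, math.ceil(length / max_segment_length))
        for k in range(1, pieces + 1):
            t = k / pieces
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


dense = segmentize_coords([(0, 0), (0, 10)], max_segment_length=5)
denser = segmentize_coords([(0, 0), (0, 10)], max_segment_length=4)
```

A 10-unit segment with a maximum of 5 gains one midpoint, while a maximum of 4 splits it into three equal pieces rather than pieces of exactly 4 units.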

Parameters:

max_segment_length (float or array_like) – Additional vertices will be added so that all line segments are no longer than this value. Must be greater than 0.

Examples

>>> from shapely import LineString, Polygon
>>> LineString([(0, 0), (0, 10)]).segmentize(max_segment_length=5)
<LINESTRING (0 0, 0 5, 0 10)>
>>> Polygon([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]).segmentize(max_segment_length=5)
<POLYGON ((0 0, 5 0, 10 0, 10 5, 10 10, 5 10, 0 10, 0 5, 0 0))>

simplify(tolerance, preserve_topology=True)

Return a simplified geometry produced by the Douglas-Peucker algorithm.

Coordinates of the simplified geometry will be no more than the tolerance distance from the original. Unless the topology preserving option is used, the algorithm may produce self-intersecting or otherwise invalid geometries.
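
For readers unfamiliar with the algorithm, here is a compact sketch of the classic recursive Douglas-Peucker procedure on a list of (x, y) vertices. It is a textbook version with hypothetical function names; it does not reproduce the topology-preserving variant used when preserve_topology=True, and it is not Shapely's implementation.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from `point` to the infinite line through `start` and `end`."""
    (px, py), (x0, y0), (x1, y1) = point, start, end
    dx, dy = x1 - x0, y1 - y0
    if dx == 0 and dy == 0:
        return math.hypot(px - x0, py - y0)
    return abs(dy * (px - x0) - dx * (py - y0)) / math.hypot(dx, dy)


def douglas_peucker(coords, tolerance):
    """Drop vertices that lie within `tolerance` of the simplified line."""
    if len(coords) < 3:
        return list(coords)
    distances = [
        perpendicular_distance(p, coords[0], coords[-1]) for p in coords[1:-1]
    ]
    worst = max(range(len(distances)), key=distances.__getitem__)
    if distances[worst] <= tolerance:
        return [coords[0], coords[-1]]   # every interior vertex is close enough
    split = worst + 1                     # index of the farthest vertex in coords
    left = douglas_peucker(coords[: split + 1], tolerance)
    right = douglas_peucker(coords[split:], tolerance)
    return left[:-1] + right              # avoid duplicating the split vertex


path = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(path, tolerance=0.5)
```

With a tolerance of 0.5 the small bump at (1, 0.1) is dropped, while the spike at (3, 2) and the vertex before it are kept.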

svg(scale_factor=1.0, stroke_color=None, opacity=None)[source]

Return SVG polyline element for the LineString geometry.

Parameters:
  • scale_factor (float) – Multiplication factor for the SVG stroke-width. Default is 1.

  • stroke_color (str, optional) – Hex string for stroke color. Default is to use “#66cc99” if geometry is valid, and “#ff3333” if invalid.

  • opacity (float) – Float number between 0 and 1 for color opacity. Default value is 0.8.

symmetric_difference(other, grid_size=None)

Return the symmetric difference of the geometries.

Refer to shapely.symmetric_difference for full documentation.

touches(other)

Return True if geometries touch, else False.

property type

Get the geometry type (deprecated).

Deprecated since version 2.0: Use the geom_type attribute instead.

union(other, grid_size=None)

Return the union of the geometries.

Refer to shapely.union for full documentation.

within(other)

Return True if geometry is within the other, else False.

property wkb

WKB representation of the geometry.

property wkb_hex

WKB hex representation of the geometry.

property wkt

WKT representation of the geometry.

property xy

Separate arrays of X and Y coordinate values.

Examples

>>> from shapely import LineString
>>> x, y = LineString([(0, 0), (1, 1)]).xy
>>> list(x)
[0.0, 1.0]
>>> list(y)
[0.0, 1.0]

class pyorps.raster.handler.MultiPoint(points=None)[source]

Bases: BaseMultipartGeometry

A collection of one or more Points.

A MultiPoint has zero area and zero length.

Parameters:

points (sequence) – A sequence of Points, or a sequence of (x, y [,z]) numeric coordinate pairs or triples, or an array-like of shape (N, 2) or (N, 3).

geoms

A sequence of Points

Type:

sequence

Examples

Construct a MultiPoint containing two Points

>>> from shapely import MultiPoint, Point
>>> ob = MultiPoint([[0.0, 0.0], [1.0, 2.0]])
>>> len(ob.geoms)
2
>>> type(ob.geoms[0]) == Point
True

__and__(other)

Return the intersection of the geometries.

__bool__()

Return True if the geometry is not empty, else False.

__eq__(other)

Return True if geometries are equal, else False.

__format__(format_spec)

Format a geometry using a format specification.

__hash__()

Return the hash value of the geometry.

static __new__(self, points=None)[source]

Create a new MultiPoint geometry.

__nonzero__()

Return True if the geometry is not empty, else False.

__or__(other)

Return the union of the geometries.

__reduce__()

Pickle support.

__repr__()

Return a string representation of the geometry.

__str__()

Return a string representation of the geometry.

__sub__(other)

Return the difference of the geometries.

__xor__(other)

Return the symmetric difference of the geometries.

property area

Unitless area of the geometry (float).

property boundary

Return a lower dimension geometry that bounds the object.

The boundary of a polygon is a line, the boundary of a line is a collection of points. The boundary of a point is an empty (null) collection.

property bounds

Return minimum bounding region (minx, miny, maxx, maxy).

buffer(distance, quad_segs=16, cap_style='round', join_style='round', mitre_limit=5.0, single_sided=False, **kwargs)

Get a geometry that represents all points within a distance of this geometry.

A positive distance produces a dilation, a negative distance an erosion. A very small or zero distance may sometimes be used to “tidy” a polygon.

Parameters:
  • distance (float) – The distance to buffer around the object.

  • quad_segs (int, optional) – Sets the number of line segments used to approximate an angle fillet.

  • cap_style (shapely.BufferCapStyle or {'round', 'square', 'flat'}, default 'round') – Specifies the shape of buffered line endings. BufferCapStyle.round (‘round’) results in circular line endings (see quad_segs). Both BufferCapStyle.square (‘square’) and BufferCapStyle.flat (‘flat’) result in rectangular line endings, only BufferCapStyle.flat (‘flat’) will end at the original vertex, while BufferCapStyle.square (‘square’) involves adding the buffer width.

  • join_style (shapely.BufferJoinStyle or {'round', 'mitre', 'bevel'}, default 'round') – Specifies the shape of buffered line midpoints. BufferJoinStyle.round (‘round’) results in rounded shapes. BufferJoinStyle.bevel (‘bevel’) results in a beveled edge that touches the original vertex. BufferJoinStyle.mitre (‘mitre’) results in a single vertex that is beveled depending on the mitre_limit parameter.

  • mitre_limit (float, optional) – The mitre limit ratio is used for very sharp corners. The mitre ratio is the ratio of the distance from the corner to the end of the mitred offset corner. When two line segments meet at a sharp angle, a mitre join will extend far beyond the original geometry. To prevent unreasonable geometry, the mitre limit allows controlling the maximum length of the join corner. Corners with a ratio which exceeds the limit will be beveled.

  • single_sided (bool, optional) –

    If True, only one side of the geometry is buffered. The side used is determined by the sign of the buffer distance: a positive distance indicates the left-hand side, a negative distance indicates the right-hand side.

    The single-sided buffer of point geometries is the same as the regular buffer. The End Cap Style for single-sided buffers is always ignored, and forced to the equivalent of CAP_FLAT.

  • quadsegs (int, optional) – Deprecated alias for quad_segs.

  • resolution (int, optional) – Deprecated alias for quad_segs.

  • **kwargs (dict, optional) – For backwards compatibility of renamed parameters. If an unsupported kwarg is passed, a ValueError will be raised.

Return type:

Geometry

Notes

The return value is a strictly two-dimensional geometry. All Z coordinates of the original geometry will be ignored.

Deprecated since version 2.1.0: A deprecation warning is shown if quad_segs, cap_style, join_style, mitre_limit or single_sided are specified as positional arguments. In a future release, these will need to be specified as keyword arguments.

Examples

>>> from shapely import BufferCapStyle
>>> from shapely.wkt import loads
>>> g = loads('POINT (0.0 0.0)')

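With the default quad_segs=16, each quarter circle is approximated by 16 segments, giving a 64-gon approximation of a unit radius circle:

>>> g.buffer(1.0).area
3.1365484905459398

512-gon approximation (quad_segs=128):

>>> g.buffer(1.0, quad_segs=128).area
3.1415138011443013

12-gon approximation (quad_segs=3):

>>> g.buffer(1.0, quad_segs=3).area
3.0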
>>> list(g.buffer(1.0, cap_style=BufferCapStyle.square).exterior.coords)
[(1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
>>> g.buffer(1.0, cap_style=BufferCapStyle.square).area
4.0
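
The cap_style and single_sided options only show an effect on linear input, so the point examples above do not exercise them. The following is a minimal sketch on a straight LineString (the geometry and variable names are illustrative and not part of PYORPS):

```python
from shapely import LineString

line = LineString([(0, 0), (10, 0)])

# Flat caps stop at the end vertices: a 10 x 2 rectangle.
flat = line.buffer(1.0, cap_style="flat")

# Square caps extend past each end by the buffer distance: a 12 x 2 rectangle.
square = line.buffer(1.0, cap_style="square")

# Single-sided with a positive distance: only the left-hand side is buffered.
# The line runs in the +x direction, so its left-hand side is the +y side.
left = line.buffer(1.0, single_sided=True)

print(flat.area, square.area, left.area)
print(left.bounds)
```

A negative distance with single_sided=True would buffer the right-hand side (the -y side here) instead.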
property centroid

Return the geometric center of the object.

contains(other)

Return True if the geometry contains the other, else False.

contains_properly(other)

Return True if the geometry completely contains the other.

There should be no common boundary points.

Refer to shapely.contains_properly for full documentation.

property convex_hull

Return the convex hull of the geometry.

Imagine an elastic band stretched around the geometry: that’s a convex hull, more or less.

The convex hull of a three member multipoint, for example, is a triangular polygon.

property coords

Not implemented.

Sub-geometries may have coordinate sequences, but multi-part geometries do not.

covered_by(other)

Return True if the geometry is covered by the other, else False.

covers(other)

Return True if the geometry covers the other, else False.

crosses(other)

Return True if the geometries cross, else False.

difference(other, grid_size=None)

Return the difference of the geometries.

Refer to shapely.difference for full documentation.

disjoint(other)

Return True if geometries are disjoint, else False.

distance(other)

Unitless distance to other geometry (float).

dwithin(other, distance)

Return True if geometry is within a given distance from the other.

Refer to shapely.dwithin for full documentation.

property envelope

A figure that envelops the geometry.

equals(other)

Return True if geometries are equal, else False.

This method considers point-set equality (or topological equality), and is equivalent to (self.within(other) & self.contains(other)).

Examples

>>> from shapely import LineString
>>> LineString(
...     [(0, 0), (2, 2)]
... ).equals(
...     LineString([(0, 0), (1, 1), (2, 2)])
... )
True
Return type:

bool

equals_exact(other, tolerance=0.0, *, normalize=False)

Return True if the geometries are equivalent within the tolerance.

Refer to shapely.equals_exact for full documentation.

Parameters:
  • other (BaseGeometry) – The other geometry object in this comparison.

  • tolerance (float, optional (default: 0.)) – Absolute tolerance in the same units as coordinates.

  • normalize (bool, optional (default: False)) –

    If True, normalize the two geometries so that the coordinates are in the same order.

    Added in version 2.1.0.

Examples

>>> from shapely import LineString
>>> LineString(
...     [(0, 0), (2, 2)]
... ).equals_exact(
...     LineString([(0, 0), (1, 1), (2, 2)]),
...     1e-6
... )
False
Return type:

bool
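
The difference between equals and equals_exact is easiest to see on two lines that cover the same points in opposite vertex order. This short sketch uses the normalize() method rather than the normalize keyword, so it should also work on Shapely versions older than 2.1 (the variable names are illustrative):

```python
from shapely import LineString

a = LineString([(0, 0), (1, 1), (2, 2)])
b = LineString([(2, 2), (1, 1), (0, 0)])  # same points, reversed order

# Topological (point-set) equality ignores vertex order.
same_point_set = a.equals(b)

# Structural equality compares vertex by vertex, so the order matters.
same_structure = a.equals_exact(b, 1e-9)

# Bringing both lines into canonical form first removes the order difference.
same_after_normalize = a.normalize().equals_exact(b.normalize(), 1e-9)

print(same_point_set, same_structure, same_after_normalize)
```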

property geom_type

Name of the geometry’s type, such as ‘Point’.

geometryType()

Get the geometry type (deprecated).

Deprecated since version 2.0: Use the geom_type attribute instead.

property geoms

Access to the contained geometries.

property has_m

True if the geometry’s coordinate sequence(s) have m values.

property has_z

True if the geometry’s coordinate sequence(s) have z values.

hausdorff_distance(other)

Unitless hausdorff distance to other geometry (float).

interpolate(distance, normalized=False)

Return a point at the specified distance along a linear geometry.

Negative length values are taken as measured in the reverse direction from the end of the geometry. Out-of-range index values are handled by clamping them to the valid range of values. If the normalized arg is True, the distance will be interpreted as a fraction of the geometry’s length.

Alias of line_interpolate_point.
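
As an illustration of the distance handling described above, here is a small sketch on a straight line of length 10, together with project() as the inverse lookup (geometry and names are illustrative):

```python
from shapely import LineString, Point

line = LineString([(0, 0), (10, 0)])

p_abs = line.interpolate(2.5)                    # 2.5 units from the start
p_frac = line.interpolate(0.5, normalized=True)  # halfway along the line
p_back = line.interpolate(-2.0)                  # 2 units back from the end
p_clamp = line.interpolate(25.0)                 # beyond the end: clamped

# project() returns the distance along the line to the location on the line
# that is nearest to the given point.
d = line.project(Point(3, 4))

print(p_abs, p_frac, p_back, p_clamp, d)
```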

intersection(other, grid_size=None)

Return the intersection of the geometries.

Refer to shapely.intersection for full documentation.

intersects(other)

Return True if geometries intersect, else False.

property is_closed

True if the geometry is closed, else False.

Applicable only to linear geometries.

property is_empty

True if the set of points in this geometry is empty, else False.

property is_ring

True if the geometry is a closed ring, else False.

property is_simple

True if the geometry is simple.

Simple means that any self-intersections are only at boundary points.

property is_valid

True if the geometry is valid.

The definition depends on sub-class.

property length

Unitless length of the geometry (float).

line_interpolate_point(distance, normalized=False)

Return a point at the specified distance along a linear geometry.

Negative length values are taken as measured in the reverse direction from the end of the geometry. Out-of-range index values are handled by clamping them to the valid range of values. If the normalized arg is True, the distance will be interpreted as a fraction of the geometry’s length.

Alias of interpolate.

line_locate_point(other, normalized=False)

Return the distance of this geometry to a point nearest the specified point.

If the normalized arg is True, return the distance normalized to the length of the linear geometry.

Alias of project.

property minimum_clearance

Unitless distance a node can be moved to produce an invalid geometry (float).

property minimum_rotated_rectangle

Return the oriented envelope (minimum rotated rectangle) of the geometry.

The oriented envelope encloses an input geometry, such that the resulting rectangle has minimum area.

Unlike envelope this rectangle is not constrained to be parallel to the coordinate axes. If the convex hull of the object is a degenerate (line or point) this degenerate is returned.

The starting point of the rectangle is not fixed. You can use normalize() to reorganize the rectangle to strict canonical form so the starting point is always the lower left point.

Alias of oriented_envelope.

normalize()

Convert geometry to normal form (or canonical form).

This method orders the coordinates, rings of a polygon and parts of multi geometries consistently. Typically useful for testing purposes (for example in combination with equals_exact).

Examples

>>> from shapely import MultiLineString
>>> line = MultiLineString([[(0, 0), (1, 1)], [(3, 3), (2, 2)]])
>>> line.normalize()
<MULTILINESTRING ((2 2, 3 3), (0 0, 1 1))>
property oriented_envelope

Return the oriented envelope (minimum rotated rectangle) of a geometry.

The oriented envelope encloses an input geometry, such that the resulting rectangle has minimum area.

Unlike envelope this rectangle is not constrained to be parallel to the coordinate axes. If the convex hull of the object is a degenerate (line or point) this degenerate is returned.

The starting point of the rectangle is not fixed. You can use normalize() to reorganize the rectangle to strict canonical form so the starting point is always the lower left point.

Alias of minimum_rotated_rectangle.

overlaps(other)

Return True if geometries overlap, else False.

point_on_surface()

Return a point guaranteed to be within the object, cheaply.

Alias of representative_point.

project(other, normalized=False)

Return the distance of geometry to a point nearest the specified point.

If the normalized arg is True, return the distance normalized to the length of the linear geometry.

Alias of line_locate_point.

relate(other)

Return the DE-9IM intersection matrix for the two geometries (string).

relate_pattern(other, pattern)

Return True if the DE-9IM relationship code satisfies the pattern.
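
A brief sketch of how relate and relate_pattern fit together, using a square and a point inside it. The pattern "T*****FF*" is the standard DE-9IM pattern for "contains"; the geometry is illustrative:

```python
from shapely import Point, Polygon

square = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
inside = Point(2, 2)

# Nine-character matrix: interior/boundary/exterior of `square` against
# interior/boundary/exterior of `inside`.
matrix = square.relate(inside)

# "T" requires a non-empty intersection, "F" an empty one, "*" matches anything.
contains_by_pattern = square.relate_pattern(inside, "T*****FF*")

print(matrix, contains_by_pattern)
```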

representative_point()

Return a point guaranteed to be within the object, cheaply.

Alias of point_on_surface.

reverse()

Return a copy of this geometry with the order of coordinates reversed.

If the geometry is a polygon with interior rings, the interior rings are also reversed.

Points are unchanged.

See also

is_ccw

Checks if a geometry is counterclockwise.

Examples

>>> from shapely import LineString, Polygon
>>> LineString([(0, 0), (1, 2)]).reverse()
<LINESTRING (1 2, 0 0)>
>>> Polygon([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]).reverse()
<POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))>
segmentize(max_segment_length)

Add vertices to line segments based on maximum segment length.

Additional vertices will be added to every line segment in an input geometry so that segments are no longer than the provided maximum segment length. New vertices will evenly subdivide each segment.

Only linear components of input geometries are densified; other geometries are returned unmodified.

Parameters:

max_segment_length (float or array_like) – Additional vertices will be added so that all line segments are no longer than this value. Must be greater than 0.

Examples

>>> from shapely import LineString, Polygon
>>> LineString([(0, 0), (0, 10)]).segmentize(max_segment_length=5)
<LINESTRING (0 0, 0 5, 0 10)>
>>> Polygon([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]).segmentize(max_segment_length=5)
<POLYGON ((0 0, 5 0, 10 0, 10 5, 10 10, 5 10, 0 10, 0 5, 0 0))>
simplify(tolerance, preserve_topology=True)

Return a simplified geometry produced by the Douglas-Peucker algorithm.

Coordinates of the simplified geometry will be no more than the tolerance distance from the original. Unless the topology preserving option is used, the algorithm may produce self-intersecting or otherwise invalid geometries.
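
To make the role of the tolerance concrete, this sketch simplifies a line with one small bump and one large spike. The coordinates are made up for illustration:

```python
from shapely import LineString

line = LineString([(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0)])

# The bump at (1, 0.1) deviates by less than the tolerance and can be dropped;
# the spike at (3, 2) deviates by far more and has to be kept.
simplified = line.simplify(0.5)

coords = list(simplified.coords)
print(coords)
```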

svg(scale_factor=1.0, fill_color=None, opacity=None)[source]

Return a group of SVG circle elements for the MultiPoint geometry.

Parameters:
  • scale_factor (float) – Multiplication factor for the SVG circle diameters. Default is 1.

  • fill_color (str, optional) – Hex string for fill color. Default is to use “#66cc99” if geometry is valid, and “#ff3333” if invalid.

  • opacity (float) – Float number between 0 and 1 for color opacity. Default value is 0.6.

symmetric_difference(other, grid_size=None)

Return the symmetric difference of the geometries.

Refer to shapely.symmetric_difference for full documentation.

touches(other)

Return True if geometries touch, else False.

property type

Get the geometry type (deprecated).

Deprecated since version 2.0: Use the geom_type attribute instead.

union(other, grid_size=None)

Return the union of the geometries.

Refer to shapely.union for full documentation.

within(other)

Return True if geometry is within the other, else False.

property wkb

WKB representation of the geometry.

property wkb_hex

WKB hex representation of the geometry.

property wkt

WKT representation of the geometry.

property xy

Separate arrays of X and Y coordinate values.

class pyorps.raster.handler.Polygon(shell=None, holes=None)[source]

Bases: BaseGeometry

A geometry type representing an area that is enclosed by a linear ring.

A polygon is a two-dimensional feature and has a non-zero area. It may have one or more negative-space “holes” which are also bounded by linear rings. If any rings cross each other, the feature is invalid and operations on it may fail.

Parameters:
  • shell (sequence) – A sequence of (x, y [,z]) numeric coordinate pairs or triples, or an array-like with shape (N, 2) or (N, 3). Also can be a sequence of Point objects.

  • holes (sequence) – A sequence of objects which satisfy the same requirements as the shell parameter above.

exterior

The ring which bounds the positive space of the polygon.

Type:

LinearRing

interiors

A sequence of rings which bound all existing holes.

Type:

sequence

Examples

Create a square polygon with no holes:

>>> from shapely import Polygon
>>> coords = ((0., 0.), (0., 1.), (1., 1.), (1., 0.), (0., 0.))
>>> polygon = Polygon(coords)
>>> polygon.area
1.0
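
The holes parameter can be sketched in the same way. The example below builds a 10 x 10 square with a 2 x 2 hole; the coordinates are illustrative:

```python
from shapely import Polygon

shell = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]

polygon = Polygon(shell, holes=[hole])

n_holes = len(polygon.interiors)
area = polygon.area                      # shell area minus hole area
ring_type = polygon.exterior.geom_type   # the shell is exposed as a LinearRing

print(n_holes, area, ring_type)
```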
__and__(other)

Return the intersection of the geometries.

__bool__()

Return True if the geometry is not empty, else False.

__format__(format_spec)

Format a geometry using a format specification.

static __new__(self, shell=None, holes=None)[source]

Create a new Polygon geometry.

__nonzero__()

Return True if the geometry is not empty, else False.

__or__(other)

Return the union of the geometries.

__reduce__()

Pickle support.

__repr__()

Return a string representation of the geometry.

__str__()

Return a string representation of the geometry.

__sub__(other)

Return the difference of the geometries.

__xor__(other)

Return the symmetric difference of the geometries.
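
The four set operators map onto intersection, union, difference and symmetric_difference. A short sketch with two overlapping 2 x 2 squares (shapely.geometry.box is used here only for brevity):

```python
from shapely.geometry import box

a = box(0, 0, 2, 2)
b = box(1, 1, 3, 3)   # overlaps `a` in the unit square (1, 1)-(2, 2)

areas = {
    "a & b": (a & b).area,   # intersection
    "a | b": (a | b).area,   # union
    "a - b": (a - b).area,   # difference
    "a ^ b": (a ^ b).area,   # symmetric difference
}
print(areas)
```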

property area

Unitless area of the geometry (float).

property boundary

Return a lower dimension geometry that bounds the object.

The boundary of a polygon is a line, the boundary of a line is a collection of points. The boundary of a point is an empty (null) collection.

property bounds

Return minimum bounding region (minx, miny, maxx, maxy).

buffer(distance, quad_segs=16, cap_style='round', join_style='round', mitre_limit=5.0, single_sided=False, **kwargs)

Get a geometry that represents all points within a distance of this geometry.

A positive distance produces a dilation, a negative distance an erosion. A very small or zero distance may sometimes be used to “tidy” a polygon.

Parameters:
  • distance (float) – The distance to buffer around the object.

  • quad_segs (int, optional) – Sets the number of line segments used to approximate an angle fillet.

  • cap_style (shapely.BufferCapStyle or {'round', 'square', 'flat'}, default 'round') – Specifies the shape of buffered line endings. BufferCapStyle.round (‘round’) results in circular line endings (see quad_segs). Both BufferCapStyle.square (‘square’) and BufferCapStyle.flat (‘flat’) result in rectangular line endings, only BufferCapStyle.flat (‘flat’) will end at the original vertex, while BufferCapStyle.square (‘square’) involves adding the buffer width.

  • join_style (shapely.BufferJoinStyle or {'round', 'mitre', 'bevel'}, default 'round') – Specifies the shape of buffered line midpoints. BufferJoinStyle.round (‘round’) results in rounded shapes. BufferJoinStyle.bevel (‘bevel’) results in a beveled edge that touches the original vertex. BufferJoinStyle.mitre (‘mitre’) results in a single vertex that is beveled depending on the mitre_limit parameter.

  • mitre_limit (float, optional) – The mitre limit ratio is used for very sharp corners. The mitre ratio is the ratio of the distance from the corner to the end of the mitred offset corner. When two line segments meet at a sharp angle, a mitre join will extend the original geometry. To prevent unreasonable geometry, the mitre limit allows controlling the maximum length of the join corner. Corners with a ratio that exceeds the limit will be beveled.

  • single_sided (bool, optional) –

    If True, only one side of the geometry is buffered. The side used is determined by the sign of the buffer distance:

    a positive distance indicates the left-hand side

    a negative distance indicates the right-hand side

    The single-sided buffer of point geometries is the same as the regular buffer. The End Cap Style for single-sided buffers is always ignored, and forced to the equivalent of CAP_FLAT.

  • quadsegs (int, optional) – Deprecated alias for quad_segs.

  • resolution (int, optional) – Deprecated alias for quad_segs.

  • **kwargs (dict, optional) – For backwards compatibility of renamed parameters. If an unsupported kwarg is passed, a ValueError will be raised.

Return type:

Geometry

Notes

The return value is a strictly two-dimensional geometry. All Z coordinates of the original geometry will be ignored.

Deprecated since version 2.1.0: A deprecation warning is shown if quad_segs, cap_style, join_style, mitre_limit or single_sided are specified as positional arguments. In a future release, these will need to be specified as keyword arguments.

Examples

>>> from shapely import BufferCapStyle
>>> from shapely.wkt import loads
>>> g = loads('POINT (0.0 0.0)')

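With the default quad_segs=16, each quarter circle is approximated by 16 segments, giving a 64-gon approximation of a unit radius circle:

>>> g.buffer(1.0).area
3.1365484905459398

512-gon approximation (quad_segs=128):

>>> g.buffer(1.0, quad_segs=128).area
3.1415138011443013

12-gon approximation (quad_segs=3):

>>> g.buffer(1.0, quad_segs=3).area
3.0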
>>> list(g.buffer(1.0, cap_style=BufferCapStyle.square).exterior.coords)
[(1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
>>> g.buffer(1.0, cap_style=BufferCapStyle.square).area
4.0
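
For polygons, the dilation and erosion behaviour can be checked on a simple square. This sketch uses mitre joins so that the dilated corners stay sharp and the areas are easy to verify by hand; the geometry is illustrative:

```python
from shapely import Polygon

square = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])

grown = square.buffer(1.0, join_style="mitre")    # dilation: 12 x 12 square
shrunk = square.buffer(-1.0, join_style="mitre")  # erosion: 8 x 8 square
gone = square.buffer(-6.0)                        # erosion past the centre

print(grown.area, shrunk.area, gone.is_empty)
```

Eroding by more than half the width leaves nothing, so the result is an empty geometry rather than an error.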
property centroid

Return the geometric center of the object.

contains(other)

Return True if the geometry contains the other, else False.

contains_properly(other)

Return True if the geometry completely contains the other.

There should be no common boundary points.

Refer to shapely.contains_properly for full documentation.
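
The "no common boundary points" condition is what separates contains_properly from contains. The sketch below compares the two on a small square that touches the outer boundary and one that does not. It calls the module-level shapely.contains_properly function referred to above; the geometries are illustrative:

```python
import shapely
from shapely import Polygon

outer = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
touching = Polygon([(2, 0), (4, 0), (4, 2), (2, 2)])  # shares edges with outer
free = Polygon([(1, 1), (2, 1), (2, 2), (1, 2)])      # strictly inside outer

results = {
    "contains(touching)": outer.contains(touching),
    "contains_properly(touching)": bool(shapely.contains_properly(outer, touching)),
    "contains(free)": outer.contains(free),
    "contains_properly(free)": bool(shapely.contains_properly(outer, free)),
}
print(results)
```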

property convex_hull

Return the convex hull of the geometry.

Imagine an elastic band stretched around the geometry: that’s a convex hull, more or less.

The convex hull of a three member multipoint, for example, is a triangular polygon.

property coords

Not implemented for polygons.

covered_by(other)

Return True if the geometry is covered by the other, else False.

covers(other)

Return True if the geometry covers the other, else False.

crosses(other)

Return True if the geometries cross, else False.

difference(other, grid_size=None)

Return the difference of the geometries.

Refer to shapely.difference for full documentation.

disjoint(other)

Return True if geometries are disjoint, else False.

distance(other)

Unitless distance to other geometry (float).

dwithin(other, distance)

Return True if geometry is within a given distance from the other.

Refer to shapely.dwithin for full documentation.

property envelope

A figure that envelops the geometry.

equals(other)

Return True if geometries are equal, else False.

This method considers point-set equality (or topological equality), and is equivalent to (self.within(other) & self.contains(other)).

Examples

>>> from shapely import LineString
>>> LineString(
...     [(0, 0), (2, 2)]
... ).equals(
...     LineString([(0, 0), (1, 1), (2, 2)])
... )
True
Return type:

bool

equals_exact(other, tolerance=0.0, *, normalize=False)

Return True if the geometries are equivalent within the tolerance.

Refer to shapely.equals_exact for full documentation.

Parameters:
  • other (BaseGeometry) – The other geometry object in this comparison.

  • tolerance (float, optional (default: 0.)) – Absolute tolerance in the same units as coordinates.

  • normalize (bool, optional (default: False)) –

    If True, normalize the two geometries so that the coordinates are in the same order.

    Added in version 2.1.0.

Examples

>>> from shapely import LineString
>>> LineString(
...     [(0, 0), (2, 2)]
... ).equals_exact(
...     LineString([(0, 0), (1, 1), (2, 2)]),
...     1e-6
... )
False
Return type:

bool

property exterior

Return the exterior ring of the polygon.

classmethod from_bounds(xmin, ymin, xmax, ymax)[source]

Construct a Polygon() from spatial bounds.
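
A minimal usage sketch, with arbitrary bounds chosen for illustration:

```python
from shapely import Polygon

# Axis-aligned rectangle spanning x in [0, 2] and y in [0, 3].
rect = Polygon.from_bounds(0.0, 0.0, 2.0, 3.0)

print(rect.bounds, rect.area)
```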

property geom_type

Name of the geometry’s type, such as ‘Point’.

geometryType()

Get the geometry type (deprecated).

Deprecated since version 2.0: Use the geom_type attribute instead.

property has_m

True if the geometry’s coordinate sequence(s) have m values.

property has_z

True if the geometry’s coordinate sequence(s) have z values.

hausdorff_distance(other)

Unitless hausdorff distance to other geometry (float).

property interiors

Return the sequence of interior rings of the polygon.

interpolate(distance, normalized=False)

Return a point at the specified distance along a linear geometry.

Negative length values are taken as measured in the reverse direction from the end of the geometry. Out-of-range index values are handled by clamping them to the valid range of values. If the normalized arg is True, the distance will be interpreted as a fraction of the geometry’s length.

Alias of line_interpolate_point.

intersection(other, grid_size=None)

Return the intersection of the geometries.

Refer to shapely.intersection for full documentation.

intersects(other)

Return True if geometries intersect, else False.

property is_closed

True if the geometry is closed, else False.

Applicable only to linear geometries.

property is_empty

True if the set of points in this geometry is empty, else False.

property is_ring

True if the geometry is a closed ring, else False.

property is_simple

True if the geometry is simple.

Simple means that any self-intersections are only at boundary points.

property is_valid

True if the geometry is valid.

The definition depends on sub-class.

property length

Unitless length of the geometry (float).

line_interpolate_point(distance, normalized=False)

Return a point at the specified distance along a linear geometry.

Negative length values are taken as measured in the reverse direction from the end of the geometry. Out-of-range index values are handled by clamping them to the valid range of values. If the normalized arg is True, the distance will be interpreted as a fraction of the geometry’s length.

Alias of interpolate.

line_locate_point(other, normalized=False)

Return the distance of this geometry to a point nearest the specified point.

If the normalized arg is True, return the distance normalized to the length of the linear geometry.

Alias of project.

property minimum_clearance

Unitless distance a node can be moved to produce an invalid geometry (float).

property minimum_rotated_rectangle

Return the oriented envelope (minimum rotated rectangle) of the geometry.

The oriented envelope encloses an input geometry, such that the resulting rectangle has minimum area.

Unlike envelope this rectangle is not constrained to be parallel to the coordinate axes. If the convex hull of the object is a degenerate (line or point) this degenerate is returned.

The starting point of the rectangle is not fixed. You can use normalize() to reorganize the rectangle to strict canonical form so the starting point is always the lower left point.

Alias of oriented_envelope.

normalize()

Convert geometry to normal form (or canonical form).

This method orders the coordinates, rings of a polygon and parts of multi geometries consistently. Typically useful for testing purposes (for example in combination with equals_exact).

Examples

>>> from shapely import MultiLineString
>>> line = MultiLineString([[(0, 0), (1, 1)], [(3, 3), (2, 2)]])
>>> line.normalize()
<MULTILINESTRING ((2 2, 3 3), (0 0, 1 1))>
property oriented_envelope

Return the oriented envelope (minimum rotated rectangle) of a geometry.

The oriented envelope encloses an input geometry, such that the resulting rectangle has minimum area.

Unlike envelope this rectangle is not constrained to be parallel to the coordinate axes. If the convex hull of the object is a degenerate (line or point) this degenerate is returned.

The starting point of the rectangle is not fixed. You can use normalize() to reorganize the rectangle to strict canonical form so the starting point is always the lower left point.

Alias of minimum_rotated_rectangle.

overlaps(other)

Return True if geometries overlap, else False.

point_on_surface()

Return a point guaranteed to be within the object, cheaply.

Alias of representative_point.

project(other, normalized=False)

Return the distance of geometry to a point nearest the specified point.

If the normalized arg is True, return the distance normalized to the length of the linear geometry.

Alias of line_locate_point.

relate(other)

Return the DE-9IM intersection matrix for the two geometries (string).

relate_pattern(other, pattern)

Return True if the DE-9IM relationship code satisfies the pattern.

representative_point()

Return a point guaranteed to be within the object, cheaply.

Alias of point_on_surface.

reverse()

Return a copy of this geometry with the order of coordinates reversed.

If the geometry is a polygon with interior rings, the interior rings are also reversed.

Points are unchanged.

See also

is_ccw

Checks if a geometry is counterclockwise.

Examples

>>> from shapely import LineString, Polygon
>>> LineString([(0, 0), (1, 2)]).reverse()
<LINESTRING (1 2, 0 0)>
>>> Polygon([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]).reverse()
<POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))>
segmentize(max_segment_length)

Add vertices to line segments based on maximum segment length.

Additional vertices will be added to every line segment in an input geometry so that segments are no longer than the provided maximum segment length. New vertices will evenly subdivide each segment.

Only linear components of input geometries are densified; other geometries are returned unmodified.

Parameters:

max_segment_length (float or array_like) – Additional vertices will be added so that all line segments are no longer than this value. Must be greater than 0.

Examples

>>> from shapely import LineString, Polygon
>>> LineString([(0, 0), (0, 10)]).segmentize(max_segment_length=5)
<LINESTRING (0 0, 0 5, 0 10)>
>>> Polygon([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]).segmentize(max_segment_length=5)
<POLYGON ((0 0, 5 0, 10 0, 10 5, 10 10, 5 10, 0 10, 0 5, 0 0))>
simplify(tolerance, preserve_topology=True)

Return a simplified geometry produced by the Douglas-Peucker algorithm.

Coordinates of the simplified geometry will be no more than the tolerance distance from the original. Unless the topology preserving option is used, the algorithm may produce self-intersecting or otherwise invalid geometries.

svg(scale_factor=1.0, fill_color=None, opacity=None)[source]

Return SVG path element for the Polygon geometry.

Parameters:
  • scale_factor (float) – Multiplication factor for the SVG stroke-width. Default is 1.

  • fill_color (str, optional) – Hex string for fill color. Default is to use “#66cc99” if geometry is valid, and “#ff3333” if invalid.

  • opacity (float) – Float number between 0 and 1 for color opacity. Default value is 0.6.

symmetric_difference(other, grid_size=None)

Return the symmetric difference of the geometries.

Refer to shapely.symmetric_difference for full documentation.

touches(other)

Return True if geometries touch, else False.

property type

Get the geometry type (deprecated).

Deprecated since version 2.0: Use the geom_type attribute instead.

union(other, grid_size=None)

Return the union of the geometries.

Refer to shapely.union for full documentation.

within(other)

Return True if geometry is within the other, else False.

property wkb

WKB representation of the geometry.

property wkb_hex

WKB hex representation of the geometry.

property wkt

WKT representation of the geometry.

property xy

Separate arrays of X and Y coordinate values.

class pyorps.raster.handler.RasterDataset(file_source, crs=None)[source]

Bases: GeoDataset, ABC

__init__(file_source, crs=None)
Parameters:
  • file_source (Any)

  • crs (str | None)

count: int
crs: str = None
data: Union[GeoDataFrame, ndarray, None] = None
dtype: dtype
file_source: Any
abstractmethod load_data(**kwargs)
shape: tuple[int, int]
transform: Affine
class pyorps.raster.handler.RasterHandler(raster_source, source_coords, target_coords, search_space_buffer_m=None, input_crs=None, apply_mask=True, outside_value=None, bands=None)[source]

Bases: object

Class for efficiently working with raster data while preserving geographic transformation information. Can be initialized with either a file path or directly with raster data, CRS, and transform.

__init__(raster_source, source_coords, target_coords, search_space_buffer_m=None, input_crs=None, apply_mask=True, outside_value=None, bands=None)[source]

Initialize a RasterHandler for working with raster data and coordinate transformations.

Creates a window and buffer geometry based on source and target coordinates:

  • If source and target are single coordinates: creates a line buffer

  • If source and/or target are lists of coordinates: creates a polygon buffer

Parameters:
  • raster_source (RasterDataset) – Either:

    - Path to the raster file (str), or
    - Tuple of (data_array, crs, transform)

  • source_coords (Union[Tuple[float, float], List[Tuple[float, float]]]) – Source point(s) as (x, y) tuple or list of tuples

  • target_coords (Union[Tuple[float, float], List[Tuple[float, float]]]) – Target point(s) as (x, y) tuple or list of tuples

  • search_space_buffer_m (Optional[float]) – Buffer distance in map units (typically meters)

  • input_crs (Optional[str]) – CRS of the input coordinates (e.g., ‘EPSG:4326’). If None, assumes same as raster

  • apply_mask (bool) – If True, apply the buffer mask after loading data

  • outside_value (Optional[Any]) – Value to set for pixels outside the buffer (defaults to max value of the data type)

  • bands (Optional[List[int]]) – List of bands to modify if apply_mask is True (1-based). If None, all bands are modified
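
The line-buffer versus polygon-buffer distinction described above can be sketched with Shapely. This is not the PYORPS implementation: the helper names (_as_points, search_space) and the use of a convex hull for the multi-point case are assumptions made purely for illustration.

```python
from shapely import LineString, MultiPoint, Point


def _as_points(coords):
    """Hypothetical helper: accept one (x, y) tuple or a list of them."""
    if isinstance(coords[0], (int, float)):
        return [tuple(coords)]
    return [tuple(c) for c in coords]


def search_space(source_coords, target_coords, buffer_m):
    """Hypothetical helper: buffered search area around all endpoints."""
    sources = _as_points(source_coords)
    targets = _as_points(target_coords)
    if len(sources) == 1 and len(targets) == 1:
        # Single source and single target: buffer the connecting line.
        base = LineString([sources[0], targets[0]])
    else:
        # Several points: buffer their convex hull (an assumed choice).
        base = MultiPoint(sources + targets).convex_hull
    return base.buffer(buffer_m)


line_area = search_space((0, 0), (100, 0), 10)
poly_area = search_space([(0, 0), (0, 50)], (100, 25), 10)
print(line_area.bounds)
print(poly_area.contains(Point(100, 25)))
```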

apply_geometry_mask(geometry, outside_value=None, bands=None)[source]

Set pixel values outside the given geometry to the specified value.

Parameters:
  • geometry (Polygon) – A shapely geometry object (Polygon)

  • outside_value (Optional[int]) – Value to set for pixels outside the geometry

  • bands (Union[list[int], int, None]) – List of bands to modify (1-based). If None, all bands are modified.
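
The effect of outside_value and the 1-based bands argument can be illustrated with plain NumPy. The sketch below assumes a (bands, rows, cols) array and a precomputed boolean "inside" mask; how PYORPS rasterizes the geometry into such a mask is not shown, and mask_outside is a hypothetical stand-in, not the library's method.

```python
import numpy as np


def mask_outside(data, inside, outside_value=None, bands=None):
    """Hypothetical stand-in: overwrite pixels where `inside` is False."""
    if outside_value is None:
        # Default described in __init__: the maximum value of the data type.
        if np.issubdtype(data.dtype, np.integer):
            outside_value = np.iinfo(data.dtype).max
        else:
            outside_value = np.finfo(data.dtype).max
    if bands is None:
        bands = range(1, data.shape[0] + 1)   # all bands
    elif isinstance(bands, int):
        bands = [bands]
    for band in bands:
        data[band - 1][~inside] = outside_value  # bands are 1-based
    return data


data = np.zeros((2, 3, 3), dtype=np.uint8)
inside = np.zeros((3, 3), dtype=bool)
inside[1, 1] = True                     # only the centre pixel is inside

mask_outside(data, inside, bands=[1])
print(data[0])
print(data[1])
```

Only band 1 is modified here; band 2 keeps its original zeros because it was not listed in bands.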

buffer_geometry: Polygon
coords_to_indices(coords)[source]

Convert geographic coordinates to pixel row/column indices within this raster section.

Parameters:

coords (Union[tuple[float, float], list[float], list[Union[tuple[float, float], list[float]]]]) – List of (x, y) coordinate tuples or a single coordinate tuple

Returns:

Array of (row, col) pixel indices

Return type:

numpy.ndarray

data: ndarray
estimate_buffer_width(source_coords, target_coords, min_buffer=200, max_buffer=4000, sample_radius=50)[source]

Estimate an appropriate buffer width for path finding based on terrain characteristics.

Returns:

Estimated optimal buffer width in meters

indices_to_coords(indices)[source]

Convert pixel indices to geographic coordinates.

Parameters:

indices (List[Tuple[int, int]]) – List of (row, col) pixel indices

Returns:

Array of (x, y) coordinates

Return type:

numpy.ndarray
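
The two conversions are inverses of each other under an affine transform. The pure-Python sketch below assumes a north-up transform without rotation (b = d = 0) and assumes that indices map back to pixel centres; whether PYORPS uses pixel centres or corners is not stated here. The transform values and function names are hypothetical.

```python
import math

# Hypothetical transform in (a, b, c, d, e, f) order: 10 m pixels,
# upper-left corner at x=500, y=1000, no rotation.
TRANSFORM = (10.0, 0.0, 500.0, 0.0, -10.0, 1000.0)


def coord_to_index(x, y, transform=TRANSFORM):
    """Map a geographic (x, y) to the (row, col) of the pixel containing it."""
    a, _b, c, _d, e, f = transform
    col = math.floor((x - c) / a)
    row = math.floor((y - f) / e)   # e is negative for north-up rasters
    return row, col


def index_to_coord(row, col, transform=TRANSFORM):
    """Map (row, col) to the (x, y) of the pixel centre (assumed convention)."""
    a, _b, c, _d, e, f = transform
    return c + (col + 0.5) * a, f + (row + 0.5) * e


row, col = coord_to_index(525.0, 975.0)
x, y = index_to_coord(row, col)
print((row, col), (x, y))
```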

static max_distance_pair(coords1, coords2)[source]

Find the pair of coordinates (one from coords1, one from coords2) with the highest Euclidean distance.

Returns:

A tuple containing the two points with the maximum distance (point1, point2)
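
One straightforward way to obtain such a pair is a brute-force comparison of every combination. The sketch below is an illustration of the idea, not the PYORPS code, and the name farthest_pair is hypothetical:

```python
from itertools import product
from math import dist


def farthest_pair(coords1, coords2):
    """Return (point1, point2) with the largest Euclidean distance."""
    # Compares every point in coords1 with every point in coords2.
    return max(product(coords1, coords2), key=lambda pair: dist(*pair))


sources = [(0.0, 0.0), (1.0, 1.0)]
targets = [(5.0, 5.0), (2.0, 2.0)]

pair = farthest_pair(sources, targets)
print(pair)
```

The cost grows with the product of the two list lengths, which is usually acceptable for a handful of source and target points.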

raster_dataset: RasterDataset
save_section_as_raster(output_path)[source]

Save the section as a new raster file with proper geo referencing.

Parameters:

output_path (str) – Path for the output raster file

search_space_buffer_m: float
window: Window
window_transform: Affine
class pyorps.raster.handler.Transformer(transformer_maker=None)[source]

Bases: object

The Transformer class is for facilitating re-using transforms without needing to re-create them. The goal is to make repeated transforms faster.

Additionally, it provides multiple methods for initialization.

Added in version 2.1.0.

Parameters:

transformer_maker (TransformerMaker | None)

__init__(transformer_maker=None)[source]
Parameters:

transformer_maker (TransformerMaker | None)

Return type:

None

property accuracy: float

Expected accuracy of the transformation. -1 if unknown.

Type:

float

property area_of_use: AreaOfUse

Added in version 2.3.0.

Returns:

The area of use object with associated attributes.

Return type:

AreaOfUse

property definition: str

Definition of the projection.

Type:

str

property description: str

Description of the projection.

Type:

str

static from_crs(crs_from, crs_to, always_xy=False, area_of_interest=None, authority=None, accuracy=None, allow_ballpark=None, force_over=False, only_best=None)[source]

Make a Transformer from a pyproj.crs.CRS or input used to create one.

See:

  • proj_create_crs_to_crs()

  • proj_create_crs_to_crs_from_pj()

Added in version 2.2.0: always_xy

Added in version 2.3.0: area_of_interest

Added in version 3.1.0: authority, accuracy, allow_ballpark

Added in version 3.4.0: force_over

Added in version 3.5.0: only_best

Parameters:
  • crs_from (pyproj.crs.CRS or input used to create one) – Projection of input data.

  • crs_to (pyproj.crs.CRS or input used to create one) – Projection of output data.

  • always_xy (bool, default False) – If true, the transform method will accept as input and return as output coordinates using the traditional GIS order, that is longitude, latitude for geographic CRS and easting, northing for most projected CRS.

  • area_of_interest (AreaOfInterest, optional) – The area of interest to help select the transformation.

  • authority (str, optional) – When not specified, coordinate operations from any authority will be searched, with the restrictions set in the authority_to_authority_preference database table related to the authority of the source/target CRS themselves. If authority is set to “any”, then coordinate operations from any authority will be searched. If authority is a non-empty string different from “any”, then coordinate operations will be searched only in that authority namespace (e.g. EPSG).

  • accuracy (float, optional) – The minimum desired accuracy (in metres) of the candidate coordinate operations.

  • allow_ballpark (bool, optional) – Set to False to disallow the use of Ballpark transformation in the candidate coordinate operations. Default is to allow.

  • force_over (bool, default False) – If True, it will force the +over flag on the transformation. Requires PROJ 9+.

  • only_best (bool, optional) – Set to True to make PROJ raise an error when the best transformation cannot be used. The best transformation is the one that proj_get_suggested_operation() would return if all known grids were accessible (either locally or through the network). The default value for this option can also be set with the PROJ_ONLY_BEST_DEFAULT environment variable or with the only_best_default setting of proj.ini. The only_best kwarg overrides the default value if set. Requires PROJ 9.2+.

Return type:

Transformer

static from_pipeline(proj_pipeline)[source]

Make a Transformer from a PROJ pipeline string.

See:

  • proj_create()

  • proj_create_from_database()

Added in version 3.1.0: AUTH:CODE string support (e.g. EPSG:1671)

Allowed input:
  • a PROJ string

  • a WKT string

  • a PROJJSON string

  • an object code (e.g. “EPSG:1671” or “urn:ogc:def:coordinateOperation:EPSG::1671”)

  • an object name, e.g. “ITRF2014 to ETRF2014 (1)”. In that case, as uniqueness is not guaranteed, heuristics are applied to determine the appropriate best match.

  • an OGC URN combining references for concatenated operations (e.g. “urn:ogc:def:coordinateOperation,coordinateOperation:EPSG::3895,coordinateOperation:EPSG::1618”)

Parameters:

proj_pipeline (str) – Projection pipeline string.

Return type:

Transformer

static from_proj(proj_from, proj_to, always_xy=False, area_of_interest=None)[source]

Make a Transformer from a pyproj.Proj or input used to create one.

Deprecated since version 3.4.1: from_crs() is preferred.

Added in version 2.2.0: always_xy

Added in version 2.3.0: area_of_interest

Parameters:
  • proj_from (pyproj.Proj or input used to create one) – Projection of input data.

  • proj_to (pyproj.Proj or input used to create one) – Projection of output data.

  • always_xy (bool, default False) – If true, the transform method will accept as input and return as output coordinates using the traditional GIS order, that is longitude, latitude for geographic CRS and easting, northing for most projected CRS.

  • area_of_interest (AreaOfInterest, optional) – The area of interest to help select the transformation.

Return type:

Transformer

get_last_used_operation()[source]

Added in version 3.4.0.

Note

Requires PROJ 9.1+

See: proj_trans_get_last_used_operation()

Returns:

The operation used in the transform call.

Return type:

Transformer

property has_inverse: bool

True if an inverse mapping exists.

Type:

bool

is_exact_same(other)[source]

Check if the Transformer objects are exactly the same. If other is not a Transformer, this returns False.

Parameters:

other (Any)

Return type:

bool

property is_network_enabled: bool

Added in version 3.0.0.

Returns:

If the network is enabled.

Return type:

bool

itransform(points, switch=False, time_3rd=False, radians=False, errcheck=False, direction=TransformDirection.FORWARD)[source]

Iterator/generator version of the function pyproj.Transformer.transform.

See: proj_trans_generic()

Added in version 2.1.1: errcheck

Added in version 2.2.0: direction

Parameters:
  • points (list) – List of point tuples.

  • switch (bool, default False) – If True, the x, y or lon, lat coordinates of points are switched to y, x or lat, lon.

  • time_3rd (bool, default False) – If the input coordinates are 3 dimensional and the 3rd dimension is time.

  • radians (bool, default False) – If True, will expect input data to be in radians and will return radians if the projection is geographic. Otherwise, it uses degrees. Ignored for pipeline transformations with pyproj 2, but will work in pyproj 3.

  • errcheck (bool, default False) – If True, an exception is raised if errors are found in the process. If False, inf is returned for errors.

  • direction (pyproj.enums.TransformDirection, optional) – The direction of the transform. Default is pyproj.enums.TransformDirection.FORWARD.

Return type:

Iterator[Iterable]

Example

>>> from pyproj import Transformer
>>> transformer = Transformer.from_crs(4326, 2100)
>>> points = [(22.95, 40.63), (22.81, 40.53), (23.51, 40.86)]
>>> for pt in transformer.itransform(points): '{:.3f} {:.3f}'.format(*pt)
'2221638.801 2637034.372'
'2212924.125 2619851.898'
'2238294.779 2703763.736'
>>> pipeline_str = (
...     "+proj=pipeline +step +proj=longlat +ellps=WGS84 "
...     "+step +proj=unitconvert +xy_in=rad +xy_out=deg"
... )
>>> pipe_trans = Transformer.from_pipeline(pipeline_str)
>>> for pt in pipe_trans.itransform([(2.1, 0.001)]):
...     '{:.3f} {:.3f}'.format(*pt)
'2.100 0.001'
>>> transproj = Transformer.from_crs(
...     {"proj":'geocent', "ellps":'WGS84', "datum":'WGS84'},
...     "EPSG:4326",
...     always_xy=True,
... )
>>> for pt in transproj.itransform(
...     [(-2704026.010, -4253051.810, 3895878.820)],
...     radians=True,
... ):
...     '{:.3f} {:.3f} {:.3f}'.format(*pt)
'-2.137 0.661 -20.531'
>>> transprojr = Transformer.from_crs(
...     "EPSG:4326",
...     {"proj":'geocent', "ellps":'WGS84', "datum":'WGS84'},
...     always_xy=True,
... )
>>> for pt in transprojr.itransform(
...     [(-2.137, 0.661, -20.531)],
...     radians=True
... ):
...     '{:.3f} {:.3f} {:.3f}'.format(*pt)
'-2704214.394 -4254414.478 3894270.731'
>>> transproj_eq = Transformer.from_crs(
...     'EPSG:4326',
...     '+proj=longlat +datum=WGS84 +no_defs +type=crs',
...     always_xy=True,
... )
>>> for pt in transproj_eq.itransform([(-2.137, 0.661)]):
...     '{:.3f} {:.3f}'.format(*pt)
'-2.137 0.661'
property name: str

Name of the projection.

Type:

str

property operations: tuple[CoordinateOperation] | None

Added in version 2.4.0.

Returns:

The operations in a concatenated operation.

Return type:

tuple[CoordinateOperation]

property remarks: str

Added in version 2.4.0.

Returns:

Remarks about object.

Return type:

str

property scope: str

Added in version 2.4.0.

Returns:

Scope of object.

Return type:

str

property source_crs: CRS | None

Added in version 3.3.0.

Returns:

The source CRS of a CoordinateOperation.

Return type:

CRS | None

property target_crs: CRS | None

Added in version 3.3.0.

Returns:

The target CRS of a CoordinateOperation.

Return type:

CRS | None

to_json(pretty=False, indentation=2)[source]

Convert the projection to a JSON string.

Added in version 2.4.0.

Parameters:
  • pretty (bool, default False) – If True, it will set the output to be a multiline string.

  • indentation (int, default 2) – If pretty is True, it will set the width of the indentation.

Returns:

The JSON string.

Return type:

str

to_json_dict()[source]

Convert the projection to a JSON dictionary.

Added in version 2.4.0.

Returns:

The JSON dictionary.

Return type:

dict

to_proj4(version=ProjVersion.PROJ_5, pretty=False)[source]

Convert the projection to a PROJ string.

Added in version 3.1.0.

Parameters:
  • version (pyproj.enums.ProjVersion) – The version of the PROJ string output. Default is pyproj.enums.ProjVersion.PROJ_5.

  • pretty (bool, default False) – If True, it will set the output to be a multiline string.

Returns:

The PROJ string.

Return type:

str

to_wkt(version=WktVersion.WKT2_2019, pretty=False)[source]

Convert the projection to a WKT string.

Version options:
  • WKT2_2015

  • WKT2_2015_SIMPLIFIED

  • WKT2_2019

  • WKT2_2019_SIMPLIFIED

  • WKT1_GDAL

  • WKT1_ESRI

Parameters:
  • version (pyproj.enums.WktVersion, optional) – The version of the WKT output. Default is pyproj.enums.WktVersion.WKT2_2019.

  • pretty (bool, default False) – If True, it will set the output to be a multiline string.

Returns:

The WKT string.

Return type:

str

transform(xx, yy, zz=None, tt=None, radians=False, errcheck=False, direction=TransformDirection.FORWARD, inplace=False)[source]

Transform points between two coordinate systems.

See: proj_trans_generic()

Added in version 2.1.1: errcheck

Added in version 2.2.0: direction

Added in version 3.2.0: inplace

Accepted numeric scalar or array:

Parameters:
  • xx (scalar or array) – Input x coordinate(s).

  • yy (scalar or array) – Input y coordinate(s).

  • zz (scalar or array, optional) – Input z coordinate(s).

  • tt (scalar or array, optional) – Input time coordinate(s).

  • radians (bool, default False) – If True, will expect input data to be in radians and will return radians if the projection is geographic. Otherwise, it uses degrees. Ignored for pipeline transformations with pyproj 2, but will work in pyproj 3.

  • errcheck (bool, default False) – If True, an exception is raised if errors are found in the process. If False, inf is returned for errors.

  • direction (pyproj.enums.TransformDirection, optional) – The direction of the transform. Default is pyproj.enums.TransformDirection.FORWARD.

  • inplace (bool, default False) – If True, will attempt to write the results to the input array instead of returning a new array. This will fail if the input is not an array in C order with the double data type.

Example

>>> from pyproj import Transformer
>>> transformer = Transformer.from_crs("EPSG:4326", "EPSG:3857")
>>> x3, y3 = transformer.transform(33, 98)
>>> f"{x3:.3f}  {y3:.3f}"
'10909310.098  3895303.963'
>>> pipeline_str = (
...     "+proj=pipeline +step +proj=longlat +ellps=WGS84 "
...     "+step +proj=unitconvert +xy_in=rad +xy_out=deg"
... )
>>> pipe_trans = Transformer.from_pipeline(pipeline_str)
>>> xt, yt = pipe_trans.transform(2.1, 0.001)
>>> f"{xt:.3f}  {yt:.3f}"
'2.100  0.001'
>>> transproj = Transformer.from_crs(
...     {"proj":'geocent', "ellps":'WGS84', "datum":'WGS84'},
...     "EPSG:4326",
...     always_xy=True,
... )
>>> xpj, ypj, zpj = transproj.transform(
...     -2704026.010,
...     -4253051.810,
...     3895878.820,
...     radians=True,
... )
>>> f"{xpj:.3f} {ypj:.3f} {zpj:.3f}"
'-2.137 0.661 -20.531'
>>> transprojr = Transformer.from_crs(
...     "EPSG:4326",
...     {"proj":'geocent', "ellps":'WGS84', "datum":'WGS84'},
...     always_xy=True,
... )
>>> xpjr, ypjr, zpjr = transprojr.transform(xpj, ypj, zpj, radians=True)
>>> f"{xpjr:.3f} {ypjr:.3f} {zpjr:.3f}"
'-2704026.010 -4253051.810 3895878.820'
>>> transformer = Transformer.from_crs("EPSG:4326", 4326)
>>> xeq, yeq = transformer.transform(33, 98)
>>> f"{xeq:.0f}  {yeq:.0f}"
'33  98'
transform_bounds(left, bottom, right, top, densify_pts=21, radians=False, errcheck=False, direction=TransformDirection.FORWARD)[source]

Added in version 3.1.0.

See: proj_trans_bounds()

Transform boundary densifying the edges to account for nonlinear transformations along these edges and extracting the outermost bounds.

If the destination CRS is geographic and right < left then the bounds crossed the antimeridian. In this scenario there are two polygons, one on each side of the antimeridian. The first polygon should be constructed with (left, bottom, 180, top) and the second with (-180, bottom, right, top).

To construct the bounding polygons with shapely:

import shapely.geometry


def bounding_polygon(left, bottom, right, top):
    # right < left means the bounds crossed the antimeridian: split in two.
    if right < left:
        return shapely.geometry.MultiPolygon(
            [
                shapely.geometry.box(left, bottom, 180, top),
                shapely.geometry.box(-180, bottom, right, top),
            ]
        )
    return shapely.geometry.box(left, bottom, right, top)
Parameters:
  • left (float) – Minimum bounding coordinate of the first axis in source CRS (or the target CRS if using the reverse direction).

  • bottom (float) – Minimum bounding coordinate of the second axis in source CRS (or the target CRS if using the reverse direction).

  • right (float) – Maximum bounding coordinate of the first axis in source CRS (or the target CRS if using the reverse direction).

  • top (float) – Maximum bounding coordinate of the second axis in source CRS (or the target CRS if using the reverse direction).

  • densify_pts (int, default 21) – Number of points to add to each edge to account for nonlinear edges produced by the transform process. Large numbers will produce worse performance.

  • radians (bool, default False) – If True, will expect input data to be in radians and will return radians if the projection is geographic. Otherwise, it uses degrees.

  • errcheck (bool, default False) – If True, an exception is raised if errors are found in the process. If False, inf is returned for errors.

  • direction (pyproj.enums.TransformDirection, optional) – The direction of the transform. Default is pyproj.enums.TransformDirection.FORWARD.

Returns:

left, bottom, right, top – Outermost coordinates in target coordinate reference system.

Return type:

float
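Why edge densification matters can be shown with a small stand-alone sketch. It does not call PROJ: densified_bounds and the nonlinear mapping bulge are hypothetical helpers invented for this illustration, and the sampling scheme is an assumption rather than PROJ's exact algorithm.

```python
import math


def densified_bounds(transform_point, left, bottom, right, top, densify_pts=21):
    """Transform a box by sampling its edges and taking the outermost values."""
    # densify_pts extra points per edge, plus the two end points of each edge.
    steps = densify_pts + 1
    fractions = [i / steps for i in range(steps + 1)]
    edge_points = []
    for t in fractions:
        x = left + t * (right - left)
        y = bottom + t * (top - bottom)
        edge_points += [(x, bottom), (x, top), (left, y), (right, y)]
    transformed = [transform_point(x, y) for x, y in edge_points]
    xs = [p[0] for p in transformed]
    ys = [p[1] for p in transformed]
    return min(xs), min(ys), max(xs), max(ys)


def bulge(x, y):
    """Hypothetical nonlinear mapping: lifts y most strongly where x is near 0."""
    return x, y + math.cos(x)


corners_only = densified_bounds(bulge, -1.0, 0.0, 1.0, 1.0, densify_pts=0)
densified = densified_bounds(bulge, -1.0, 0.0, 1.0, 1.0, densify_pts=21)
print(corners_only[3], densified[3])
```

With only the four corners, the top bound misses the bulge in the middle of the upper edge; sampling the edges recovers it, at the cost of more point transformations.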

class pyorps.raster.handler.Window(col_off, row_off, width, height)[source]

Bases: object

Windows are rectangular subsets of rasters.

This class abstracts the 2-tuples mentioned in the module docstring and adds methods and new constructors.

col_off, row_off

The offset for the window.

Type:

float

width, height

Lengths of the window.

Type:

float

Notes

Previously the lengths were called ‘num_cols’ and ‘num_rows’ but this is a bit confusing in the new float precision world and the attributes have been changed. The originals are deprecated.

__init__(col_off, row_off, width, height)

Method generated by attrs for class Window.

Return type:

None

col_off
crop(height, width)[source]

Return a copy cropped to height and width

flatten()[source]

A flattened form of the window.

Returns:

col_off, row_off, width, height – Window offsets and lengths.

Return type:

float

classmethod from_slices(rows, cols, height=-1, width=-1, boundless=False)[source]

Construct a Window from row and column slices or tuples / lists of start and stop indexes. Converts the rows and cols to offsets, height, and width.

In general, indexes are defined relative to the upper left corner of the dataset: rows=(0, 10), cols=(0, 4) defines a window that is 4 columns wide and 10 rows high starting from the upper left.

Start indexes may be None and will default to 0. Stop indexes may be None and will default to width or height, which must be provided in this case.

Negative start indexes are evaluated relative to the lower right of the dataset: rows=(-2, None), cols=(-2, None) defines a window that is 2 rows high and 2 columns wide starting from the bottom right.

Parameters:
  • rows (slice, tuple, or list) – Slices or 2 element tuples/lists containing start, stop indexes.

  • cols (slice, tuple, or list) – Slices or 2 element tuples/lists containing start, stop indexes.

  • height (float) – A shape to resolve relative values against. Only used when a start or stop index is negative or a stop index is None.

  • width (float) – A shape to resolve relative values against. Only used when a start or stop index is negative or a stop index is None.

  • boundless (bool, optional) – Whether the inputs are bounded (default) or not.

Return type:

Window
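The index rules described for from_slices can be sketched in plain Python. This is a simplified illustration rather than rasterio's implementation: window_from_slices_sketch is a hypothetical name, it accepts only (start, stop) tuples, and it returns a plain (col_off, row_off, width, height) tuple instead of a Window.

```python
def window_from_slices_sketch(rows, cols, height=-1, width=-1):
    """Resolve (start, stop) pairs into (col_off, row_off, width, height)."""

    def resolve(pair, size):
        start, stop = pair
        start = 0 if start is None else start
        # A missing stop or a negative index is relative to the dataset size.
        relative = stop is None or start < 0 or stop < 0
        if relative and size < 0:
            raise ValueError("height/width is required to resolve relative indexes")
        if stop is None:
            stop = size
        if start < 0:
            start += size
        if stop < 0:
            stop += size
        return start, max(stop - start, 0)

    row_off, win_height = resolve(rows, height)
    col_off, win_width = resolve(cols, width)
    return col_off, row_off, win_width, win_height


print(window_from_slices_sketch((0, 10), (0, 4)))  # (0, 0, 4, 10)
print(window_from_slices_sketch((-2, None), (-2, None), height=100, width=50))  # (48, 98, 2, 2)
```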

height
intersection(other)[source]

Return the intersection of this window and another

Parameters:

other (Window) – Another window

Return type:

Window
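A minimal sketch of the intersection, with windows represented as plain (col_off, row_off, width, height) tuples. The helper name intersect_windows and the choice of ValueError for disjoint windows are assumptions made for this illustration, not the library's behavior.

```python
def intersect_windows(a, b):
    """Intersect two windows given as (col_off, row_off, width, height)."""
    col_off = max(a[0], b[0])
    row_off = max(a[1], b[1])
    col_stop = min(a[0] + a[2], b[0] + b[2])
    row_stop = min(a[1] + a[3], b[1] + b[3])
    if col_stop <= col_off or row_stop <= row_off:
        raise ValueError("windows do not intersect")
    return col_off, row_off, col_stop - col_off, row_stop - row_off


print(intersect_windows((0, 0, 10, 10), (5, 5, 10, 10)))  # (5, 5, 5, 5)
```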

round(ndigits=None)[source]

Round a window’s offsets and lengths

Rounding to a very small fraction of a pixel can help treat floating point issues arising from computation of windows.

round_lengths(**kwds)[source]

Return a copy with width and height rounded.

Lengths are rounded to the nearest whole number. The offsets are not changed.

Parameters:

kwds (dict) – Collects keyword arguments that are no longer used.

Return type:

Window

round_offsets(**kwds)[source]

Return a copy with column and row offsets rounded.

Offsets are rounded to the preceding whole number. The lengths are not changed.

Parameters:

kwds (dict) – Collects keyword arguments that are no longer used.

Return type:

Window
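The two rounding rules stated for round_lengths and round_offsets can be sketched on a plain (col_off, row_off, width, height) tuple. The helper names are hypothetical, and rounding halves upward for the lengths is an assumption of this sketch.

```python
import math


def round_offsets_sketch(window):
    """Round offsets down to the preceding whole number; keep the lengths."""
    col_off, row_off, width, height = window
    return math.floor(col_off), math.floor(row_off), width, height


def round_lengths_sketch(window):
    """Round lengths to the nearest whole number; keep the offsets."""
    col_off, row_off, width, height = window
    return col_off, row_off, math.floor(width + 0.5), math.floor(height + 0.5)


window = (2.7, 3.2, 10.4, 5.6)
print(round_offsets_sketch(window))  # (2, 3, 10.4, 5.6)
print(round_lengths_sketch(window))  # (2.7, 3.2, 10, 6)
```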

round_shape(**kwds)[source]
row_off
todict()[source]

A mapping of attribute names and values.

Return type:

dict

toranges()[source]

Makes an equivalent pair of range tuples

toslices()[source]

Slice objects for use as an ndarray indexer.

Returns:

row_slice, col_slice – A pair of slices in row, column order

Return type:

slice
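How the row and column slices index an array can be sketched with nested lists standing in for an ndarray. The helper window_to_slices is hypothetical and assumes integer offsets and lengths.

```python
def window_to_slices(col_off, row_off, width, height):
    """Return (row_slice, col_slice) for a window, in row, column order."""
    return slice(row_off, row_off + height), slice(col_off, col_off + width)


# A 4 x 5 grid whose values encode their own position as 10 * row + col.
grid = [[10 * r + c for c in range(5)] for r in range(4)]
row_slice, col_slice = window_to_slices(col_off=1, row_off=2, width=3, height=2)
subset = [row[col_slice] for row in grid[row_slice]]
print(subset)  # [[21, 22, 23], [31, 32, 33]]
```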

width
pyorps.raster.handler.create_test_tiff(output_path, width=100, height=100, transform=None, crs='EPSG:32632', pattern='random', bands=1, nodata=None)[source]

Creates a synthetic GeoTIFF file for testing with different patterns.

Parameters:
  • output_path (str) – Path to save the test GeoTIFF file

  • width (int) – Width of the raster in pixels

  • height (int) – Height of the raster in pixels

  • transform (Optional[Affine]) – Affine transformation for the raster

  • crs – Coordinate reference system

  • pattern (str) – Data pattern - “random”, “gradient”, or “checkerboard”

  • bands (int) – Number of bands to create

  • nodata (Optional[int]) – No data value

Returns:

An array which can be used as a test raster

pyorps.raster.handler.from_origin(west, north, xsize, ysize)[source]

Return an Affine transformation given upper left and pixel sizes.

Return an Affine transformation for a georeferenced raster given the coordinates of its upper left corner west, north and pixel sizes xsize, ysize.
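For a north-up raster, the resulting transform amounts to a translation to the upper left corner combined with a scale of (xsize, -ysize). The sketch below works on a plain (a, b, c, d, e, f) tuple instead of an Affine object; from_origin_coefficients and apply are hypothetical helper names.

```python
def from_origin_coefficients(west, north, xsize, ysize):
    """Return (a, b, c, d, e, f) for a north-up raster."""
    # x = west + col * xsize and y = north - row * ysize
    return (xsize, 0.0, west, 0.0, -ysize, north)


def apply(coeffs, col, row):
    """Map pixel coordinates (col, row) to georeferenced (x, y)."""
    a, b, c, d, e, f = coeffs
    return a * col + b * row + c, d * col + e * row + f


coeffs = from_origin_coefficients(-180.0, 90.0, 0.5, 0.5)
print(apply(coeffs, 0, 0))      # (-180.0, 90.0), the upper left corner
print(apply(coeffs, 720, 360))  # (180.0, -90.0), the lower right corner
```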

pyorps.raster.handler.rasterize(shapes, out_shape=None, fill=0, nodata=None, masked=False, out=None, transform=(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0), all_touched=False, merge_alg=MergeAlg.replace, default_value=1, dtype=None, skip_invalid=True, dst_path=None, dst_kwds=None)[source]

Return an image array with input geometries burned in.

Warnings will be raised for any invalid or empty geometries, and an exception will be raised if there are no valid shapes to rasterize.

Parameters:
  • shapes (iterable of (`geometry`, value) pairs or geometries) – The geometry can either be an object that implements the geo interface or GeoJSON-like object. If no value is provided the default_value will be used. If value is None the fill value will be used.

  • out_shape (tuple or list with 2 integers) – Shape of output numpy.ndarray.

  • fill (int or float, optional) – Used as fill value for all areas not covered by input geometries.

  • nodata (float, optional) – nodata value to use in output file or masked array.

  • masked (bool, optional. Default: False.) – If True, return a masked array. Note: nodata is always set in the case of file output.

  • out (numpy.ndarray, optional) – Array in which to store results. If not provided, out_shape and dtype are required.

  • transform (Affine transformation object, optional) – Transformation from pixel coordinates of source to the coordinate system of the input shapes. See the transform property of dataset objects.

  • all_touched (boolean, optional) – If True, all pixels touched by geometries will be burned in. If False, only pixels whose center is within the polygon or that are selected by Bresenham’s line algorithm will be burned in.

  • merge_alg (MergeAlg, optional) –

    Merge algorithm to use. One of:
    MergeAlg.replace (default):

    the new value will overwrite the existing value.

    MergeAlg.add:

    the new value will be added to the existing raster.

  • default_value (int or float, optional) – Used as value for all geometries, if not provided in shapes.

  • dtype (rasterio or numpy.dtype, optional) – Used as data type for results, if out is not provided.

  • skip_invalid (bool, optional) – If True (default), invalid shapes will be skipped. If False, ValueError will be raised.

  • dst_path (str or PathLike, optional) – Path of output dataset

  • dst_kwds (dict, optional) – Dictionary of creation options and other parameters that will be overlaid on the profile of the output dataset.

Returns:

If out was not None, out is returned after being modified in place. If out was None, a new array is returned.

Return type:

numpy.ndarray

Notes

Valid data types for fill, default_value, out, dtype and shape values are “int16”, “int32”, “uint8”, “uint16”, “uint32”, “float32”, and “float64”.

This function requires significant memory resources. The shapes iterator will be materialized to a Python list and another C copy of that list will be made. The out array will be copied and additional temporary raster memory equal to 2x the smaller of out data or GDAL’s max cache size (controlled by GDAL_CACHEMAX, default is 5% of the computer’s physical memory) is required.

If GDAL max cache size is smaller than the output data, the array of shapes will be iterated multiple times. Performance is thus a linear function of buffer size. For maximum speed, ensure that GDAL_CACHEMAX is larger than the size of out or out_shape.
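The effect of the all_touched and merge_alg options can be illustrated with a toy rasterizer for axis-aligned boxes. This is a conceptual sketch only and not GDAL's algorithm: burn_box and empty_grid are hypothetical helpers, the grid is a list of lists, and the identity transform is assumed so that pixel (row, col) covers [col, col + 1] x [row, row + 1].

```python
def burn_box(grid, box, value, all_touched=False, add=False):
    """Burn an axis-aligned box (minx, miny, maxx, maxy) into a grid in place."""
    minx, miny, maxx, maxy = box
    for row, line in enumerate(grid):
        for col in range(len(line)):
            if all_touched:
                # Any overlap between the pixel square and the box counts.
                hit = col < maxx and col + 1 > minx and row < maxy and row + 1 > miny
            else:
                # Only the pixel center has to fall inside the box.
                hit = minx <= col + 0.5 <= maxx and miny <= row + 0.5 <= maxy
            if hit:
                # add=True mimics MergeAlg.add, otherwise the value is replaced.
                line[col] = line[col] + value if add else value
    return grid


def empty_grid(size=4):
    return [[0] * size for _ in range(size)]


box = (0.6, 0.6, 2.4, 2.4)
centers_only = burn_box(empty_grid(), box, 1)
touched = burn_box(empty_grid(), box, 1, all_touched=True)
added = burn_box(burn_box(empty_grid(), box, 1, add=True), box, 1, add=True)
print(sum(map(sum, centers_only)), sum(map(sum, touched)), added[1][1])  # 1 9 2
```

The same box burns a single pixel under the center rule and nine pixels under all_touched, which is why all_touched is often preferred for thin features that would otherwise fall between pixel centers.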

pyorps.raster.handler.rio_open(fp, mode='r', driver=None, width=None, height=None, count=None, crs=None, transform=None, dtype=None, nodata=None, sharing=False, opener=None, **kwargs)

Open a dataset for reading or writing.

The dataset may be located in a local file, in a resource located by a URL, or contained within a stream of bytes. This function accepts different types of fp parameters. However, it is almost always best to pass a string that has a dataset name as its value. These are passed directly to GDAL protocol and format handlers. A path to a zipfile is more efficiently used by GDAL than a Python ZipFile object, for example.

In read (‘r’) or read/write (‘r+’) mode, no keyword arguments are required: these attributes are supplied by the opened dataset.

In write (‘w’ or ‘w+’) mode, the driver, width, height, count, and dtype keywords are strictly required.

Parameters:
  • fp (str, os.PathLike, file-like, or rasterio.io.MemoryFile) – A filename or URL, a file object opened in binary (‘rb’) mode, a Path object, or one of the rasterio classes that provides the dataset-opening interface (has an open method that returns a dataset). Use a string when possible: GDAL can more efficiently access a dataset if it opens it natively.

  • mode (str, optional) – ‘r’ (read, the default), ‘r+’ (read/write), ‘w’ (write), or ‘w+’ (write/read).

  • driver (str, optional) – A short format driver name (e.g. “GTiff” or “JPEG”) or a list of such names (see GDAL docs at https://gdal.org/drivers/raster/index.html). In ‘w’ or ‘w+’ modes a single name is required. In ‘r’ or ‘r+’ modes the driver can usually be omitted. Registered drivers will be tried sequentially until a match is found. When multiple drivers are available for a format such as JPEG2000, one of them can be selected by using this keyword argument.

  • width (int, optional) – The number of columns of the raster dataset. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • height (int, optional) – The number of rows of the raster dataset. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • count (int, optional) – The count of dataset bands. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • crs (str, dict, or CRS, optional) – The coordinate reference system. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • transform (affine.Affine, optional) – Affine transformation mapping the pixel space to geographic space. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • dtype (str or numpy.dtype, optional) – The data type for bands. For example: ‘uint8’ or rasterio.uint16. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • nodata (int, float, or nan, optional) – Defines the pixel value to be interpreted as not valid data. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • sharing (bool, optional) – To reduce overhead and prevent programs from running out of file descriptors, rasterio maintains a pool of shared low level dataset handles. If True this function will use a shared handle if one is available. Multithreaded programs must avoid sharing and should set sharing to False.

  • opener (callable, optional) – A custom dataset opener which can serve GDAL’s virtual filesystem machinery via Python file-like objects. The underlying file-like object is obtained by calling opener with (fp, mode) or (fp, mode + “b”) depending on the format driver’s native mode. opener must return a Python file-like object that provides read, seek, tell, and close methods. Note: only one opener at a time per fp, mode pair is allowed.

  • kwargs (optional) – These are passed to format drivers as directives for creating or interpreting datasets. For example: in ‘w’ or ‘w+’ modes a tiled=True keyword argument will direct the GeoTIFF format driver to create a tiled, rather than striped, TIFF.

Returns:

  • rasterio.io.DatasetReader – If mode is “r”.

  • rasterio.io.DatasetWriter – If mode is “r+”, “w”, or “w+”.

Raises:
  • TypeError – If arguments are of the wrong Python type.

  • rasterio.errors.RasterioIOError – If the dataset cannot be opened, such as when there is no dataset with the given name.

  • rasterio.errors.DriverCapabilityError – If the detected format driver does not support the requested opening mode.

Examples

To open a local GeoTIFF dataset for reading using standard driver discovery and no directives:

>>> import rasterio
>>> with rasterio.open('example.tif') as dataset:
...     print(dataset.profile)

To open a local JPEG2000 dataset using only the JP2OpenJPEG driver:

>>> with rasterio.open(
...         'example.jp2', driver='JP2OpenJPEG') as dataset:
...     print(dataset.profile)

To create a new 8-band, 16-bit unsigned, tiled, and LZW-compressed GeoTIFF with a global extent and 0.5 degree resolution:

>>> from rasterio.transform import from_origin
>>> with rasterio.open(
...         'example.tif', 'w', driver='GTiff', dtype='uint16',
...         width=720, height=360, count=8, crs='EPSG:4326',
...         transform=from_origin(-180.0, 90.0, 0.5, 0.5),
...         nodata=0, tiled=True, compress='lzw') as dataset:
...     dataset.write(...)
pyorps.raster.handler.rowcol(transform, xs, ys, zs=None, op=None, precision=None, **rpc_options)[source]

Get rows and cols of the pixels containing (x, y).

Parameters:
  • transform (Affine or sequence of GroundControlPoint or RPC) – Transform suitable for input to AffineTransformer, GCPTransformer, or RPCTransformer.

  • xs (list or float) – x values in coordinate reference system.

  • ys (list or float) – y values in coordinate reference system.

  • zs (list or float, optional) – Height associated with coordinates. Primarily used for RPC based coordinate transformations. Ignored for affine based transformations. Default: 0.

  • op (function, optional (default: numpy.floor)) – Function to convert fractional pixels to whole numbers (floor, ceiling, round)

  • precision (int or float, optional) – This parameter is unused, deprecated in rasterio 1.3.0, and will be removed in version 2.0.0.

  • rpc_options (dict, optional) – Additional arguments passed to GDALCreateRPCTransformer.

Returns:

  • rows (array of ints or floats)

  • cols (array of ints or floats) – Integers are the default. The numerical type is determined by the type returned by op().
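For the affine case, the lookup amounts to inverting the transform and applying op to the fractional pixel position. The sketch below uses a plain (a, b, c, d, e, f) tuple and a single point; rowcol_sketch is a hypothetical name, and GCP or RPC transforms are not covered.

```python
import math


def rowcol_sketch(coeffs, x, y, op=math.floor):
    """Invert an (a, b, c, d, e, f) affine transform and snap to pixel indexes."""
    a, b, c, d, e, f = coeffs
    det = a * e - b * d
    if det == 0:
        raise ValueError("transform is not invertible")
    # Solve x = a*col + b*row + c and y = d*col + e*row + f for col and row.
    col = (e * (x - c) - b * (y - f)) / det
    row = (a * (y - f) - d * (x - c)) / det
    return op(row), op(col)


# Assumed example: 10 m pixels, upper left corner at (500000, 5600000).
coeffs = (10.0, 0.0, 500000.0, 0.0, -10.0, 5600000.0)
print(rowcol_sketch(coeffs, 500025.0, 5599985.0))                # (1, 2)
print(rowcol_sketch(coeffs, 500025.0, 5599985.0, op=math.ceil))  # (2, 3)
```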

pyorps.raster.handler.transform_window(window, transform)

Construct an affine transform matrix relative to a window.

Parameters:
  • window (Window) – The input window.

  • transform (Affine) – an affine transform matrix.

Returns:

The affine transform matrix for the given window

Return type:

Affine
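The window transform keeps the pixel size and rotation terms and moves the origin to the window's upper left corner. A minimal sketch on an (a, b, c, d, e, f) tuple, with the hypothetical helper name window_transform_sketch:

```python
def window_transform_sketch(coeffs, col_off, row_off):
    """Shift the origin of an affine transform to a window's upper left corner."""
    a, b, c, d, e, f = coeffs
    # Only the translation terms c and f change.
    return (a, b, c + a * col_off + b * row_off, d, e, f + d * col_off + e * row_off)


# Assumed example: 10 m pixels, upper left corner at (500000, 5600000).
coeffs = (10.0, 0.0, 500000.0, 0.0, -10.0, 5600000.0)
print(window_transform_sketch(coeffs, col_off=3, row_off=2))
# (10.0, 0.0, 500030.0, 0.0, -10.0, 5599980.0)
```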

pyorps.raster.handler.transform_xy(transform, rows, cols, zs=None, offset='center', **rpc_options)

Get the x and y coordinates of pixels at rows and cols.

The pixel’s center is returned by default, but a corner can be returned by setting offset to one of ul, ur, ll, lr.

Supports affine, Ground Control Point (GCP), or Rational Polynomial Coefficients (RPC) based coordinate transformations.

Parameters:
  • transform (Affine or sequence of GroundControlPoint or RPC) – Transform suitable for input to AffineTransformer, GCPTransformer, or RPCTransformer.

  • rows (list or int) – Pixel rows.

  • cols (int or sequence of ints) – Pixel columns.

  • zs (list or float, optional) – Height associated with coordinates. Primarily used for RPC based coordinate transformations. Ignored for affine based transformations. Default: 0.

  • offset (str, optional) – Determines if the returned coordinates are for the center of the pixel or for a corner.

  • rpc_options (dict, optional) – Additional arguments passed to GDALCreateRPCTransformer.

Returns:

  • xs (float or list of floats) – x coordinates in coordinate reference system

  • ys (float or list of floats) – y coordinates in coordinate reference system
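For the affine case, the offset option simply shifts the pixel index by a fraction of a pixel before the transform is applied. The sketch below handles a single pixel and an (a, b, c, d, e, f) tuple; xy_sketch and OFFSETS are hypothetical names.

```python
# (column shift, row shift) for each supported offset keyword.
OFFSETS = {
    "center": (0.5, 0.5),
    "ul": (0.0, 0.0),
    "ur": (1.0, 0.0),
    "ll": (0.0, 1.0),
    "lr": (1.0, 1.0),
}


def xy_sketch(coeffs, row, col, offset="center"):
    """Return the (x, y) coordinates of a pixel's center or one of its corners."""
    a, b, c, d, e, f = coeffs
    dcol, drow = OFFSETS[offset]
    col, row = col + dcol, row + drow
    return a * col + b * row + c, d * col + e * row + f


# Assumed example: 10 m pixels, upper left corner at (500000, 5600000).
coeffs = (10.0, 0.0, 500000.0, 0.0, -10.0, 5600000.0)
print(xy_sketch(coeffs, 1, 2))        # (500025.0, 5599985.0)
print(xy_sketch(coeffs, 1, 2, "ul"))  # (500020.0, 5599990.0)
```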

pyorps.raster.rasterizer module

PYORPS: An Open-Source Tool for Automated Power Line Routing

Reference: [1] Hofmann, M., Stetz, T., Kammer, F., Repo, S.: ‘PYORPS: An Open-Source Tool for Automated Power Line Routing’, CIRED 2025 - 28th Conference and Exhibition on Electricity Distribution, 16 - 19 June 2025, Geneva, Switzerland

class pyorps.raster.rasterizer.Affine(a, b, c, d, e, f, g=0.0, h=0.0, i=1.0)[source]

Bases: Affine

Two dimensional affine transform for 2D linear mapping.

Parameters:
  • a, b, c, d, e, f (float) –

    Coefficients of an augmented affine transformation matrix

    | x' |   | a b c | | x |
    | y' | = | d e f | | y |
    | 1  |   | 0 0 1 | | 1 |

    a, b, and c are the elements of the first row of the matrix. d, e, and f are the elements of the second row.

  • g (float)

  • h (float)

  • i (float)

a, b, c, d, e, f, g, h, i

The coefficients of the 3x3 augmented affine transformation matrix

| x’ |   | a  b  c | | x |
| y’ | = | d  e  f | | y |
| 1  |   | g  h  i | | 1 |

g, h, and i are always 0, 0, and 1.

Type:

float

The Affine package is derived from Casey Duncan's Planar package. See the copyright statement below. Parallel lines are preserved by these transforms. Affine transforms can perform any combination of translations, scales/flips, shears, and rotations. Class methods are provided to conveniently compose transforms from these operations.

Internally the transform is stored as a 3x3 transformation matrix. The transform may be constructed directly by specifying the first two rows of matrix values as 6 floats. Since the matrix is an affine transform, the last row is always (0, 0, 1).

N.B.: multiplication of a transform and an (x, y) vector always returns the column vector that is the matrix multiplication product of the transform and (x, y) as a column vector, no matter which is on the left or right side. This is obviously not the case for matrices and vectors in general, but provides a convenience for users of this class.

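The matrix form above can be spelled out in plain Python. This is a minimal sketch using only the standard library, not the affine package itself; the helper name `apply_affine` is hypothetical and exists only for illustration.

```python
def apply_affine(coeffs, point):
    """Apply the six coefficients (a, b, c, d, e, f) to an (x, y) point."""
    a, b, c, d, e, f = coeffs
    x, y = point
    # The first matrix row yields x', the second yields y'.
    # The third row (0, 0, 1) is implicit and only carries the constant 1.
    return (a * x + b * y + c, d * x + e * y + f)


# Scale by 2 in x and by 3 in y, then shift by (10, 20).
print(apply_affine((2.0, 0.0, 10.0, 0.0, 3.0, 20.0), (1.0, 1.0)))
```

The translation terms c and f are added after the linear part, which is why the matrix is augmented with a third row and column.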
__getnewargs__()[source]

Pickle protocol support

Notes

Normal unpickling creates a situation where __new__ receives all 9 elements rather than the 6 that are required for the constructor. This method ensures that only the 6 are provided.

__invert__()[source]

Return the inverse transform.

Raises:

TransformNotInvertible – If the transform is degenerate.

__mul__(other)[source]

Multiplication

Apply the transform using matrix multiplication, creating a resulting object of the same type. A transform may be applied to another transform, a vector, vector array, or shape.

Parameters:

other (Affine, Vec2, Vec2Array, Shape) – The object to transform.

Return type:

Same as other
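When a transform is applied to another transform, the result is the product of the two augmented matrices. The sketch below writes that product out for six-coefficient tuples in (a, b, c, d, e, f) order. It is a plain-Python illustration under that assumption, not the library's code, and `compose` is a hypothetical helper name.

```python
def compose(m1, m2):
    """Return the coefficients of m1 * m2 (m2 is applied first, then m1)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * d2,       # a
        a1 * b2 + b1 * e2,       # b
        a1 * c2 + b1 * f2 + c1,  # c (x offset)
        d1 * a2 + e1 * d2,       # d
        d1 * b2 + e1 * e2,       # e
        d1 * c2 + e1 * f2 + f1,  # f (y offset)
    )


translation = (1.0, 0.0, 10.0, 0.0, 1.0, 20.0)  # shift by (10, 20)
scale = (2.0, 0.0, 0.0, 0.0, 3.0, 0.0)          # scale by (2, 3)
combined = compose(translation, scale)
print(combined)
```

Order matters: swapping the arguments scales the offsets as well, giving a different transform.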

static __new__(cls, a, b, c, d, e, f, g=0.0, h=0.0, i=1.0)[source]

Create a new object

Parameters:
  • a (float) – Elements of an augmented affine transformation matrix.

  • b (float) – Elements of an augmented affine transformation matrix.

  • c (float) – Elements of an augmented affine transformation matrix.

  • d (float) – Elements of an augmented affine transformation matrix.

  • e (float) – Elements of an augmented affine transformation matrix.

  • f (float) – Elements of an augmented affine transformation matrix.

  • g (float)

  • h (float)

  • i (float)

__repr__()[source]

Precise string representation.

Return type:

str

__rmul__(other)[source]

Right hand multiplication

Deprecated since version 2.3.0: Right multiplication will be prohibited in version 3.0. This method will raise AffineError.

Notes

This method should not be called if other is an Affine instance. This is just a guarantee, since it would potentially return the wrong answer in that case.

__str__()[source]

Concise string representation.

Return type:

str

a

Alias for field number 0

almost_equals(other, precision=1e-05)[source]

Compare transforms for approximate equality.

Parameters:
  • other (Affine) – Transform being compared.

  • precision (float)

Return type:

bool

Returns:

True if absolute difference between each element of each respective transform matrix < self.precision.
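The comparison described above amounts to an element-wise tolerance check. A minimal sketch, assuming the transforms are given as plain coefficient sequences of equal length; the function below is a standalone illustration rather than the method itself.

```python
def almost_equals(m1, m2, precision=1e-5):
    """True if every pair of corresponding coefficients differs by less than precision."""
    return all(abs(x - y) < precision for x, y in zip(m1, m2))


identity = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
print(almost_equals(identity, (1.000001, 0.0, 0.0, 0.0, 1.0, 0.0)))
print(almost_equals(identity, (1.001, 0.0, 0.0, 0.0, 1.0, 0.0)))
```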

b

Alias for field number 1

c

Alias for field number 2

property column_vectors

The values of the transform as three 2D column vectors

count(value, /)

Return number of occurrences of value.

d

Alias for field number 3

property determinant

The determinant of the transform matrix.

This value is equal to the area scaling factor when the transform is applied to a shape.

e

Alias for field number 4

property eccentricity: float

The eccentricity of the affine transformation.

This value represents the eccentricity of an ellipse under this affine transformation.

Raises NotImplementedError for improper transformations.

f

Alias for field number 5

classmethod from_gdal(c, a, b, f, d, e)[source]

Use same coefficient order as GDAL’s GetGeoTransform().

Parameters:

c, a, b, f, d, e (float) – Coefficients of a GDAL geotransform, in GDAL’s order.

Return type:

Affine
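The argument order (c, a, b, f, d, e) reflects that a GDAL geotransform lists each offset before the two coefficients of its row. The following sketch shows only the reordering on plain tuples. The helper names and the sample geotransform values are made up for illustration.

```python
def gdal_to_affine_order(geotransform):
    """Reorder a GDAL-style 6-tuple (c, a, b, f, d, e) into (a, b, c, d, e, f)."""
    c, a, b, f, d, e = geotransform
    return (a, b, c, d, e, f)


def affine_to_gdal_order(coeffs):
    """Reorder (a, b, c, d, e, f) back into the GDAL-style (c, a, b, f, d, e)."""
    a, b, c, d, e, f = coeffs
    return (c, a, b, f, d, e)


# Hypothetical north-up raster: 10 m pixels, upper-left corner at (500000, 5600000).
geotransform = (500000.0, 10.0, 0.0, 5600000.0, 0.0, -10.0)
coeffs = gdal_to_affine_order(geotransform)
print(coeffs)
```

The two helpers are inverses of each other, which mirrors how from_gdal() and to_gdal() are meant to round-trip.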

g

Alias for field number 6

h

Alias for field number 7

i

Alias for field number 8

classmethod identity()[source]

Return the identity transform.

Return type:

Affine

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

property is_conformal: bool

True if the transform is conformal.

i.e., if angles between points are preserved after applying the transform, within rounding limits. This implies that the transform has no effective shear.

property is_degenerate

True if this transform is degenerate.

Which means that it will collapse a shape to an effective area of zero. Degenerate transforms cannot be inverted.
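The link between the determinant, degeneracy, and inversion can be sketched in plain Python. This is an illustration on coefficient tuples, not the library's implementation: the helper names are hypothetical, the zero check is exact rather than tolerance-based, and a ValueError stands in for the library's own exception.

```python
def determinant(coeffs):
    """Determinant of the 2x2 linear part; the area scaling factor."""
    a, b, c, d, e, f = coeffs
    return a * e - b * d


def invert(coeffs):
    """Return the coefficients of the inverse transform."""
    a, b, c, d, e, f = coeffs
    det = determinant(coeffs)
    if det == 0:
        # A zero determinant collapses shapes to zero area: no inverse exists.
        raise ValueError("degenerate transform cannot be inverted")
    ra, rb = e / det, -b / det
    rd, re = -d / det, a / det
    # The inverse offset undoes the original translation in the inverted frame.
    rc = -(ra * c + rb * f)
    rf = -(rd * c + re * f)
    return (ra, rb, rc, rd, re, rf)


forward = (2.0, 0.0, 10.0, 0.0, 4.0, 20.0)
print(determinant(forward))
print(invert(forward))
```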

property is_identity: bool

True if this transform equals the identity matrix, within rounding limits.

property is_orthonormal: bool

True if the transform is orthonormal.

Which means that the transform represents a rigid motion, which has no effective scaling or shear. Mathematically, this means that the axis vectors of the transform matrix are perpendicular and unit-length. Applying an orthonormal transform to a shape always results in a congruent shape.

property is_proper

True if this transform is proper.

Which means that it does not include reflection.

property is_rectilinear: bool

True if the transform is rectilinear.

i.e., whether a shape would remain axis-aligned, within rounding limits, after applying the transform.

itransform(seq)[source]

Transform a sequence of points or vectors in place.

Parameters:

seq – Mutable sequence of Vec2 to be transformed.

Return type:

None

Returns:

None, the input sequence is mutated in place.

classmethod permutation(*scaling)[source]

Create the permutation transform

For 2x2 matrices, there is only one permutation matrix that is not the identity.

Return type:

Affine

precision = 1e-05
classmethod rotation(angle, pivot=None)[source]

Create a rotation transform at the specified angle.

A pivot point other than the coordinate system origin may be optionally specified.

Parameters:
  • angle (float) – Rotation angle in degrees, counter-clockwise about the pivot point.

  • pivot (sequence) – Point to rotate about, if omitted the rotation is about the origin.

Return type:

Affine
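For a rotation about the origin, the coefficients follow from the standard 2D rotation matrix. The sketch below is a plain-Python illustration with hypothetical helper names. It omits the optional pivot and makes no claim about how the library handles rounding.

```python
import math


def rotation_coeffs(angle_degrees):
    """Coefficients (a, b, c, d, e, f) of a counter-clockwise rotation about the origin."""
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (cos_t, -sin_t, 0.0, sin_t, cos_t, 0.0)


def rotate_point(angle_degrees, point):
    """Rotate an (x, y) point about the origin by the given angle in degrees."""
    a, b, c, d, e, f = rotation_coeffs(angle_degrees)
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Rotating the unit x vector by 90 degrees gives (approximately) the unit y vector.
print(rotate_point(90.0, (1.0, 0.0)))
```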

property rotation_angle: float

The rotation angle in degrees of the affine transformation.

This is the rotation angle in degrees of the affine transformation, assuming it is in the form M = R S, where R is a rotation and S is a scaling.

Raises UndefinedRotationError for improper and degenerate transformations.

classmethod scale(*scaling)[source]

Create a scaling transform from a scalar or vector.

Parameters:

scaling (float or sequence) – The scaling factor. A scalar value will scale in both dimensions equally. A vector scaling value scales the dimensions independently.

Return type:

Affine

classmethod shear(x_angle=0, y_angle=0)[source]

Create a shear transform along one or both axes.

Parameters:
  • x_angle (float) – Shear angle in degrees parallel to the x-axis.

  • y_angle (float) – Shear angle in degrees parallel to the y-axis.

Return type:

Affine

to_gdal()[source]

Return same coefficient order as GDAL’s SetGeoTransform().

Return type:

tuple

to_shapely()[source]

Return an affine transformation matrix compatible with shapely

Shapely’s affinity module expects an affine transformation matrix in (a,b,d,e,xoff,yoff) order.

Return type:

tuple
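Relative to the (a, b, c, d, e, f) order used by this class, the shapely order moves the two offsets to the end. A minimal sketch of that reordering on a plain tuple follows. `to_shapely_order` is a hypothetical helper, not the method itself.

```python
def to_shapely_order(coeffs):
    """Reorder (a, b, c, d, e, f) into shapely's (a, b, d, e, xoff, yoff)."""
    a, b, c, d, e, f = coeffs
    # c and f are the x and y offsets (xoff, yoff).
    return (a, b, d, e, c, f)


print(to_shapely_order((2.0, 0.0, 10.0, 0.0, 3.0, 20.0)))
```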

classmethod translation(xoff, yoff)[source]

Create a translation transform from an offset vector.

Parameters:
  • xoff (float) – Translation x offset.

  • yoff (float) – Translation y offset.

Return type:

Affine

property xoff: float

Alias for ‘c’

property yoff: float

Alias for ‘f’

class pyorps.raster.rasterizer.Any(*args, **kwargs)[source]

Bases: object

Special type indicating an unconstrained type.

  • Any is compatible with every type.

  • Any assumed to have all methods.

  • All values assumed to be instances of Any.

Note that all the above statements are true from the point of view of static type checkers. At runtime, Any should not be used with instance checks.

class pyorps.raster.rasterizer.CostAssumptions(source=None)[source]

Bases: object

A class for handling cost assumptions for rasterization.

This class handles:

  • Loading cost assumptions from files (CSV, Excel, JSON), or generating them from a dictionary or a GeoDataFrame

  • Mapping costs to features in a GeoDataFrame

  • Managing hierarchical cost structures

Parameters:

source (str | dict | None)

__init__(source=None)[source]

Initialize the CostAssumptions object.

Parameters:

source (Union[str, dict, None]) –

  1. Path to a cost assumptions file

  2. A dictionary of cost values

apply_to_geodataframe(gdf, main_feature=None, side_features=None)[source]

Apply cost assumptions to a GeoDataFrame.

Parameters:
  • gdf (GeoDataFrame) – GeoDataFrame to apply costs to

  • main_feature (Optional[str]) – Main feature column name

  • side_features (Optional[list[str]]) – list of side feature column names or single side feature name

Returns:

GeoDataFrame with ‘cost’ column added

convert_df_to_cost_dict(df)[source]

Convert a DataFrame to a nested dictionary for cost assumptions.

Parameters:

df (DataFrame) – DataFrame containing cost assumptions with hierarchical structure

Return type:

dict

Returns:

dictionary of cost assumptions with nested structure based on DataFrame columns

Uses one numeric column for costs, and all other columns as a hierarchical index:

  • The first column is the ‘main_feature’

  • All additional columns are ‘side_features’
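The idea of turning tabular rows into a nested cost dictionary can be sketched without pandas. The example below is an assumption-laden illustration: the exact dictionary layout that pyorps produces is not stated here, and the helper name, feature names, and cost values are all invented.

```python
def rows_to_nested_costs(rows):
    """Build a nested dict from rows of (main_feature, *side_features, cost)."""
    nested = {}
    for *keys, cost in rows:
        level = nested
        # Walk (and create) one dictionary level per feature column except the last.
        for key in keys[:-1]:
            level = level.setdefault(key, {})
        # The last feature column maps directly to the numeric cost.
        level[keys[-1]] = cost
    return nested


# Hypothetical rows: main feature, one side feature, cost.
rows = [
    ("forest", "deciduous", 250.0),
    ("forest", "coniferous", 300.0),
    ("road", "highway", 900.0),
]
cost_dict = rows_to_nested_costs(rows)
print(cost_dict)
```

Each additional side feature column would add one more nesting level under the main feature.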

cost_dict_to_df(cost_dict)[source]

Convert cost assumptions dictionary to DataFrame.

Parameters:

cost_dict (dict) – Dictionary of cost assumptions

Return type:

DataFrame

Returns:

DataFrame representation of cost assumptions

load(source)[source]

Load cost assumptions from a file or dictionary.

Parameters:

source (Union[str, dict]) – Path to a file or a dictionary containing cost assumptions

Return type:

dict

Returns:

dictionary of cost assumptions

to_csv(filepath, separator=';', decimal='.', encoding='ISO-8859-1')[source]

Save the cost assumptions to a CSV file.

Parameters:
  • filepath (str) – Path where to save the CSV file

  • separator (str) – Column separator character (default is ‘;’)

  • decimal (str) – Decimal separator character (default is ‘.’)

  • encoding (str) – The encoding of the file (default is ‘ISO-8859-1’)

Return type:

None

to_excel(filepath, sheet_name='CostAssumptions', index=False)[source]

Save the cost assumptions to an Excel file.

Parameters:
  • filepath (str) – Path where to save the Excel file

  • sheet_name (str) – Name of the worksheet (default is ‘CostAssumptions’)

  • index (bool) – Whether to write row indices (default is False)

Return type:

None

to_json(filepath, indent=2, encoding='ISO-8859-1')[source]

Save the cost assumptions to a JSON file.

Parameters:
  • filepath (str) – Path where to save the JSON file

  • indent (int) – Number of spaces for indentation (default is 2)

  • encoding (str) – The encoding of the file (default is ‘ISO-8859-1’)

Return type:

None

class pyorps.raster.rasterizer.GeoDataFrame(data=None, *args, geometry=None, crs=None, **kwargs)[source]

Bases: GeoPandasBase, DataFrame

A GeoDataFrame object is a pandas.DataFrame that has one or more columns containing geometry.

In addition to the standard DataFrame constructor arguments, GeoDataFrame also accepts the following keyword arguments:

Parameters:
  • crs (value (optional)) – Coordinate Reference System of the geometry objects. Can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (eg “EPSG:4326”) or a WKT string.

  • geometry (str or array-like (optional)) –

    Value to use as the active geometry column. If str, treated as column name to use. If array-like, it will be added as new column named ‘geometry’ on the GeoDataFrame and set as the active geometry column.

    Note that if geometry is a (Geo)Series with a name, the name will not be used, a column named “geometry” will still be added. To preserve the name, you can use rename_geometry() to update the geometry column name.

Examples

Constructing GeoDataFrame from a dictionary.

>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)

Notice that the inferred dtype of ‘geometry’ columns is geometry.

>>> gdf.dtypes
col1          object
geometry    geometry
dtype: object

Constructing GeoDataFrame from a pandas DataFrame with a column of WKT geometries:

>>> import pandas as pd
>>> d = {'col1': ['name1', 'name2'], 'wkt': ['POINT (1 2)', 'POINT (2 1)']}
>>> df = pd.DataFrame(d)
>>> gs = geopandas.GeoSeries.from_wkt(df['wkt'])
>>> gdf = geopandas.GeoDataFrame(df, geometry=gs, crs="EPSG:4326")
>>> gdf
    col1          wkt     geometry
0  name1  POINT (1 2)  POINT (1 2)
1  name2  POINT (2 1)  POINT (2 1)

See also

GeoSeries

Series object designed to store shapely geometry objects

property T: DataFrame

The transpose of the DataFrame.

Returns:

The transposed DataFrame.

Return type:

DataFrame

See also

DataFrame.transpose

Transpose index and columns.

Examples

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
   col1  col2
0     1     3
1     2     4
>>> df.T
      0  1
col1  1  2
col2  3  4
__add__(other)

Get Addition of DataFrame and other, column-wise.

Equivalent to DataFrame.add(other).

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Object to be added to the DataFrame.

Returns:

The result of adding other to DataFrame.

Return type:

DataFrame

See also

DataFrame.add

Add a DataFrame and another object, with option for index- or column-oriented addition.

Examples

>>> df = pd.DataFrame({'height': [1.5, 2.6], 'weight': [500, 800]},
...                   index=['elk', 'moose'])
>>> df
       height  weight
elk       1.5     500
moose     2.6     800

Adding a scalar affects all rows and columns.

>>> df[['height', 'weight']] + 1.5
       height  weight
elk       3.0   501.5
moose     4.1   801.5

Each element of a list is added to a column of the DataFrame, in order.

>>> df[['height', 'weight']] + [0.5, 1.5]
       height  weight
elk       2.0   501.5
moose     3.1   801.5

Keys of a dictionary are aligned to the DataFrame, based on column names; each value in the dictionary is added to the corresponding column.

>>> df[['height', 'weight']] + {'height': 0.5, 'weight': 1.5}
       height  weight
elk       2.0   501.5
moose     3.1   801.5

When other is a Series, the index of other is aligned with the columns of the DataFrame.

>>> s1 = pd.Series([0.5, 1.5], index=['weight', 'height'])
>>> df[['height', 'weight']] + s1
       height  weight
elk       3.0   500.5
moose     4.1   800.5

Even when the index of other is the same as the index of the DataFrame, the Series will not be reoriented. If index-wise alignment is desired, DataFrame.add() should be used with axis=’index’.

>>> s2 = pd.Series([0.5, 1.5], index=['elk', 'moose'])
>>> df[['height', 'weight']] + s2
       elk  height  moose  weight
elk    NaN     NaN    NaN     NaN
moose  NaN     NaN    NaN     NaN
>>> df[['height', 'weight']].add(s2, axis='index')
       height  weight
elk       2.0   500.5
moose     4.1   801.5

When other is a DataFrame, both columns names and the index are aligned.

>>> other = pd.DataFrame({'height': [0.2, 0.4, 0.6]},
...                      index=['elk', 'moose', 'deer'])
>>> df[['height', 'weight']] + other
       height  weight
deer      NaN     NaN
elk       1.7     NaN
moose     3.0     NaN
__arrow_c_stream__(requested_schema=None)

Export the pandas DataFrame as an Arrow C stream PyCapsule.

This relies on pyarrow to convert the pandas DataFrame to the Arrow format (and follows the default behaviour of pyarrow.Table.from_pandas in its handling of the index, i.e. store the index as a column except for RangeIndex). This conversion is not necessarily zero-copy.

Parameters:

requested_schema (PyCapsule, default None) – The schema to which the dataframe should be casted, passed as a PyCapsule containing a C ArrowSchema representation of the requested schema.

Return type:

PyCapsule

__contains__(key)

True if the key is in the info axis

Return type:

bool

__dataframe__(nan_as_null=False, allow_copy=True)

Return the dataframe interchange object implementing the interchange protocol.

Parameters:
  • nan_as_null (bool, default False) – nan_as_null is DEPRECATED and has no effect. Please avoid using it; it will be removed in a future release.

  • allow_copy (bool, default True) – Whether to allow memory copying when exporting. If set to False it would cause non-zero-copy exports to fail.

Returns:

The object which consuming library can use to ingress the dataframe.

Return type:

DataFrame interchange object

Notes

Details on the interchange protocol: https://data-apis.org/dataframe-protocol/latest/index.html

Examples

>>> df_not_necessarily_pandas = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> interchange_object = df_not_necessarily_pandas.__dataframe__()
>>> interchange_object.column_names()
Index(['A', 'B'], dtype='object')
>>> df_pandas = (pd.api.interchange.from_dataframe
...              (interchange_object.select_columns_by_name(['A'])))
>>> df_pandas
     A
0    1
1    2

These methods (column_names, select_columns_by_name) should work for any dataframe library which implements the interchange protocol.

__dataframe_consortium_standard__(*, api_version=None)

Provide entry point to the Consortium DataFrame Standard API.

This is developed and maintained outside of pandas. Please report any issues to https://github.com/data-apis/dataframe-api-compat.

Return type:

Any

Parameters:

api_version (str | None)

__deepcopy__(memo=None)
Parameters:
  • memo (default None) – Standard signature. Unused.

Return type:

None

__delitem__(key)

Delete item

Return type:

None

__dir__()

Provide method name lookup and completion.

Return type:

list[str]

Notes

Only provide ‘public’ methods.

__finalize__(other, method=None, **kwargs)[source]

Propagate metadata from other to self.

Return type:

GeoDataFrame | GeoSeries

Parameters:

method (str | None)

__getattr__(name)

After regular attribute access, try looking up the name. This allows simpler access to columns for interactive use.

Parameters:

name (str)

__getitem__(key)[source]

If the result is a column containing only ‘geometry’, return a GeoSeries. If it’s a DataFrame with any columns of GeometryDtype, return a GeoDataFrame.

__init__(data=None, *args, geometry=None, crs=None, **kwargs)[source]
Parameters:
  • geometry (Any | None)

  • crs (Any | None)

__iter__()

Iterate over info axis.

Returns:

Info axis as iterator.

Return type:

iterator

Examples

>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
>>> for x in df:
...     print(x)
A
B
__len__()

Returns length of info axis, but here we use the index.

Return type:

int

__matmul__(other)

Matrix multiplication using binary @ operator.

Return type:

DataFrame | Series

Parameters:

other (ExtensionArray | ndarray | Index | Series | DataFrame)

__repr__()

Return a string representation for a particular DataFrame.

Return type:

str

__rmatmul__(other)

Matrix multiplication using binary @ operator.

Return type:

DataFrame

__setitem__(key, value)[source]

Overridden to preserve CRS of GeometryArray.

Important for cases like df[‘geometry’] = [geom… for geom in df.geometry]

__sizeof__()

Generates the total memory usage for an object that returns either a value or Series of values

Return type:

int

abs()

Return a Series/DataFrame with absolute numeric value of each element.

This function only applies to elements that are all numeric.

Returns:

Series/DataFrame containing the absolute value of each element.

Return type:

abs

See also

numpy.absolute

Calculate the absolute value element-wise.

Notes

For complex inputs, 1.2 + 1j, the absolute value is \(\sqrt{ a^2 + b^2 }\).

Examples

Absolute numeric values in a Series.

>>> s = pd.Series([-1.10, 2, -3.33, 4])
>>> s.abs()
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64

Absolute numeric values in a Series with complex numbers.

>>> s = pd.Series([1.2 + 1j])
>>> s.abs()
0    1.56205
dtype: float64

Absolute numeric values in a Series with a Timedelta element.

>>> s = pd.Series([pd.Timedelta('1 days')])
>>> s.abs()
0   1 days
dtype: timedelta64[ns]

Select rows with data closest to certain value using argsort (from StackOverflow).

>>> df = pd.DataFrame({
...     'a': [4, 5, 6, 7],
...     'b': [10, 20, 30, 40],
...     'c': [100, 50, -30, -50]
... })
>>> df
     a    b    c
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
>>> df.loc[(df.c - 43).abs().argsort()]
     a    b    c
1    5   20   50
0    4   10  100
2    6   30  -30
3    7   40  -50
property active_geometry_name: Any

Return the name of the active geometry column.

Returns a name if a GeoDataFrame has an active geometry column set, otherwise returns None. The return type is usually a string, but may be an integer, tuple or other hashable, depending on the contents of the dataframe columns.

You can also access the active geometry column using the .geometry property. You can set a GeoSeries to be an active geometry using the set_geometry() method.

Returns:

name of an active geometry column or None

Return type:

str or other index label supported by pandas

See also

GeoDataFrame.set_geometry

set the active geometry

add(other, axis='columns', level=None, fill_value=None)

Get Addition of dataframe and other, element-wise (binary operator add).

Equivalent to dataframe + other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, radd.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
add_prefix(prefix, axis=None)

Prefix labels with string prefix.

For Series, the row labels are prefixed. For DataFrame, the column labels are prefixed.

Parameters:
  • prefix (str) – The string to add before each label.

  • axis ({0 or 'index', 1 or 'columns', None}, default None) –

    Axis to add prefix on

    Added in version 2.0.0.

Returns:

New Series or DataFrame with updated labels.

Return type:

Series or DataFrame

See also

Series.add_suffix

Suffix row labels with string suffix.

DataFrame.add_suffix

Suffix column labels with string suffix.

Examples

>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64
>>> s.add_prefix('item_')
item_0    1
item_1    2
item_2    3
item_3    4
dtype: int64
>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
   A  B
0  1  3
1  2  4
2  3  5
3  4  6
>>> df.add_prefix('col_')
     col_A  col_B
0       1       3
1       2       4
2       3       5
3       4       6
add_suffix(suffix, axis=None)

Suffix labels with string suffix.

For Series, the row labels are suffixed. For DataFrame, the column labels are suffixed.

Parameters:
  • suffix (str) – The string to add after each label.

  • axis ({0 or 'index', 1 or 'columns', None}, default None) –

    Axis to add suffix on

    Added in version 2.0.0.

Returns:

New Series or DataFrame with updated labels.

Return type:

Series or DataFrame

See also

Series.add_prefix

Prefix row labels with string prefix.

DataFrame.add_prefix

Prefix column labels with string prefix.

Examples

>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64
>>> s.add_suffix('_item')
0_item    1
1_item    2
2_item    3
3_item    4
dtype: int64
>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
   A  B
0  1  3
1  2  4
2  3  5
3  4  6
>>> df.add_suffix('_col')
     A_col  B_col
0       1       3
1       2       4
2       3       5
3       4       6
affine_transform(matrix)

Return a GeoSeries with transformed geometries.

See http://shapely.readthedocs.io/en/stable/manual.html#shapely.affinity.affine_transform for details.

Parameters:

matrix (List or tuple) –

6 or 12 items for 2D or 3D transformations respectively.

For 2D affine transformations, the 6 parameter matrix is [a, b, d, e, xoff, yoff]

For 3D affine transformations, the 12 parameter matrix is [a, b, c, d, e, f, g, h, i, xoff, yoff, zoff]

Examples

>>> from shapely.geometry import Point, LineString, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Point(1, 1),
...         LineString([(1, -1), (1, 0)]),
...         Polygon([(3, -1), (4, 0), (3, 1)]),
...     ]
... )
>>> s
0                         POINT (1 1)
1              LINESTRING (1 -1, 1 0)
2    POLYGON ((3 -1, 4 0, 3 1, 3 -1))
dtype: geometry
>>> s.affine_transform([2, 3, 2, 4, 5, 2])
0                          POINT (10 8)
1                 LINESTRING (4 0, 7 4)
2    POLYGON ((8 4, 13 10, 14 12, 8 4))
dtype: geometry
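The coordinates in the example output can be checked by hand from the 6-parameter form [a, b, d, e, xoff, yoff]. The sketch below applies that formula to bare coordinate tuples in plain Python. It does not call shapely or geopandas, and `affine_2d` is a hypothetical helper name.

```python
def affine_2d(matrix, point):
    """Apply a 2D matrix given as [a, b, d, e, xoff, yoff] to an (x, y) point."""
    a, b, d, e, xoff, yoff = matrix
    x, y = point
    return (a * x + b * y + xoff, d * x + e * y + yoff)


matrix = [2, 3, 2, 4, 5, 2]
point_result = affine_2d(matrix, (1, 1))
line_result = [affine_2d(matrix, p) for p in [(1, -1), (1, 0)]]
polygon_result = [affine_2d(matrix, p) for p in [(3, -1), (4, 0), (3, 1)]]
print(point_result, line_result, polygon_result)
```

The results match the transformed point, line, and polygon vertices shown in the example output above.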
agg(func=None, axis=0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

Parameters:
  • func (function, str, list or dict) –

    Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

    Accepted combinations are:

    • function

    • string function name

    • list of functions and/or function names, e.g. [np.sum, 'mean']

    • dict of axis labels -> functions, function names or list of such.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

  • *args – Positional arguments to pass to func.

  • **kwargs – Keyword arguments to pass to func.

Returns:

The return can be:

  • scalar : when Series.agg is called with single function

  • Series : when DataFrame.agg is called with a single function

  • DataFrame : when DataFrame.agg is called with several functions

Return type:

scalar, Series or DataFrame

See also

DataFrame.apply

Perform any type of operations.

DataFrame.transform

Perform transformation type operations.

pandas.DataFrame.groupby

Perform operations over groups.

pandas.DataFrame.resample

Perform operations over resampled bins.

pandas.DataFrame.rolling

Perform operations over rolling window.

pandas.DataFrame.expanding

Perform operations over expanding window.

pandas.core.window.ewm.ExponentialMovingWindow

Perform operation over exponential weighted window.

Notes

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).

agg is an alias for aggregate. Use the alias.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.

A passed user-defined-function will be passed a Series for evaluation.

Examples

>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

Aggregate these functions over the rows.

>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column.

>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0

Aggregate different functions over the columns and rename the index of the resulting DataFrame.

>>> df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean'))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0

Aggregate over the columns.

>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
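
The difference from NumPy's default described in the Notes can be sketched as follows. This is a minimal illustration on made-up data; the variable names are assumptions, not part of the API.

```python
import numpy as np
import pandas as pd

arr_2d = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
df = pd.DataFrame(arr_2d, columns=["A", "B", "C"])

# NumPy flattens by default: one scalar for the whole array.
flat_mean = np.mean(arr_2d)

# pandas reduces over the index by default: one value per column.
per_column = df.agg("mean")

# Passing axis=0 to NumPy reproduces the pandas default.
numpy_per_column = np.mean(arr_2d, axis=0)
```

Here flat_mean is a single float, while per_column is a Series labelled by column name.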
aggregate(func=None, axis=0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

Parameters:
  • func (function, str, list or dict) –

    Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

    Accepted combinations are:

    • function

    • string function name

    • list of functions and/or function names, e.g. [np.sum, 'mean']

    • dict of axis labels -> functions, function names or list of such.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

  • *args – Positional arguments to pass to func.

  • **kwargs – Keyword arguments to pass to func.

Returns:

The return can be:

  • scalar : when Series.agg is called with single function

  • Series : when DataFrame.agg is called with a single function

  • DataFrame : when DataFrame.agg is called with several functions

Return type:

scalar, Series or DataFrame

See also

DataFrame.apply

Perform any type of operations.

DataFrame.transform

Perform transformation type operations.

pandas.DataFrame.groupby

Perform operations over groups.

pandas.DataFrame.resample

Perform operations over resampled bins.

pandas.DataFrame.rolling

Perform operations over rolling window.

pandas.DataFrame.expanding

Perform operations over expanding window.

pandas.core.window.ewm.ExponentialMovingWindow

Perform operation over exponential weighted window.

Notes

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).

agg is an alias for aggregate. Use the alias.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.

A passed user-defined-function will be passed a Series for evaluation.

Examples

>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

Aggregate these functions over the rows.

>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column.

>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0

Aggregate different functions over the columns and rename the index of the resulting DataFrame.

>>> df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean'))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0

Aggregate over the columns.

>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
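
As a sketch of the note that a user-defined function receives a Series, the hypothetical helper value_range below records the type of its argument and returns a scalar without mutating its input. The data and names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8]})

seen_types = []

def value_range(column):
    # Record the argument type, then return max - min without mutating the input.
    seen_types.append(type(column).__name__)
    return column.max() - column.min()

# With the default axis=0 the function is evaluated once per column.
ranges = df.aggregate(value_range)
```

The result is a Series indexed by the column labels, because a single function was supplied.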
align(other, join='outer', axis=None, level=None, copy=None, fill_value=None, method=<no_default>, limit=<no_default>, fill_axis=<no_default>, broadcast_axis=<no_default>)

Align two objects on their axes with the specified join method.

Join method is specified for each axis Index.

Parameters:
  • other (DataFrame or Series)

  • join ({'outer', 'inner', 'left', 'right'}, default 'outer') –

    Type of alignment to be performed.

    • left: use only keys from left frame, preserve key order.

    • right: use only keys from right frame, preserve key order.

    • outer: use union of keys from both frames, sort keys lexicographically.

    • inner: use intersection of keys from both frames, preserve the order of the left keys.

  • axis (allowed axis of the other object, default None) – Align on index (0), columns (1), or both (None).

  • level (int or level name, default None) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • copy (bool, default True) –

    Always returns new objects. If copy=False and no reindexing is required then original objects are returned.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

  • fill_value (scalar, default np.nan) – Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

  • method ({'backfill', 'bfill', 'pad', 'ffill', None}, default None) –

    Method to use for filling holes in reindexed Series:

    • pad / ffill: propagate last valid observation forward to next valid.

    • backfill / bfill: use NEXT valid observation to fill gap.

    Deprecated since version 2.1.

  • limit (int, default None) –

    If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

    Deprecated since version 2.1.

  • fill_axis ({0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame, default 0) –

    Filling axis, method and limit.

    Deprecated since version 2.1.

  • broadcast_axis ({0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame, default None) –

    Broadcast values along this axis, if aligning two objects of different dimensions.

    Deprecated since version 2.1.

Returns:

Aligned objects.

Return type:

tuple of (Series/DataFrame, type of other)

Examples

>>> df = pd.DataFrame(
...     [[1, 2, 3, 4], [6, 7, 8, 9]], columns=["D", "B", "E", "A"], index=[1, 2]
... )
>>> other = pd.DataFrame(
...     [[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
...     columns=["A", "B", "C", "D"],
...     index=[2, 3, 4],
... )
>>> df
   D  B  E  A
1  1  2  3  4
2  6  7  8  9
>>> other
    A    B    C    D
2   10   20   30   40
3   60   70   80   90
4  600  700  800  900

Align on columns:

>>> left, right = df.align(other, join="outer", axis=1)
>>> left
   A  B   C  D  E
1  4  2 NaN  1  3
2  9  7 NaN  6  8
>>> right
    A    B    C    D   E
2   10   20   30   40 NaN
3   60   70   80   90 NaN
4  600  700  800  900 NaN

We can also align on the index:

>>> left, right = df.align(other, join="outer", axis=0)
>>> left
    D    B    E    A
1  1.0  2.0  3.0  4.0
2  6.0  7.0  8.0  9.0
3  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN
>>> right
    A      B      C      D
1    NaN    NaN    NaN    NaN
2   10.0   20.0   30.0   40.0
3   60.0   70.0   80.0   90.0
4  600.0  700.0  800.0  900.0

Finally, the default axis=None will align on both index and columns:

>>> left, right = df.align(other, join="outer", axis=None)
>>> left
     A    B   C    D    E
1  4.0  2.0 NaN  1.0  3.0
2  9.0  7.0 NaN  6.0  8.0
3  NaN  NaN NaN  NaN  NaN
4  NaN  NaN NaN  NaN  NaN
>>> right
       A      B      C      D   E
1    NaN    NaN    NaN    NaN NaN
2   10.0   20.0   30.0   40.0 NaN
3   60.0   70.0   80.0   90.0 NaN
4  600.0  700.0  800.0  900.0 NaN
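
The examples above all use join="outer". A minimal sketch of an inner join, on small made-up frames, shows how only the shared labels survive on both axes:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"], index=[1, 2])
other = pd.DataFrame([[10, 20], [30, 40]], columns=["B", "C"], index=[2, 3])

# An inner join keeps only labels present in both frames, on both axes.
left, right = df.align(other, join="inner")

# The aligned frames share index and columns, so element-wise arithmetic
# pairs up exactly the same labels.
total = left + right
```

Only row 2 and column B are common to both inputs, so left, right and total each hold a single cell.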
all(axis=0, bool_only=False, skipna=True, **kwargs)

Return whether all elements are True, potentially over an axis.

Returns True unless there is at least one element within a Series or along a DataFrame axis that is False or equivalent (e.g. zero or empty).

Parameters:
  • axis ({0 or 'index', 1 or 'columns', None}, default 0) –

    Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.

    • 0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.

    • 1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.

    • None : reduce all axes, return a scalar.

  • bool_only (bool, default False) – Include only boolean columns. Not implemented for Series.

  • skipna (bool, default True) – Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

  • **kwargs (any, default None) – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

If level is specified, then, DataFrame is returned; otherwise, Series is returned.

Return type:

Series or DataFrame

See also

Series.all

Return True if all elements are True.

DataFrame.any

Return True if one (or more) elements are True.

Examples

Series

>>> pd.Series([True, True]).all()
True
>>> pd.Series([True, False]).all()
False
>>> pd.Series([], dtype="float64").all()
True
>>> pd.Series([np.nan]).all()
True
>>> pd.Series([np.nan]).all(skipna=False)
True

DataFrames

Create a dataframe from a dictionary.

>>> df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
>>> df
   col1   col2
0  True   True
1  True  False

Default behaviour checks if values in each column all return True.

>>> df.all()
col1     True
col2    False
dtype: bool

Specify axis='columns' to check if values in each row all return True.

>>> df.all(axis='columns')
0     True
1    False
dtype: bool

Or axis=None for whether every value is True.

>>> df.all(axis=None)
False
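
The skipna rule described under Parameters can be illustrated with a small sketch. The column names are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "flags": [True, True],
    "counts": [1, 0],           # contains a zero, which counts as False
    "gaps": [np.nan, np.nan],   # entirely missing
})

# Default skipna=True: the all-NaN column behaves like an empty column.
per_column = df.all()

# skipna=False: NaN is treated as True because it is not equal to zero.
per_column_keep_na = df.all(skipna=False)
```

In both calls the "gaps" column evaluates to True, while "counts" is False because of the zero.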
any(*, axis=0, bool_only=False, skipna=True, **kwargs)

Return whether any element is True, potentially over an axis.

Returns False unless there is at least one element within a Series or along a DataFrame axis that is True or equivalent (e.g. non-zero or non-empty).

Parameters:
  • axis ({0 or 'index', 1 or 'columns', None}, default 0) –

    Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.

    • 0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.

    • 1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.

    • None : reduce all axes, return a scalar.

  • bool_only (bool, default False) – Include only boolean columns. Not implemented for Series.

  • skipna (bool, default True) – Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

  • **kwargs (any, default None) – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

If level is specified, then, DataFrame is returned; otherwise, Series is returned.

Return type:

Series or DataFrame

See also

numpy.any

Numpy version of this method.

Series.any

Return whether any element is True.

Series.all

Return whether all elements are True.

DataFrame.any

Return whether any element is True over requested axis.

DataFrame.all

Return whether all elements are True over requested axis.

Examples

Series

For Series input, the output is a scalar indicating whether any element is True.

>>> pd.Series([False, False]).any()
False
>>> pd.Series([True, False]).any()
True
>>> pd.Series([], dtype="float64").any()
False
>>> pd.Series([np.nan]).any()
False
>>> pd.Series([np.nan]).any(skipna=False)
True

DataFrame

Whether each column contains at least one True element (the default).

>>> df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
>>> df
   A  B  C
0  1  0  0
1  2  2  0
>>> df.any()
A     True
B     True
C    False
dtype: bool

Aggregating over the columns.

>>> df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
>>> df
       A  B
0   True  1
1  False  2
>>> df.any(axis='columns')
0    True
1    True
dtype: bool
>>> df = pd.DataFrame({"A": [True, False], "B": [1, 0]})
>>> df
       A  B
0   True  1
1  False  0
>>> df.any(axis='columns')
0     True
1    False
dtype: bool

Aggregating over the entire DataFrame with axis=None.

>>> df.any(axis=None)
True

any for an empty DataFrame is an empty Series.

>>> pd.DataFrame([]).any()
Series([], dtype: bool)
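
The bool_only parameter is not covered by the examples above. A minimal sketch, with illustrative column names, of how it restricts the reduction to boolean columns:

```python
import pandas as pd

df = pd.DataFrame({"is_valid": [False, False], "n_errors": [0, 3]})

# By default every column takes part; non-zero integers count as True.
any_all_columns = df.any()

# bool_only=True drops the integer column before reducing.
any_bool_columns = df.any(bool_only=True)
```

The second result contains an entry for "is_valid" only.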
apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)[source]

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.

Parameters:
  • func (function) – Function to apply to each column or row.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Axis along which the function is applied. If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

  • raw (bool, default False) – Determines whether each row or column is passed as a Series (False) or as an ndarray (True).

  • result_type ({'expand', 'reduce', 'broadcast', None}, default None) – Only used when axis=1 (columns). ‘expand’ turns list-like results into columns, ‘reduce’ returns a Series where possible, and ‘broadcast’ broadcasts results to the original shape of the DataFrame, retaining the original index and columns.

  • args (tuple) – Positional arguments to pass to func in addition to the array/series.

  • **kwargs – Additional keyword arguments to pass as keyword arguments to func.

See also

DataFrame.from_records

Constructor from tuples, also record arrays.

DataFrame.from_dict

From dicts of Series, arrays, or dicts.

read_csv

Read a comma-separated values (csv) file into DataFrame.

read_table

Read general delimited file into DataFrame.

read_clipboard

Read text from clipboard into DataFrame.

Notes

Please reference the User Guide for more information.

Examples

Constructing DataFrame from a dictionary.

>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4

Notice that the inferred dtype is int64.

>>> df.dtypes
col1    int64
col2    int64
dtype: object

To enforce a single dtype:

>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1    int8
col2    int8
dtype: object

Constructing DataFrame from a dictionary including Series:

>>> d = {'col1': [0, 1, 2, 3], 'col2': pd.Series([2, 3], index=[2, 3])}
>>> pd.DataFrame(data=d, index=[0, 1, 2, 3])
   col1  col2
0     0   NaN
1     1   NaN
2     2   2.0
3     3   3.0

Constructing DataFrame from numpy ndarray:

>>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
...                    columns=['a', 'b', 'c'])
>>> df2
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

Constructing DataFrame from a numpy ndarray that has labeled columns:

>>> data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
...                 dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])
>>> df3 = pd.DataFrame(data, columns=['c', 'a'])
...
>>> df3
   c  a
0  3  1
1  6  4
2  9  7

Constructing DataFrame from dataclass:

>>> from dataclasses import make_dataclass
>>> Point = make_dataclass("Point", [("x", int), ("y", int)])
>>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
   x  y
0  0  0
1  0  3
2  2  3

Constructing DataFrame from Series/DataFrame:

>>> ser = pd.Series([1, 2, 3], index=["a", "b", "c"])
>>> df = pd.DataFrame(data=ser, index=["a", "c"])
>>> df
   0
a  1
c  3
>>> df1 = pd.DataFrame([1, 2, 3], index=["a", "b", "c"], columns=["x"])
>>> df2 = pd.DataFrame(data=df1, index=["a", "c"])
>>> df2
   x
a  1
c  3
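
For the apply signature itself, a minimal sketch of the axis and result_type arguments may help. The data and the lambda functions are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 4], "b": [9, 16]})

# axis=0 (default): the function receives each column as a Series.
col_ranges = df.apply(lambda s: s.max() - s.min(), axis=0)

# axis=1: the function receives each row as a Series.
row_ranges = df.apply(lambda s: s.max() - s.min(), axis=1)

# result_type="expand" turns list-like results into separate columns.
expanded = df.apply(lambda row: [row["a"], row["b"] * 2],
                    axis=1, result_type="expand")
```

col_ranges is indexed by column label, row_ranges by the original index, and expanded is a DataFrame with integer column labels 0 and 1.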
applymap(func, na_action=None, **kwargs)

Apply a function to a DataFrame elementwise.

Deprecated since version 2.1.0: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.

Parameters:
  • func (callable) – Python function, returns a single value from a single value.

  • na_action ({None, 'ignore'}, default None) – If ‘ignore’, propagate NaN values, without passing them to func.

  • **kwargs – Additional keyword arguments to pass as keyword arguments to func.

Returns:

Transformed DataFrame.

Return type:

DataFrame

See also

DataFrame.apply

Apply a function along input axis of DataFrame.

DataFrame.map

Apply a function to a DataFrame elementwise.

DataFrame.replace

Replace values given in to_replace with value.

Examples

>>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df
       0      1
0  1.000  2.120
1  3.356  4.567
>>> df.map(lambda x: len(str(x)))
   0  1
0  3  4
1  5  5
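
The na_action parameter can be sketched as follows. Because DataFrame.map is only available from pandas 2.1 on, this example assumes a fallback to applymap on older installations; the data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.5, np.nan], [3.5, 4.0]])

# Use DataFrame.map where it exists, otherwise the deprecated applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

# na_action="ignore" leaves NaN in place instead of passing it to the function.
labels = elementwise(lambda x: f"{x:.1f}", na_action="ignore")
```

Every non-missing cell becomes a formatted string, and the missing cell stays NaN.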
property area

Return a Series containing the area of each geometry in the GeoSeries expressed in the units of the CRS.

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         Polygon([(10, 0), (10, 5), (0, 0)]),
...         Polygon([(0, 0), (2, 2), (2, 0)]),
...         LineString([(0, 0), (1, 1), (0, 1)]),
...         Point(0, 1)
...     ]
... )
>>> s
0       POLYGON ((0 0, 1 1, 0 1, 0 0))
1    POLYGON ((10 0, 10 5, 0 0, 10 0))
2       POLYGON ((0 0, 2 2, 2 0, 0 0))
3           LINESTRING (0 0, 1 1, 0 1)
4                          POINT (0 1)
dtype: geometry
>>> s.area
0     0.5
1    25.0
2     2.0
3     0.0
4     0.0
dtype: float64

See also

GeoSeries.length

measure length

Notes

Area may be invalid for a geographic CRS using degrees as units; use GeoSeries.to_crs() to project geometries to a planar CRS before using this function.

Every operation in GeoPandas is planar, i.e. the potential third dimension is not taken into account.

asfreq(freq, method=None, how=None, normalize=False, fill_value=None)

Convert time series to specified frequency.

Returns the original data conformed to a new index with the specified frequency.

If the index of this Series/DataFrame is a PeriodIndex, the new index is the result of transforming the original index with PeriodIndex.asfreq (so the original index will map one-to-one to the new index).

Otherwise, the new index will be equivalent to pd.date_range(start, end, freq=freq) where start and end are, respectively, the first and last entries in the original index (see pandas.date_range()). The values corresponding to any timesteps in the new index which were not present in the original index will be null (NaN), unless a method for filling such unknowns is provided (see the method parameter below).

The resample() method is more appropriate if an operation on each group of timesteps (such as an aggregate) is necessary to represent the data at the new frequency.

Parameters:
  • freq (DateOffset or str) – Frequency DateOffset or string.

  • method ({'backfill'/'bfill', 'pad'/'ffill'}, default None) –

    Method to use for filling holes in reindexed Series (note this does not fill NaNs that already were present):

    • ‘pad’ / ‘ffill’: propagate last valid observation forward to next valid

    • ‘backfill’ / ‘bfill’: use NEXT valid observation to fill.

  • how ({'start', 'end'}, default end) – For PeriodIndex only (see PeriodIndex.asfreq).

  • normalize (bool, default False) – Whether to reset output index to midnight.

  • fill_value (scalar, optional) – Value to use for missing values, applied during upsampling (note this does not fill NaNs that already were present).

Returns:

Series/DataFrame object reindexed to the specified frequency.

Return type:

Series/DataFrame

See also

reindex

Conform DataFrame to new index with optional filling logic.

Notes

To learn more about the frequency strings, see the offset aliases section of the pandas time series user guide.

Examples

Start by creating a series with four one-minute timestamps.

>>> index = pd.date_range('1/1/2000', periods=4, freq='min')
>>> series = pd.Series([0.0, None, 2.0, 3.0], index=index)
>>> df = pd.DataFrame({'s': series})
>>> df
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:01:00    NaN
2000-01-01 00:02:00    2.0
2000-01-01 00:03:00    3.0

Upsample the series into 30 second bins.

>>> df.asfreq(freq='30s')
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    NaN
2000-01-01 00:01:00    NaN
2000-01-01 00:01:30    NaN
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    NaN
2000-01-01 00:03:00    3.0

Upsample again, providing a fill value.

>>> df.asfreq(freq='30s', fill_value=9.0)
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    9.0
2000-01-01 00:01:00    NaN
2000-01-01 00:01:30    9.0
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    9.0
2000-01-01 00:03:00    3.0

Upsample again, providing a method.

>>> df.asfreq(freq='30s', method='bfill')
                       s
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    NaN
2000-01-01 00:01:00    NaN
2000-01-01 00:01:30    2.0
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    3.0
2000-01-01 00:03:00    3.0
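
The description above notes that resample() is the better fit when each group of timesteps needs to be aggregated. A minimal sketch of that difference when downsampling, on illustrative data:

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=4, freq="min")
df = pd.DataFrame({"s": [0.0, 1.0, 2.0, 3.0]}, index=index)

# asfreq only selects the rows that fall on the new 2-minute grid.
selected = df.asfreq("2min")

# resample groups all rows of each 2-minute bin and then aggregates them.
averaged = df.resample("2min").mean()
```

asfreq discards the values at 00:01 and 00:03, whereas resample folds them into the bin means.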
asof(where, subset=None)

Return the last row(s) without any NaNs before where.

The last row (for each element in where, if list) without any NaN is taken. In case of a DataFrame, the last row without NaN is taken, considering only the subset of columns (if not None).

If there is no good value, NaN is returned for a Series, or a Series of NaN values for a DataFrame.

Parameters:
  • where (date or array-like of dates) – Date(s) before which the last row(s) are returned.

  • subset (str or array-like of str, default None) – For DataFrame, if not None, only use these columns to check for NaNs.

Returns:

The return can be:

  • scalar : when self is a Series and where is a scalar

  • Series: when self is a Series and where is an array-like, or when self is a DataFrame and where is a scalar

  • DataFrame : when self is a DataFrame and where is an array-like

Return type:

scalar, Series, or DataFrame

See also

merge_asof

Perform an asof merge. Similar to left join.

Notes

Dates are assumed to be sorted. Raises if this is not the case.

Examples

A Series and a scalar where.

>>> s = pd.Series([1, 2, np.nan, 4], index=[10, 20, 30, 40])
>>> s
10    1.0
20    2.0
30    NaN
40    4.0
dtype: float64
>>> s.asof(20)
2.0

For a sequence where, a Series is returned. The first value is NaN, because the first element of where is before the first index value.

>>> s.asof([5, 20])
5     NaN
20    2.0
dtype: float64

Missing values are not considered. The following is 2.0, not NaN, even though NaN is at the index location for 30.

>>> s.asof(30)
2.0

Take all columns into consideration

>>> df = pd.DataFrame({'a': [10., 20., 30., 40., 50.],
...                    'b': [None, None, None, None, 500]},
...                   index=pd.DatetimeIndex(['2018-02-27 09:01:00',
...                                           '2018-02-27 09:02:00',
...                                           '2018-02-27 09:03:00',
...                                           '2018-02-27 09:04:00',
...                                           '2018-02-27 09:05:00']))
>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
...                           '2018-02-27 09:04:30']))
                      a   b
2018-02-27 09:03:30 NaN NaN
2018-02-27 09:04:30 NaN NaN

Take a single column into consideration

>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
...                           '2018-02-27 09:04:30']),
...         subset=['a'])
                        a   b
2018-02-27 09:03:30  30.0 NaN
2018-02-27 09:04:30  40.0 NaN
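
A typical use is looking up the most recent valid reading at an arbitrary timestamp. The sketch below assumes a small, sorted series of made-up sensor readings.

```python
import numpy as np
import pandas as pd

readings = pd.Series(
    [20.5, np.nan, 21.0],
    index=pd.to_datetime(
        ["2024-01-01 09:00", "2024-01-01 09:05", "2024-01-01 09:10"]
    ),
)

# The 09:05 entry is NaN, so the lookup falls back to the 09:00 value.
latest = readings.asof(pd.Timestamp("2024-01-01 09:07"))

# Before the first index entry there is no valid value at all.
before_start = readings.asof(pd.Timestamp("2024-01-01 08:00"))
```

latest holds the 09:00 reading, and before_start is NaN.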
assign(**kwargs)

Assign new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.

Parameters:

**kwargs (dict of {str: callable or Series}) – The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change input DataFrame (though pandas doesn’t check it). If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned.

Returns:

A new DataFrame with the new columns in addition to all the existing columns.

Return type:

DataFrame

Notes

Assigning multiple columns within the same assign is possible. Later items in ‘**kwargs’ may refer to newly created or modified columns in ‘df’; items are computed and assigned into ‘df’ in order.

Examples

>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
...                   index=['Portland', 'Berkeley'])
>>> df
          temp_c
Portland    17.0
Berkeley    25.0

Where the value is a callable, evaluated on df:

>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

Alternatively, the same behavior can be achieved by directly referencing an existing Series or sequence:

>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

You can create multiple columns within the same assign where one of the columns depends on another one defined within the same assign:

>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
...           temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
          temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15
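
Two points from the description, that re-assigned columns are overwritten and that a new object is returned, can be sketched together. The one-degree offset is an arbitrary illustrative value.

```python
import pandas as pd

df = pd.DataFrame({"temp_c": [17.0, 25.0]}, index=["Portland", "Berkeley"])

result = df.assign(
    # Overwrites the existing column in the returned copy.
    temp_c=lambda x: x["temp_c"] + 1.0,
    # Evaluated afterwards, so it sees the updated temp_c.
    temp_f=lambda x: x["temp_c"] * 9 / 5 + 32,
)
```

The original df still has only the unmodified temp_c column; the changes exist only in result.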
astype(dtype, copy=None, errors='raise')

Cast a pandas object to a specified dtype.

Parameters:
  • dtype (str, data type, Series or Mapping of column name -> data type) – Use a str, numpy.dtype, pandas.ExtensionDtype or Python type to cast entire pandas object to the same type. Alternatively, use a mapping, e.g. {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.

  • copy (bool, default True) –

    Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects).

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

  • errors ({'raise', 'ignore'}, default 'raise') –

    Control raising of exceptions on invalid data for provided dtype.

    • raise : allow exceptions to be raised

    • ignore : suppress exceptions. On error return original object.

Return type:

same type as caller

See also

to_datetime

Convert argument to datetime.

to_timedelta

Convert argument to timedelta.

to_numeric

Convert argument to a numeric type.

numpy.ndarray.astype

Cast a numpy array to a specified type.

Notes

Changed in version 2.0.0: Using astype to convert from timezone-naive dtype to timezone-aware dtype will raise an exception. Use Series.dt.tz_localize() instead.

Examples

Create a DataFrame:

>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1    int64
col2    int64
dtype: object

Cast all columns to int32:

>>> df.astype('int32').dtypes
col1    int32
col2    int32
dtype: object

Cast col1 to int32 using a dictionary:

>>> df.astype({'col1': 'int32'}).dtypes
col1    int32
col2    int64
dtype: object

Create a series:

>>> ser = pd.Series([1, 2], dtype='int32')
>>> ser
0    1
1    2
dtype: int32
>>> ser.astype('int64')
0    1
1    2
dtype: int64

Convert to categorical type:

>>> ser.astype('category')
0    1
1    2
dtype: category
Categories (2, int32): [1, 2]

Convert to ordered categorical type with custom ordering:

>>> from pandas.api.types import CategoricalDtype
>>> cat_dtype = CategoricalDtype(
...     categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]

Create a series of dates:

>>> ser_date = pd.Series(pd.date_range('20200101', periods=3))
>>> ser_date
0   2020-01-01
1   2020-01-02
2   2020-01-03
dtype: datetime64[ns]
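
As the Notes point out, a timezone-naive series should be made timezone-aware with Series.dt.tz_localize() rather than astype. A minimal sketch, assuming the naive values are meant to be read as UTC:

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(["2020-01-01 12:00", "2020-01-02 12:00"]))

# tz_localize attaches a time zone without shifting the wall-clock values.
aware = naive.dt.tz_localize("UTC")
```

The hour of each timestamp is unchanged; only the time zone information is added.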
property at: _AtIndexer

Access a single value for a row/column label pair.

Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.

Raises:
  • KeyError – If getting a value and ‘label’ does not exist in a DataFrame or Series.

  • ValueError – If row/column label pair is not a tuple or if any label from the pair is not a scalar for DataFrame. If label is list-like (excluding NamedTuple) for Series.

See also

DataFrame.at

Access a single value for a row/column pair by label.

DataFrame.iat

Access a single value for a row/column pair by integer position.

DataFrame.loc

Access a group of rows and columns by label(s).

DataFrame.iloc

Access a group of rows and columns by integer position(s).

Series.at

Access a single value by label.

Series.iat

Access a single value by integer position.

Series.loc

Access a group of rows by label(s).

Series.iloc

Access a group of rows by integer position(s).

Notes

See Fast scalar value getting and setting for more details.

Examples

>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   index=[4, 5, 6], columns=['A', 'B', 'C'])
>>> df
    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30

Get value at specified row/column pair

>>> df.at[4, 'B']
2

Set value at specified row/column pair

>>> df.at[4, 'B'] = 10
>>> df.at[4, 'B']
10

Get value within a Series

>>> df.loc[5].at['B']
4
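
The KeyError listed under Raises can be sketched with a label that is absent from the index. The frame below is illustrative.

```python
import pandas as pd

df = pd.DataFrame([[0, 2], [10, 20]], index=[4, 6], columns=["A", "B"])

# Reading an existing row/column label pair returns the scalar.
value = df.at[6, "B"]

# Reading a row label that is not in the index raises KeyError.
try:
    df.at[5, "B"]
    missing = False
except KeyError:
    missing = True
```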
at_time(time, asof=False, axis=None)

Select values at particular time of day (e.g., 9:30AM).

Parameters:
  • time (datetime.time or str) – The values to select.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – For Series this parameter is unused and defaults to 0.

  • asof (bool)

Return type:

Series or DataFrame

Raises:

TypeError – If the index is not a DatetimeIndex

See also

between_time

Select values between particular times of the day.

first

Select initial periods of time series based on a date offset.

last

Select final periods of time series based on a date offset.

DatetimeIndex.indexer_at_time

Get just the index locations for values at particular time of the day.

Examples

>>> i = pd.date_range('2018-04-09', periods=4, freq='12h')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
                     A
2018-04-09 00:00:00  1
2018-04-09 12:00:00  2
2018-04-10 00:00:00  3
2018-04-10 12:00:00  4
>>> ts.at_time('12:00')
                     A
2018-04-09 12:00:00  2
2018-04-10 12:00:00  4
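
The TypeError listed under Raises occurs when the index is not a DatetimeIndex. A minimal sketch with made-up data, followed by the same frame after giving it a datetime index:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})  # default RangeIndex, not datetimes

try:
    df.at_time("12:00")
    raised = False
except TypeError:
    raised = True

# With a DatetimeIndex the selection works as documented.
df.index = pd.to_datetime(
    ["2018-04-09 00:00", "2018-04-09 12:00", "2018-04-10 12:00"]
)
noon = df.at_time("12:00")
```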
property attrs: dict[Hashable, Any]

Dictionary of global attributes of this dataset.

Warning

attrs is experimental and may change without warning.

See also

DataFrame.flags

Global flags applying to this object.

Notes

Many operations that create new datasets will copy attrs. Copies are always deep so that changing attrs will only affect the present dataset. pandas.concat copies attrs only if all input datasets have the same attrs.

Examples

For Series:

>>> ser = pd.Series([1, 2, 3])
>>> ser.attrs = {"A": [10, 20, 30]}
>>> ser.attrs
{'A': [10, 20, 30]}

For DataFrame:

>>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> df.attrs = {"A": [10, 20, 30]}
>>> df.attrs
{'A': [10, 20, 30]}
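
The propagation rules from the Notes can be sketched as follows, assuming a pandas version in which copy() and pandas.concat carry attrs over. The key "source" and its values are hypothetical, and attrs itself remains experimental.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
a.attrs = {"source": "sensor-1"}

# copy() hands the new object its own attrs dictionary.
derived = a.copy()
derived.attrs["source"] = "edited"  # does not touch a.attrs

same = pd.DataFrame({"x": [3, 4]})
same.attrs = {"source": "sensor-1"}
other = pd.DataFrame({"x": [5, 6]})
other.attrs = {"source": "sensor-2"}

kept = pd.concat([a, same]).attrs      # identical attrs on all inputs
dropped = pd.concat([a, other]).attrs  # differing attrs on the inputs
```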
property axes: list[Index]

Return a list representing the axes of the DataFrame.

It has the row axis labels and column axis labels as the only members. They are returned in that order.

Examples

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.axes
[RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'],
dtype='object')]
backfill(*, axis=None, inplace=False, limit=None, downcast=<no_default>)

Fill NA/NaN values by using the next valid observation to fill the gap.

Deprecated since version 2.0: Series/DataFrame.backfill is deprecated. Use Series/DataFrame.bfill instead.

Returns:

Object with missing values filled or None if inplace=True.

Return type:

Series/DataFrame or None

Parameters:
  • axis (None | Axis)

  • inplace (bool_t)

  • limit (None | int)

  • downcast (dict | None | lib.NoDefault)

Examples

Please see examples for DataFrame.bfill() or Series.bfill().

between_time(start_time, end_time, inclusive='both', axis=None)

Select values between particular times of the day (e.g., 9:00-9:30 AM).

By setting start_time to be later than end_time, you can get the times that are not between the two times.

Parameters:
  • start_time (datetime.time or str) – Initial time as a time filter limit.

  • end_time (datetime.time or str) – End time as a time filter limit.

  • inclusive ({"both", "neither", "left", "right"}, default "both") – Include boundaries; whether to set each bound as closed or open.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Determine range time on index or columns value. For Series this parameter is unused and defaults to 0.

Returns:

Data from the original object filtered to the specified dates range.

Return type:

Series or DataFrame

Raises:

TypeError – If the index is not a DatetimeIndex

See also

at_time

Select values at a particular time of the day.

first

Select initial periods of time series based on a date offset.

last

Select final periods of time series based on a date offset.

DatetimeIndex.indexer_between_time

Get just the index locations for values between particular times of the day.

Examples

>>> i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
                     A
2018-04-09 00:00:00  1
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
2018-04-12 01:00:00  4
>>> ts.between_time('0:15', '0:45')
                     A
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3

You get the times that are not between two times by setting start_time later than end_time:

>>> ts.between_time('0:45', '0:15')
                     A
2018-04-09 00:00:00  1
2018-04-12 01:00:00  4
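
The effect of the inclusive parameter is easiest to see when both bounds coincide with timestamps in the index. A minimal sketch, reusing the data from the example above; the dictionary name selected is only for illustration.

```python
import pandas as pd

i = pd.date_range("2018-04-09", periods=4, freq="1D20min")
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=i)

# The bounds 00:20 and 00:40 both occur in the index, so each setting of
# inclusive decides whether the rows with A == 2 and A == 3 are kept.
selected = {
    mode: ts.between_time("0:20", "0:40", inclusive=mode)["A"].tolist()
    for mode in ("both", "left", "right", "neither")
}
```

With "left" only the start bound is closed, with "right" only the end bound, and with "neither" both rows on the bounds are dropped.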
bfill(*, axis=None, inplace=False, limit=None, limit_area=None, downcast=<no_default>)

Fill NA/NaN values by using the next valid observation to fill the gap.

Parameters:
  • axis ({0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame) – Axis along which to fill missing values. For Series this parameter is unused and defaults to 0.

  • inplace (bool, default False) – If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).

  • limit (int, default None) – If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

  • limit_area ({`None`, 'inside', 'outside'}, default None) –

    If limit is specified, consecutive NaNs will be filled with this restriction.

    • None: No fill restriction.

    • ’inside’: Only fill NaNs surrounded by valid values (interpolate).

    • ’outside’: Only fill NaNs outside valid values (extrapolate).

    Added in version 2.2.0.

  • downcast (dict, default is None) –

    A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).

    Deprecated since version 2.2.0.

Returns:

Object with missing values filled or None if inplace=True.

Return type:

Series/DataFrame or None

Examples

For Series:

>>> s = pd.Series([1, None, None, 2])
>>> s.bfill()
0    1.0
1    2.0
2    2.0
3    2.0
dtype: float64
>>> s.bfill(limit=1)
0    1.0
1    NaN
2    2.0
3    2.0
dtype: float64

With DataFrame:

>>> df = pd.DataFrame({'A': [1, None, None, 4], 'B': [None, 5, None, 7]})
>>> df
      A     B
0   1.0   NaN
1   NaN   5.0
2   NaN   NaN
3   4.0   7.0
>>> df.bfill()
      A     B
0   1.0   5.0
1   4.0   5.0
2   4.0   7.0
3   4.0   7.0
>>> df.bfill(limit=1)
      A     B
0   1.0   5.0
1   NaN   5.0
2   4.0   7.0
3   4.0   7.0
bool()

Return the bool of a single element Series or DataFrame.

Deprecated since version 2.1.0: bool is deprecated and will be removed in a future version of pandas. For Series use pandas.Series.item.

This must be a boolean scalar value, either True or False. It will raise a ValueError if the Series or DataFrame does not have exactly 1 element, or that element is not boolean (integer values 0 and 1 will also raise an exception).

Returns:

The value in the Series or DataFrame.

Return type:

bool

See also

Series.astype

Change the data type of a Series, including to boolean.

DataFrame.astype

Change the data type of a DataFrame, including to boolean.

numpy.bool_

NumPy boolean data type, used by pandas for boolean values.

Examples

The method will only work for single element objects with a boolean value:

>>> pd.Series([True]).bool()
True
>>> pd.Series([False]).bool()
False
>>> pd.DataFrame({'col': [True]}).bool()
True
>>> pd.DataFrame({'col': [False]}).bool()
False

This is an alternative method and will only work for single element objects with a boolean value:

>>> pd.Series([True]).item()
True
>>> pd.Series([False]).item()
False
property boundary

Return a GeoSeries of lower dimensional objects representing each geometry’s set-theoretic boundary.

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
>>> s.boundary
0    LINESTRING (0 0, 1 1, 0 1, 0 0)
1          MULTIPOINT ((0 0), (1 0))
2           GEOMETRYCOLLECTION EMPTY
dtype: geometry

See also

GeoSeries.exterior

outer boundary (without interior rings)

property bounds

Return a DataFrame with columns minx, miny, maxx, maxy values containing the bounds for each geometry.

See GeoSeries.total_bounds for the limits of the entire series.

Examples

>>> from shapely.geometry import Point, Polygon, LineString
>>> d = {'geometry': [Point(2, 1), Polygon([(0, 0), (1, 1), (1, 0)]),
... LineString([(0, 1), (1, 2)])]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf.bounds
   minx  miny  maxx  maxy
0   2.0   1.0   2.0   1.0
1   0.0   0.0   1.0   1.0
2   0.0   1.0   1.0   2.0

You can assign the bounds to the GeoDataFrame as:

>>> import pandas as pd
>>> gdf = pd.concat([gdf, gdf.bounds], axis=1)
>>> gdf
                        geometry  minx  miny  maxx  maxy
0                     POINT (2 1)   2.0   1.0   2.0   1.0
1  POLYGON ((0 0, 1 1, 1 0, 0 0))   0.0   0.0   1.0   1.0
2           LINESTRING (0 1, 1 2)   0.0   1.0   1.0   2.0
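
The relation between the per-geometry bounds and total_bounds mentioned above can be sketched as follows, reusing the geometries from the example. The variable names per_row and overall are only for illustration.

```python
import geopandas
from shapely.geometry import LineString, Point, Polygon

gdf = geopandas.GeoDataFrame(
    {
        "geometry": [
            Point(2, 1),
            Polygon([(0, 0), (1, 1), (1, 0)]),
            LineString([(0, 1), (1, 2)]),
        ]
    },
    crs="EPSG:4326",
)

per_row = gdf.bounds        # one row of minx, miny, maxx, maxy per geometry
overall = gdf.total_bounds  # a single array for the whole frame
```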
boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, backend=None, **kwargs)

Make a box plot from DataFrame columns.

Make a box-and-whisker plot from DataFrame columns, optionally grouped by some other columns. A box plot is a method for graphically depicting groups of numerical data through their quartiles. The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2). The whiskers extend from the edges of the box to show the range of the data. By default, they extend no more than 1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that interval. Outliers are plotted as separate dots.

For further details see Wikipedia’s entry for boxplot.

Parameters:
  • column (str or list of str, optional) – Column name or list of names, or vector. Can be any valid input to pandas.DataFrame.groupby().

  • by (str or array-like, optional) – Column in the DataFrame to pandas.DataFrame.groupby(). One box-plot will be done per value of columns in by.

  • ax (object of class matplotlib.axes.Axes, optional) – The matplotlib axes to be used by boxplot.

  • fontsize (float or str) – Tick label font size in points or as a string (e.g., large).

  • rot (float, default 0) – The rotation angle of labels (in degrees) with respect to the screen coordinate system.

  • grid (bool, default True) – Setting this to True will show the grid.

  • figsize (A tuple (width, height) in inches) – The size of the figure to create in matplotlib.

  • layout (tuple (rows, columns), optional) – For example, (3, 5) will display the subplots using 3 rows and 5 columns, starting from the top-left.

  • return_type ({'axes', 'dict', 'both'} or None, default 'axes') –

    The kind of object to return. The default is axes.

    • ’axes’ returns the matplotlib axes the boxplot is drawn on.

    • ’dict’ returns a dictionary whose values are the matplotlib Lines of the boxplot.

    • ’both’ returns a namedtuple with the axes and dict.

    • when grouping with by, a Series mapping columns to return_type is returned.

      If return_type is None, a NumPy array of axes with the same shape as layout is returned.

  • backend (str, default None) – Backend to use instead of the backend specified in the option plotting.backend. For instance, ‘matplotlib’. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.

  • **kwargs – All other plotting keyword arguments to be passed to matplotlib.pyplot.boxplot().

  • self (DataFrame)

Returns:

See Notes.

Return type:

result

See also

pandas.Series.plot.hist

Make a histogram.

matplotlib.pyplot.boxplot

Matplotlib equivalent plot.

Notes

The return type depends on the return_type parameter:

  • ‘axes’ : object of class matplotlib.axes.Axes

  • ‘dict’ : dict of matplotlib.lines.Line2D objects

  • ‘both’ : a namedtuple with structure (ax, lines)

For data grouped with by, return a Series of the above or a numpy array:

  • Series

  • array (for return_type = None)

Use return_type='dict' when you want to tweak the appearance of the lines after plotting. In this case a dict containing the Lines making up the boxes, caps, fliers, medians, and whiskers is returned.

Examples

Boxplots can be created for every column in the DataFrame with df.boxplot(), or for selected columns by passing column.

Boxplots of variable distributions grouped by the values of a third variable can be created using the option by.

A list of strings (e.g. ['X', 'Y']) can be passed to by in order to group the data by a combination of the variables on the x-axis.

The layout of the boxplot can be adjusted by passing a tuple to layout.

Additional formatting can be applied to the boxplot, such as suppressing the grid (grid=False), rotating the labels on the x-axis (e.g. rot=45) or changing the font size (e.g. fontsize=15).
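
The calls described in the preceding paragraphs can be sketched as follows, assuming matplotlib is installed. The random data and the column names Col1, Col2, X and Y are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data: two numeric columns plus two grouping columns.
rng = np.random.default_rng(1234)
df = pd.DataFrame(rng.standard_normal((10, 2)), columns=["Col1", "Col2"])
df["X"] = ["A"] * 5 + ["B"] * 5
df["Y"] = ["A", "B"] * 5

# One box per selected column, drawn on an explicitly created axes.
_, ax = plt.subplots()
plain = df.boxplot(column=["Col1", "Col2"], ax=ax)

# Grouping by one column, then by a combination of two columns.
by_x = df.boxplot(column=["Col1", "Col2"], by="X")
by_xy = df.boxplot(column=["Col1", "Col2"], by=["X", "Y"])

# A (rows, columns) layout for the grouped subplots.
stacked = df.boxplot(column=["Col1", "Col2"], by="X", layout=(2, 1))

# Cosmetic options: no grid, rotated tick labels, larger font.
_, ax2 = plt.subplots()
styled = df.boxplot(column=["Col1", "Col2"], ax=ax2,
                    grid=False, rot=45, fontsize=15)

plt.close("all")  # release the figures created above
```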

The parameter return_type can be used to select the type of element returned by boxplot. When return_type='axes' is selected, the matplotlib axes on which the boxplot is drawn are returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], return_type='axes')
>>> type(boxplot)
<class 'matplotlib.axes._axes.Axes'>

When grouping with by, a Series mapping columns to return_type is returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type='axes')
>>> type(boxplot)
<class 'pandas.core.series.Series'>

If return_type is None, a NumPy array of axes with the same shape as layout is returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type=None)
>>> type(boxplot)
<class 'numpy.ndarray'>
buffer(distance, resolution=16, cap_style='round', join_style='round', mitre_limit=5.0, single_sided=False, **kwargs)

Return a GeoSeries of geometries representing all points within a given distance of each geometric object.

Computes the buffer of a geometry for positive and negative buffer distance.

The buffer of a geometry is defined as the Minkowski sum (or difference, for negative distance) of the geometry with a circle with radius equal to the absolute value of the buffer distance.

The buffer operation always returns a polygonal result. The negative or zero-distance buffer of lines and points is always empty.

Parameters:
  • distance (float, np.array, pd.Series) – The radius of the buffer in the Minkowski sum (or difference). If an np.array or pd.Series is used, it must have the same length as the GeoSeries.

  • resolution (int (optional, default 16)) – The resolution of the buffer around each vertex. Specifies the number of linear segments in a quarter circle in the approximation of circular arcs.

  • cap_style ({'round', 'square', 'flat'}, default 'round') – Specifies the shape of buffered line endings. 'round' results in circular line endings (see resolution). Both 'square' and 'flat' result in rectangular line endings, 'flat' will end at the original vertex, while 'square' involves adding the buffer width.

  • join_style ({'round', 'mitre', 'bevel'}, default 'round') – Specifies the shape of buffered line midpoints. 'round' results in rounded shapes. 'bevel' results in a beveled edge that touches the original vertex. 'mitre' results in a single vertex that is beveled depending on the mitre_limit parameter.

  • mitre_limit (float, default 5.0) – Crops off 'mitre'-style joins if the point is displaced from the buffered vertex by more than this limit.

  • single_sided (bool, default False) – Only buffer at one side of the geometry.

Examples

>>> from shapely.geometry import Point, LineString, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Point(0, 0),
...         LineString([(1, -1), (1, 0), (2, 0), (2, 1)]),
...         Polygon([(3, -1), (4, 0), (3, 1)]),
...     ]
... )
>>> s
0                         POINT (0 0)
1    LINESTRING (1 -1, 1 0, 2 0, 2 1)
2    POLYGON ((3 -1, 4 0, 3 1, 3 -1))
dtype: geometry
>>> s.buffer(0.2)
0    POLYGON ((0.2 0, 0.19904 -0.0196, 0.19616 -0.0...
1    POLYGON ((0.8 0, 0.80096 0.0196, 0.80384 0.039...
2    POLYGON ((2.8 -1, 2.8 1, 2.80096 1.0196, 2.803...
dtype: geometry

Further parameters such as join_style and cap_style control the shape of the buffered result.
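
A minimal sketch of how cap_style changes the result, and of the empty result for negative distances on points and lines. It assumes a geopandas version that accepts the string values listed under Parameters; the geometries are made up for illustration.

```python
import math

import geopandas
from shapely.geometry import LineString, Point

# A straight line of length 10, buffered by 1 on both sides.
line = geopandas.GeoSeries([LineString([(0, 0), (10, 0)])])
areas = {
    style: line.buffer(1.0, cap_style=style).area.iloc[0]
    for style in ("round", "flat", "square")
}

# Negative buffers of points and lines leave nothing behind.
shrunk = geopandas.GeoSeries(
    [Point(0, 0), LineString([(0, 0), (1, 1)])]
).buffer(-0.5)
```

Here 'flat' yields the 10 x 2 rectangle, 'square' extends it by the buffer width at each end, and 'round' adds two polygonal approximations of a half circle.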

build_area(node=True)

Create an areal geometry formed by the constituent linework.

Builds areas from the GeoSeries that contain linework which represents the edges of a planar graph. Any geometry type may be provided as input; only the constituent lines and rings will be used to create the output polygons. All geometries within the GeoSeries are considered together and the resulting polygons therefore do not map 1:1 to input geometries.

This function converts inner rings into holes. To turn inner rings into polygons as well, use polygonize.

Unless you know that the input GeoSeries represents a planar graph with a clean topology (e.g. there is a node on both lines where they intersect), it is recommended to use node=True which performs noding prior to building areal geometry. Using node=False will provide performance benefits but may result in incorrect polygons if the input is not of the proper topology.

If the input linework crosses, this function may produce invalid polygons. Use GeoSeries.make_valid() to ensure valid geometries.

Parameters:

node (bool, default True) – Perform noding prior to building the areas, by default True.

Returns:

GeoSeries with polygons

Return type:

GeoSeries

Examples

>>> from shapely.geometry import LineString, Polygon
>>> s = geopandas.GeoSeries([
...     LineString([(18, 4), (4, 2), (2, 9)]),
...     LineString([(18, 4), (16, 16)]),
...     LineString([(16, 16), (8, 19), (8, 12), (2, 9)]),
...     LineString([(8, 6), (12, 13), (15, 8)]),
...     LineString([(8, 6), (15, 8)]),
...     LineString([(0, 0), (0, 3), (3, 3), (3, 0), (0, 0)]),
...     Polygon([(1, 1), (2, 2), (1, 2), (1, 1)]),
... ])
>>> s.build_area()
0    POLYGON ((0 3, 3 3, 3 0, 0 0, 0 3), (1 1, 2 2,...
1    POLYGON ((4 2, 2 9, 8 12, 8 19, 16 16, 18 4, 4...
Name: polygons, dtype: geometry
property centroid

Return a GeoSeries of points representing the centroid of each geometry.

Note that centroid does not have to be on or within original geometry.

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
>>> s.centroid
0    POINT (0.33333 0.66667)
1        POINT (0.70711 0.5)
2                POINT (0 0)
dtype: geometry

See also

GeoSeries.representative_point

point guaranteed to be within each geometry

clip(mask, keep_geom_type=False, sort=False)[source]

Clip points, lines, or polygon geometries to the mask extent.

Both layers must be in the same Coordinate Reference System (CRS). The GeoDataFrame will be clipped to the full extent of the mask object.

If there are multiple polygons in mask, data from the GeoDataFrame will be clipped to the total boundary of all polygons in mask.

Parameters:
  • mask (GeoDataFrame, GeoSeries, (Multi)Polygon, list-like) – Polygon vector layer used to clip the GeoDataFrame. The mask’s geometry is dissolved into one geometric feature and intersected with GeoDataFrame. If the mask is list-like with four elements (minx, miny, maxx, maxy), clip will use a faster rectangle clipping (clip_by_rect()), possibly leading to slightly different results.

  • keep_geom_type (boolean, default False) – If True, return only geometries of original type in case of intersection resulting in multiple geometry types or GeometryCollections. If False, return all resulting geometries (potentially mixed types).

  • sort (boolean, default False) – If True, the order of rows in the clipped GeoDataFrame will be preserved at a small performance cost. If False, the order of rows in the clipped GeoDataFrame will be random.

Returns:

Vector data (points, lines, polygons) from the GeoDataFrame clipped to polygon boundary from mask.

Return type:

GeoDataFrame

See also

clip

equivalent top-level function

Examples

Clip points (grocery stores) with polygons (the Near West Side community):

>>> import geodatasets
>>> chicago = geopandas.read_file(
...     geodatasets.get_path("geoda.chicago_health")
... )
>>> near_west_side = chicago[chicago["community"] == "NEAR WEST SIDE"]
>>> groceries = geopandas.read_file(
...     geodatasets.get_path("geoda.groceries")
... ).to_crs(chicago.crs)
>>> groceries.shape
(148, 8)
>>> nws_groceries = groceries.clip(near_west_side)
>>> nws_groceries.shape
(7, 8)
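
The example above needs the geodatasets package and a download. The following self-contained sketch shows the same call on made-up points, once with a polygon layer as mask and once with a list-like (minx, miny, maxx, maxy) mask; the names stores and district are hypothetical.

```python
import geopandas
from shapely.geometry import Point, box

stores = geopandas.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(2, 2), Point(0.2, 0.8)],
    crs="EPSG:3857",
)
# The mask layer uses the same CRS as the layer being clipped.
district = geopandas.GeoDataFrame(geometry=[box(0, 0, 1, 1)], crs="EPSG:3857")

clipped = stores.clip(district)           # polygon mask
clipped_rect = stores.clip((0, 0, 1, 1))  # rectangle given as four numbers
```

Because sort defaults to False, the row order of the result should not be relied on; sort the result explicitly if order matters.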
clip_by_rect(xmin, ymin, xmax, ymax)

Return a GeoSeries of the portions of geometry within the given rectangle.

Note that the results are not exactly equal to intersection(). E.g. in edge cases, clip_by_rect() will not return a point just touching the rectangle. Check the examples section below for some of these exceptions.

The geometry is clipped in a fast but possibly dirty way. The output is not guaranteed to be valid. No exceptions will be raised for topological errors.

Note: empty geometries or geometries that do not overlap with the specified bounds will result in GEOMETRYCOLLECTION EMPTY.

Parameters:
  • xmin (float) – Minimum x value of the rectangle

  • ymin (float) – Minimum y value of the rectangle

  • xmax (float) – Maximum x value of the rectangle

  • ymax (float) – Maximum y value of the rectangle

Return type:

GeoSeries

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> bounds = (0, 0, 1, 1)
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry
>>> s.clip_by_rect(*bounds)
0    POLYGON ((0 0, 0 1, 1 1, 0 0))
1    POLYGON ((0 0, 0 1, 1 1, 0 0))
2             LINESTRING (0 0, 1 1)
3          GEOMETRYCOLLECTION EMPTY
4          GEOMETRYCOLLECTION EMPTY
dtype: geometry

See also

GeoSeries.intersection

columns

The column labels of the DataFrame.

Examples

>>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> df
     A  B
0    1  3
1    2  4
>>> df.columns
Index(['A', 'B'], dtype='object')
combine(other, func, fill_value=None, overwrite=True)

Perform column-wise combine with another DataFrame.

Combines a DataFrame with other DataFrame using func to element-wise combine columns. The row and column indexes of the resulting DataFrame will be the union of the two.

Parameters:
  • other (DataFrame) – The DataFrame to merge column-wise.

  • func (function) – Function that takes two Series as inputs and returns a Series or a scalar. Used to merge the two DataFrames column by column.

  • fill_value (scalar value, default None) – The value to fill NaNs with prior to passing any column to the merge func.

  • overwrite (bool, default True) – If True, columns in self that do not exist in other will be overwritten with NaNs.

Returns:

Combination of the provided DataFrames.

Return type:

DataFrame

See also

DataFrame.combine_first

Combine two DataFrame objects and default to non-null values in frame calling the method.

Examples

Combine using a simple function that chooses the smaller column.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3

Example using a true element-wise combine function.

>>> df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3

Using fill_value fills Nones prior to passing the column to the merge function.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  4.0

However, if the same element in both dataframes is None, that None is preserved

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
    A    B
0  0 -5.0
1  0  3.0

Example that demonstrates the use of overwrite and the behavior when the axes differ between the dataframes.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1], }, index=[1, 2])
>>> df1.combine(df2, take_smaller)
     A    B     C
0  NaN  NaN   NaN
1  NaN  3.0 -10.0
2  NaN  3.0   1.0
>>> df1.combine(df2, take_smaller, overwrite=False)
     A    B     C
0  0.0  NaN   NaN
1  0.0  3.0 -10.0
2  NaN  3.0   1.0

Demonstrating the preference of the passed in dataframe.

>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1], }, index=[1, 2])
>>> df2.combine(df1, take_smaller)
   A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN
>>> df2.combine(df1, take_smaller, overwrite=False)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 1.0
2  NaN  3.0 1.0
combine_first(other)

Update null elements with value in the same location in other.

Combine two DataFrame objects by filling null values in one DataFrame with non-null values from the other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two. Upon calling first.combine_first(second), the result keeps the values of first wherever both first.loc[index, col] and second.loc[index, col] are not missing, and takes values from second only where first is null or has no entry.

Parameters:

other (DataFrame) – Provided DataFrame to use to fill null values.

Returns:

The result of combining the provided DataFrame with the other object.

Return type:

DataFrame

See also

DataFrame.combine

Perform series-wise operation on two DataFrames using a given function.

Examples

>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0

Null values still persist if the location of that null value does not exist in other.

>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
compare(other, align_axis=1, keep_shape=False, keep_equal=False, result_names=('self', 'other'))

Compare to another DataFrame and show the differences.

Parameters:
  • other (DataFrame) – Object to compare with.

  • align_axis ({0 or 'index', 1 or 'columns'}, default 1) –

    Determine which axis to align the comparison on.

    • 0, or ‘index’: Resulting differences are stacked vertically with rows drawn alternately from self and other.

    • 1, or ‘columns’: Resulting differences are aligned horizontally with columns drawn alternately from self and other.

  • keep_shape (bool, default False) – If true, all rows and columns are kept. Otherwise, only the ones with different values are kept.

  • keep_equal (bool, default False) – If true, the result keeps values that are equal. Otherwise, equal values are shown as NaNs.

  • result_names (tuple, default ('self', 'other')) –

    Set the dataframes names in the comparison.

    Added in version 1.5.0.

Returns:

DataFrame that shows the differences stacked side by side.

The resulting index will be a MultiIndex with ‘self’ and ‘other’ stacked alternately at the inner level.

Return type:

DataFrame

Raises:

ValueError – When the two DataFrames don’t have identical labels or shape.

See also

Series.compare

Compare with another Series and show differences.

DataFrame.equals

Test whether two objects contain the same elements.

Notes

Matching NaNs will not appear as a difference.

Can only compare identically-labeled (i.e. same shape, identical row and column labels) DataFrames.

Examples

>>> df = pd.DataFrame(
...     {
...         "col1": ["a", "a", "b", "b", "a"],
...         "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
...     },
...     columns=["col1", "col2", "col3"],
... )
>>> df
  col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0
>>> df2 = df.copy()
>>> df2.loc[0, 'col1'] = 'c'
>>> df2.loc[2, 'col3'] = 4.0
>>> df2
  col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0

Align the differences on columns

>>> df.compare(df2)
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

Assign result_names

>>> df.compare(df2, result_names=("left", "right"))
  col1       col3
  left right left right
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

Stack the differences on rows

>>> df.compare(df2, align_axis=0)
        col1  col3
0 self     a   NaN
  other    c   NaN
2 self   NaN   3.0
  other  NaN   4.0

Keep the equal values

>>> df.compare(df2, keep_equal=True)
  col1       col3
  self other self other
0    a     c  1.0   1.0
2    b     b  3.0   4.0

Keep all original rows and columns

>>> df.compare(df2, keep_shape=True)
  col1       col2       col3
  self other self other self other
0    a     c  NaN   NaN  NaN   NaN
1  NaN   NaN  NaN   NaN  NaN   NaN
2  NaN   NaN  NaN   NaN  3.0   4.0
3  NaN   NaN  NaN   NaN  NaN   NaN
4  NaN   NaN  NaN   NaN  NaN   NaN

Keep all original rows and columns and also all original values

>>> df.compare(df2, keep_shape=True, keep_equal=True)
  col1       col2       col3
  self other self other self other
0    a     c  1.0   1.0  1.0   1.0
1    a     a  2.0   2.0  2.0   2.0
2    b     b  3.0   3.0  3.0   4.0
3    b     b  NaN   NaN  4.0   4.0
4    a     a  5.0   5.0  5.0   5.0
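
The ValueError listed under Raises can be illustrated with a short sketch. The frames left and right are made up for this illustration, and relabelling with set_axis is just one possible way to make the labels identical.

```python
import pandas as pd

left = pd.DataFrame({"col1": ["a", "b"], "col2": [1.0, 2.0]})
# Same shape and columns, but a different row label (5 instead of 1).
right = pd.DataFrame({"col1": ["a", "c"], "col2": [1.0, 2.0]}, index=[0, 5])

try:
    left.compare(right)
    raised = False
except ValueError:
    raised = True

# Once the row labels are identical, the comparison goes through and
# reports the single differing cell in col1.
diff = left.compare(right.set_axis(left.index, axis=0))
```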
concave_hull(ratio=0.0, allow_holes=False)

Return a GeoSeries of geometries representing the concave hull of vertices of each geometry.

The concave hull of a geometry is the smallest concave Polygon containing all the points in each geometry, unless the number of points in the geometric object is less than three. For two points, the concave hull collapses to a LineString; for 1, a Point.

The hull is constructed by removing border triangles of the Delaunay Triangulation of the points as long as their “size” is larger than the maximum edge length ratio and optionally allowing holes. The edge length factor is a fraction of the length difference between the longest and shortest edges in the Delaunay Triangulation of the input points. For further information on the algorithm used, see https://libgeos.org/doxygen/classgeos_1_1algorithm_1_1hull_1_1ConcaveHull.html

Parameters:
  • ratio (float, (optional, default 0.0)) – Number in the range [0, 1]. Higher numbers will include fewer vertices in the hull.

  • allow_holes (bool, (optional, default False)) – If set to True, the concave hull may have holes.

Examples

>>> from shapely.geometry import Polygon, LineString, Point, MultiPoint
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         MultiPoint([(0, 0), (1, 1), (0, 1), (1, 0), (0.5, 0.5)]),
...         MultiPoint([(0, 0), (1, 1)]),
...         Point(0, 0),
...     ],
...     crs=3857
... )
>>> s
0                       POLYGON ((0 0, 1 1, 0 1, 0 0))
1                           LINESTRING (0 0, 1 1, 1 0)
2    MULTIPOINT ((0 0), (1 1), (0 1), (1 0), (0.5 0...
3                            MULTIPOINT ((0 0), (1 1))
4                                          POINT (0 0)
dtype: geometry
>>> s.concave_hull()
0                      POLYGON ((0 1, 1 1, 0 0, 0 1))
1                      POLYGON ((0 0, 1 1, 1 0, 0 0))
2    POLYGON ((0.5 0.5, 0 1, 1 1, 1 0, 0 0, 0.5 0.5))
3                               LINESTRING (0 0, 1 1)
4                                         POINT (0 0)
dtype: geometry

See also

GeoSeries.convex_hull

convex hull geometry

Notes

The algorithm considers only the vertices of each geometry. As a result, the hull may not fully enclose the input geometry. If that happens, increasing ratio should resolve the issue.

constrained_delaunay_triangles()

Return a GeoSeries with the constrained Delaunay triangulation of polygons.

A constrained Delaunay triangulation requires the edges of the input polygon(s) to be in the set of resulting triangle edges. An unconstrained Delaunay triangulation only triangulates based on the vertices, hence triangle edges could cross polygon boundaries.

Requires Shapely >= 2.1.

Added in version 1.1.0.

Examples

>>> from shapely.geometry import Polygon
>>> s = geopandas.GeoSeries([Polygon([(0, 0), (1, 1), (0, 1)])])
>>> s
0                       POLYGON ((0 0, 1 1, 0 1, 0 0))
dtype: geometry
>>> s.constrained_delaunay_triangles()
0         GEOMETRYCOLLECTION (POLYGON ((0 0, 0 1, 1 1, 0...
dtype: geometry

See also

GeoSeries.delaunay_triangles

Delaunay triangulation

contains(other, align=None)

Return a Series of dtype('bool') with value True for each aligned geometry that contains other.

An object is said to contain other if at least one point of other lies in the interior and no points of other lie in the exterior of the object. (Therefore, any given polygon does not contain its own boundary - there is not any point that lies in the interior.) If either object is empty, this operation returns False.

This is the inverse of within() in the sense that the expression a.contains(b) == b.within(a) always evaluates to True.
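
The relationship with within() can be checked on individual Shapely geometries. The sketch below uses made-up geometries and is only an illustration of the predicate, not of how GeoSeries evaluates it internally.

```python
from shapely.geometry import LineString, Point, Polygon

square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
line = LineString([(0, 0), (0, 2)])
inside = Point(1, 1)    # in the interior of the square
on_edge = Point(0, 1)   # on the square's boundary, in the line's interior

pairs = [(square, inside), (square, on_edge), (line, on_edge)]

# a.contains(b) and b.within(a) agree for every pair.
agree = [a.contains(b) == b.within(a) for a, b in pairs]
results = [a.contains(b) for a, b in pairs]
print(agree, results)
```

The second pair is False because a point on the boundary has no point in the polygon's interior.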

The operation works in a 1-to-1, row-wise manner.
Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if it is contained.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(0, 4),
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (1, 2), (0, 2)]),
...         LineString([(0, 0), (0, 2)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1             LINESTRING (0 0, 0 2)
2             LINESTRING (0 0, 0 1)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    POLYGON ((0 0, 1 2, 0 2, 0 0))
3             LINESTRING (0 0, 0 2)
4                       POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries contains a single geometry:

>>> point = Point(0, 1)
>>> s.contains(point)
0    False
1     True
2    False
3     True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s2.contains(s, align=True)
0    False
1    False
2    False
3     True
4    False
dtype: bool
>>> s2.contains(s, align=False)
1     True
2    False
3     True
4     True
dtype: bool
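
The two pairing modes can be pictured with plain pandas objects. This is an analogy under the assumption that align=True pairs rows by label over the union of both indices; the Series and their string values are placeholders, not geometries.

```python
import pandas as pd

left = pd.Series(["a0", "a1", "a2", "a3"], index=range(0, 4))
right = pd.Series(["b1", "b2", "b3", "b4"], index=range(1, 5))

# Label-based pairing: both sides are reindexed to the union of the
# indices, so labels 0 and 4 each have a missing partner.
left_aligned, right_aligned = left.align(right)
by_label = list(zip(left_aligned.index, left_aligned, right_aligned))

# Position-based pairing: first with first, second with second, ...
by_position = list(zip(left, right))

print(by_label)
print(by_position)
```

Rows without a partner are the ones reported as False in the align=True output above.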

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries contains any element of the other one.
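
If an any-to-any check is needed, it has to be built separately. The following is a small sketch using plain Shapely geometries and Python loops; the lists are hypothetical, and for large inputs a spatial index would normally be preferable.

```python
from shapely.geometry import LineString, Point, Polygon

containers = [
    Polygon([(0, 0), (2, 2), (0, 2)]),
    LineString([(0, 0), (0, 2)]),
]
candidates = [Point(0, 1), Point(0.5, 1.5)]

# Row-wise: the first container against the first candidate, and so on.
row_wise = [c.contains(p) for c, p in zip(containers, candidates)]

# Any-to-any: does each container contain at least one candidate?
any_match = [any(c.contains(p) for p in candidates) for c in containers]

print(row_wise, any_match)
```

Here the row-wise result is False for both rows, while each container does contain one of the candidates.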

See also

GeoSeries.contains_properly, GeoSeries.within

contains_properly(other, align=None)

Return a Series of dtype('bool') with value True for each aligned geometry that completely contains other, with no common boundary points.

Geometry A contains geometry B properly if B intersects the interior of A but not the boundary (or exterior). This means that a geometry A does not “contain properly” itself, which contrasts with the contains() method, where common points on the boundary are allowed.
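
The difference between the two predicates can be seen on single geometries. This sketch assumes Shapely 2.0 or later, where shapely.contains_properly is available as a function; the polygons are invented for the illustration.

```python
import shapely
from shapely.geometry import Polygon

outer = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
touching = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])  # shares edges with outer
interior = Polygon([(1, 1), (2, 1), (2, 2), (1, 2)])  # strictly inside outer

results = {
    "contains(touching)": outer.contains(touching),
    "contains_properly(touching)": bool(shapely.contains_properly(outer, touching)),
    "contains_properly(interior)": bool(shapely.contains_properly(outer, interior)),
    "contains_properly(itself)": bool(shapely.contains_properly(outer, outer)),
}
for name, value in results.items():
    print(name, value)
```

Only the strictly interior polygon is properly contained; sharing any boundary point is enough to make the stricter predicate fail.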

The operation works in a 1-to-1, row-wise manner.
Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if it is contained.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(0, 4),
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (1, 2), (0, 2)]),
...         LineString([(0, 0), (0, 2)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1             LINESTRING (0 0, 0 2)
2             LINESTRING (0 0, 0 1)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    POLYGON ((0 0, 1 2, 0 2, 0 0))
3             LINESTRING (0 0, 0 2)
4                       POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries contains a single geometry:

>>> point = Point(0, 1)
>>> s.contains_properly(point)
0    False
1     True
2    False
3     True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s2.contains_properly(s, align=True)
0    False
1    False
2    False
3     True
4    False
dtype: bool
>>> s2.contains_properly(s, align=False)
1    False
2    False
3    False
4     True
dtype: bool

Compare it to the result of contains():

>>> s2.contains(s, align=False)
1     True
2    False
3     True
4     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries contains_properly any element of the other one.

See also

GeoSeries.contains

convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True, dtype_backend='numpy_nullable')

Convert columns to the best possible dtypes using dtypes supporting pd.NA.

Parameters:
  • infer_objects (bool, default True) – Whether object dtypes should be converted to the best possible types.

  • convert_string (bool, default True) – Whether object dtypes should be converted to StringDtype().

  • convert_integer (bool, default True) – Whether, if possible, conversion can be done to integer extension types.

  • convert_boolean (bool, default True) – Whether object dtypes should be converted to BooleanDtype().

  • convert_floating (bool, default True) – Whether, if possible, conversion can be done to floating extension types. If convert_integer is also True, preference will be given to integer dtypes if the floats can be faithfully cast to integers.

  • dtype_backend ({'numpy_nullable', 'pyarrow'}, default 'numpy_nullable') –

    Back-end data type applied to the resultant DataFrame (still experimental). Behaviour is as follows:

    • "numpy_nullable": returns nullable-dtype-backed DataFrame (default).

    • "pyarrow": returns pyarrow-backed nullable ArrowDtype DataFrame.

    Added in version 2.0.

Returns:

Copy of input object with new dtype.

Return type:

Series or DataFrame

See also

infer_objects

Infer dtypes of objects.

to_datetime

Convert argument to datetime.

to_timedelta

Convert argument to timedelta.

to_numeric

Convert argument to a numeric type.

Notes

By default, convert_dtypes will attempt to convert a Series (or each Series in a DataFrame) to dtypes that support pd.NA. By using the options convert_string, convert_integer, convert_boolean and convert_floating, it is possible to turn off individual conversions to StringDtype, the integer extension types, BooleanDtype or floating extension types, respectively.

For object-dtyped columns, if infer_objects is True, use the inference rules as during normal Series/DataFrame construction. Then, if possible, convert to StringDtype, BooleanDtype or an appropriate integer or floating extension type, otherwise leave as object.

If the dtype is integer, convert to an appropriate integer extension type.

If the dtype is numeric, and consists of all integers, convert to an appropriate integer extension type. Otherwise, convert to an appropriate floating extension type.

In the future, as new dtypes are added that support pd.NA, the results of this method will change to support those new dtypes.
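
As a small illustration of turning off an individual conversion, the sketch below compares the default behaviour with convert_integer=False. The Series is made up for the example, and the dtype names in the comments reflect current pandas behaviour, which the note above says may change.

```python
import numpy as np
import pandas as pd

# A float64 Series whose non-missing values are all whole numbers.
s = pd.Series([1.0, np.nan, 3.0])

# Default: whole-number floats are converted to an integer extension type.
default_dtype = s.convert_dtypes().dtype

# With integer conversion switched off, a floating extension type is used.
no_int_dtype = s.convert_dtypes(convert_integer=False).dtype

print(default_dtype, no_int_dtype)
```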

Examples

>>> df = pd.DataFrame(
...     {
...         "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
...         "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
...         "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
...         "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
...         "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
...         "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
...     }
... )

Start with a DataFrame with default dtypes.

>>> df
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0
>>> df.dtypes
a      int32
b     object
c     object
d     object
e    float64
f    float64
dtype: object

Convert the DataFrame to use best possible dtypes.

>>> dfn = df.convert_dtypes()
>>> dfn
   a  b      c     d     e      f
0  1  x   True     h    10   <NA>
1  2  y  False     i  <NA>  100.5
2  3  z   <NA>  <NA>    20  200.0
>>> dfn.dtypes
a             Int32
b    string[python]
c           boolean
d    string[python]
e             Int64
f           Float64
dtype: object

Start with a Series of strings and missing data represented by np.nan.

>>> s = pd.Series(["a", "b", np.nan])
>>> s
0      a
1      b
2    NaN
dtype: object

Obtain a Series with dtype StringDtype.

>>> s.convert_dtypes()
0       a
1       b
2    <NA>
dtype: string

property convex_hull

Return a GeoSeries of geometries representing the convex hull of each geometry.

The convex hull of a geometry is the smallest convex Polygon containing all the points of that geometry, unless the geometry has fewer than three points. For two points, the convex hull collapses to a LineString; for a single point, it is a Point.

Examples

>>> from shapely.geometry import Polygon, LineString, Point, MultiPoint
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         MultiPoint([(0, 0), (1, 1), (0, 1), (1, 0), (0.5, 0.5)]),
...         MultiPoint([(0, 0), (1, 1)]),
...         Point(0, 0),
...     ]
... )
>>> s
0                       POLYGON ((0 0, 1 1, 0 1, 0 0))
1                           LINESTRING (0 0, 1 1, 1 0)
2    MULTIPOINT ((0 0), (1 1), (0 1), (1 0), (0.5 0...
3                            MULTIPOINT ((0 0), (1 1))
4                                          POINT (0 0)
dtype: geometry
>>> s.convex_hull
0         POLYGON ((0 0, 0 1, 1 1, 0 0))
1         POLYGON ((0 0, 1 1, 1 0, 0 0))
2    POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))
3                  LINESTRING (0 0, 1 1)
4                            POINT (0 0)
dtype: geometry

See also

GeoSeries.concave_hull

concave hull geometry

GeoSeries.envelope

bounding rectangle geometry

copy(deep=True)[source]

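Return a copy of this GeoDataFrame. Apart from deep, the parameters and examples listed here describe the pandas DataFrame that the copy is built on: two-dimensional, size-mutable, potentially heterogeneous tabular data.

The data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects and is the primary pandas data structure.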

Parameters:
  • data (ndarray (structured or homogeneous), Iterable, dict, or DataFrame) –

    Dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion-order. If a dict contains Series which have an index defined, it is aligned by its index. This alignment also occurs if data is a Series or a DataFrame itself. Alignment is done on Series/DataFrame inputs.

    If data is a list of dicts, column order follows insertion-order.

  • index (Index or array-like) – Index to use for resulting frame. Will default to RangeIndex if no indexing information part of input data and no index provided.

  • columns (Index or array-like) – Column labels to use for resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels, will perform column selection instead.

  • dtype (dtype, default None) – Data type to force. Only a single dtype is allowed. If None, infer.

  • copy (bool or None, default None) –

    Copy data from inputs. For dict data, the default of None behaves like copy=True. For DataFrame or 2d ndarray input, the default of None behaves like copy=False. If data is a dict containing one or more Series (possibly of different dtypes), copy=False will ensure that these inputs are not copied.

    Changed in version 1.3.0.

  • deep (bool)

Return type:

GeoDataFrame

See also

DataFrame.from_records

Constructor from tuples, also record arrays.

DataFrame.from_dict

From dicts of Series, arrays, or dicts.

read_csv

Read a comma-separated values (csv) file into DataFrame.

read_table

Read general delimited file into DataFrame.

read_clipboard

Read text from clipboard into DataFrame.

Notes

Please refer to the pandas User Guide for more information.

Examples

Constructing DataFrame from a dictionary.

>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4

Notice that the inferred dtype is int64.

>>> df.dtypes
col1    int64
col2    int64
dtype: object

To enforce a single dtype:

>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1    int8
col2    int8
dtype: object

Constructing DataFrame from a dictionary including Series:

>>> d = {'col1': [0, 1, 2, 3], 'col2': pd.Series([2, 3], index=[2, 3])}
>>> pd.DataFrame(data=d, index=[0, 1, 2, 3])
   col1  col2
0     0   NaN
1     1   NaN
2     2   2.0
3     3   3.0

Constructing DataFrame from numpy ndarray:

>>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
...                    columns=['a', 'b', 'c'])
>>> df2
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

Constructing DataFrame from a numpy ndarray that has labeled columns:

>>> data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
...                 dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])
>>> df3 = pd.DataFrame(data, columns=['c', 'a'])
...
>>> df3
   c  a
0  3  1
1  6  4
2  9  7

Constructing DataFrame from dataclass:

>>> from dataclasses import make_dataclass
>>> Point = make_dataclass("Point", [("x", int), ("y", int)])
>>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
   x  y
0  0  0
1  0  3
2  2  3

Constructing DataFrame from Series/DataFrame:

>>> ser = pd.Series([1, 2, 3], index=["a", "b", "c"])
>>> df = pd.DataFrame(data=ser, index=["a", "c"])
>>> df
   0
a  1
c  3
>>> df1 = pd.DataFrame([1, 2, 3], index=["a", "b", "c"], columns=["x"])
>>> df2 = pd.DataFrame(data=df1, index=["a", "c"])
>>> df2
   x
a  1
c  3

corr(method='pearson', min_periods=1, numeric_only=False)

Compute pairwise correlation of columns, excluding NA/null values.

Parameters:
  • method ({'pearson', 'kendall', 'spearman'} or callable) –

    Method of correlation:

    • pearson : standard correlation coefficient

    • kendall : Kendall Tau correlation coefficient

    • spearman : Spearman rank correlation

    • callable: callable with input two 1d ndarrays and returning a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.

  • min_periods (int, optional) – Minimum number of observations required per pair of columns to have a valid result. Currently only available for Pearson and Spearman correlation.

  • numeric_only (bool, default False) –

    Include only float, int or boolean data.

    Added in version 1.5.0.

    Changed in version 2.0.0: The default value of numeric_only is now False.

Returns:

Correlation matrix.

Return type:

DataFrame

See also

DataFrame.corrwith

Compute pairwise correlation with another DataFrame or Series.

Series.corr

Compute the correlation between two Series.

Notes

Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.
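
The statement that the result is symmetric with 1 on the diagonal, whatever the callable returns, can be checked with a deliberately asymmetric function. The function first_difference and the data are hypothetical and serve only this illustration.

```python
import pandas as pd


def first_difference(a, b):
    # Deliberately asymmetric: swapping the arguments flips the sign.
    return float(a[0] - b[0])


df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 7.0]})
result = df.corr(method=first_difference)

symmetric = result.loc["x", "y"] == result.loc["y", "x"]
unit_diagonal = result.loc["x", "x"] == 1.0 and result.loc["y", "y"] == 1.0
print(result)
print(symmetric, unit_diagonal)
```

A callable passed as method therefore only needs to be meaningful for one ordering of each column pair.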

Examples

>>> def histogram_intersection(a, b):
...     v = np.minimum(a, b).sum().round(decimals=1)
...     return v
>>> df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
...                   columns=['dogs', 'cats'])
>>> df.corr(method=histogram_intersection)
      dogs  cats
dogs   1.0   0.3
cats   0.3   1.0
>>> df = pd.DataFrame([(1, 1), (2, np.nan), (np.nan, 3), (4, 4)],
...                   columns=['dogs', 'cats'])
>>> df.corr(min_periods=3)
      dogs  cats
dogs   1.0   NaN
cats   NaN   1.0

corrwith(other, axis=0, drop=False, method='pearson', numeric_only=False)

Compute pairwise correlation.

Pairwise correlation is computed between rows or columns of DataFrame with rows or columns of Series or DataFrame. DataFrames are first aligned along both axes before computing the correlations.

Parameters:
  • other (DataFrame, Series) – Object with which to compute correlations.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to use. 0 or ‘index’ to compute row-wise, 1 or ‘columns’ for column-wise.

  • drop (bool, default False) – Drop missing indices from result.

  • method ({'pearson', 'kendall', 'spearman'} or callable) –

    Method of correlation:

    • pearson : standard correlation coefficient

    • kendall : Kendall Tau correlation coefficient

    • spearman : Spearman rank correlation

    • callable: callable with input two 1d ndarrays and returning a float.

  • numeric_only (bool, default False) –

    Include only float, int or boolean data.

    Added in version 1.5.0.

    Changed in version 2.0.0: The default value of numeric_only is now False.

Returns:

Pairwise correlations.

Return type:

Series

See also

DataFrame.corr

Compute pairwise correlation of columns.

Examples

>>> index = ["a", "b", "c", "d", "e"]
>>> columns = ["one", "two", "three", "four"]
>>> df1 = pd.DataFrame(np.arange(20).reshape(5, 4), index=index, columns=columns)
>>> df2 = pd.DataFrame(np.arange(16).reshape(4, 4), index=index[:4], columns=columns)
>>> df1.corrwith(df2)
one      1.0
two      1.0
three    1.0
four     1.0
dtype: float64
>>> df2.corrwith(df1, axis=1)
a    1.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64

count(axis=0, numeric_only=False)

Count non-NA cells for each column or row.

The values None, NaN, NaT, pandas.NA are considered NA.
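
A quick way to see which values are treated as NA is to place them all in one column. The column name and values below are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Four different NA markers followed by two ordinary values.
df = pd.DataFrame({"value": [None, np.nan, pd.NaT, pd.NA, "text", 0]})

non_na = df.count()
print(non_na)
```

Only "text" and 0 are counted; note that 0 is a regular value, not a missing one.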

Parameters:
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’ counts are generated for each column. If 1 or ‘columns’ counts are generated for each row.

  • numeric_only (bool, default False) – Include only float, int or boolean data.

Returns:

For each column/row the number of non-NA/null entries.

Return type:

Series

See also

Series.count

Number of non-NA elements in a Series.

DataFrame.value_counts

Count unique combinations of columns.

DataFrame.shape

Number of DataFrame rows and columns (including NA elements).

DataFrame.isna

Boolean same-sized DataFrame showing places of NA elements.

Examples

Constructing DataFrame from a dictionary:

>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", "Lewis", "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
   Person   Age  Single
0    John  24.0   False
1    Myla   NaN    True
2   Lewis  21.0    True
3    John  33.0    True
4    Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    5
Age       4
Single    5
dtype: int64

Counts for each row:

>>> df.count(axis='columns')
0    3
1    2
2    3
3    3
4    3
dtype: int64

count_coordinates()

Return a Series containing the count of the number of coordinate pairs in each geometry.

Examples

An example of a GeoSeries with two line strings, one point, one polygon and one None value:

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         LineString([(0, 0), (1, 1), (1, -1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, -1)]),
...         Point(0, 0),
...         Polygon([(10, 10), (10, 20), (20, 20), (20, 10), (10, 10)]),
...         None
...     ]
... )
>>> s
0                 LINESTRING (0 0, 1 1, 1 -1, 0 1)
1                      LINESTRING (0 0, 1 1, 1 -1)
2                                      POINT (0 0)
3    POLYGON ((10 10, 10 20, 20 20, 20 10, 10 10))
4                                             None
dtype: geometry
>>> s.count_coordinates()
0    4
1    3
2    1
3    5
4    0
dtype: int32

See also

GeoSeries.get_coordinates

extract coordinates as a DataFrame

GeoSeries.count_geometries

count the number of geometries in a collection

count_geometries()

Return a Series containing the count of geometries in each multi-part geometry.

For single-part geometry objects, this is always 1. For multi-part geometries, like MultiPoint or MultiLineString, it is the number of parts in the geometry. For GeometryCollection, it is the number of direct parts of the collection (the method does not recurse into collections within collections).
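
The non-recursive counting can be illustrated with Shapely's get_num_geometries, on the assumption that it follows the same rule as this method. The nested collection is invented for the example.

```python
import shapely
from shapely.geometry import GeometryCollection, MultiPoint, Point

# A collection with two direct parts; the second part has three points.
nested = GeometryCollection(
    [Point(0, 0), MultiPoint([(1, 1), (2, 2), (3, 3)])]
)

n_direct_parts = int(shapely.get_num_geometries(nested))
n_single = int(shapely.get_num_geometries(Point(0, 0)))
print(n_direct_parts, n_single)
```

The MultiPoint counts as one part, so the collection reports 2 rather than 4.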

Examples

>>> from shapely.geometry import Point, MultiPoint, LineString, MultiLineString
>>> s = geopandas.GeoSeries(
...     [
...         MultiPoint([(0, 0), (1, 1), (1, -1), (0, 1)]),
...         MultiLineString([((0, 0), (1, 1)), ((-1, 0), (1, 0))]),
...         LineString([(0, 0), (1, 1), (1, -1)]),
...         Point(0, 0),
...     ]
... )
>>> s
0     MULTIPOINT ((0 0), (1 1), (1 -1), (0 1))
1    MULTILINESTRING ((0 0, 1 1), (-1 0, 1 0))
2                  LINESTRING (0 0, 1 1, 1 -1)
3                                  POINT (0 0)
dtype: geometry
>>> s.count_geometries()
0    4
1    2
2    1
3    1
dtype: int32

See also

GeoSeries.count_coordinates

count the number of coordinates in a geometry

GeoSeries.count_interior_rings

count the number of interior rings

count_interior_rings()

Return a Series containing the count of the number of interior rings in a polygonal geometry.

For non-polygonal geometries, this is always 0.

Examples

>>> from shapely.geometry import Polygon, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon(
...             [(0, 0), (0, 5), (5, 5), (5, 0)],
...             [[(1, 1), (1, 4), (4, 4), (4, 1)]],
...         ),
...         Polygon(
...             [(0, 0), (0, 5), (5, 5), (5, 0)],
...             [
...                 [(1, 1), (1, 2), (2, 2), (2, 1)],
...                 [(3, 2), (3, 3), (4, 3), (4, 2)],
...             ],
...         ),
...         Point(0, 1),
...     ]
... )
>>> s
0    POLYGON ((0 0, 0 5, 5 5, 5 0, 0 0), (1 1, 1 4,...
1    POLYGON ((0 0, 0 5, 5 5, 5 0, 0 0), (1 1, 1 2,...
2                                          POINT (0 1)
dtype: geometry
>>> s.count_interior_rings()
0    1
1    2
2    0
dtype: int32

See also

GeoSeries.count_coordinates

count the number of coordinates in a geometry

GeoSeries.count_geometries

count the number of geometries in a collection

cov(min_periods=None, ddof=1, numeric_only=False)

Compute pairwise covariance of columns, excluding NA/null values.

Compute the pairwise covariance among the series of a DataFrame. The returned data frame is the covariance matrix of the columns of the DataFrame.

Both NA and null values are automatically excluded from the calculation. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations for each value created. Comparisons with observations below this threshold will be returned as NaN.

This method is generally used for the analysis of time series data to understand the relationship between different measures across time.

Parameters:
  • min_periods (int, optional) – Minimum number of observations required per pair of columns to have a valid result.

  • ddof (int, default 1) – Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. This argument is applicable only when the DataFrame contains no NaN values.

  • numeric_only (bool, default False) –

    Include only float, int or boolean data.

    Added in version 1.5.0.

    Changed in version 2.0.0: The default value of numeric_only is now False.

Returns:

The covariance matrix of the series of the DataFrame.

Return type:

DataFrame

See also

Series.cov

Compute covariance with another Series.

core.window.ewm.ExponentialMovingWindow.cov

Exponential weighted sample covariance.

core.window.expanding.Expanding.cov

Expanding sample covariance.

core.window.rolling.Rolling.cov

Rolling sample covariance.

Notes

Returns the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-ddof.

For DataFrames that have Series that are missing data (assuming that data is missing at random) the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member Series.

However, for many applications this estimate may not be acceptable because the estimated covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimated correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix. See Estimation of covariance matrices for more details.
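
The normalization by N - ddof can be reproduced by hand for data without missing values. This is a sketch using the small dogs/cats frame from the examples; the manual_* names are introduced only for the comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["dogs", "cats"])

n = len(df)
dev = df - df.mean()    # deviations from the column means
scatter = dev.T @ dev   # sums of products of deviations

manual_ddof1 = scatter / (n - 1)  # divisor N - 1 (the default)
manual_ddof0 = scatter / n        # divisor N

matches_default = np.allclose(manual_ddof1, df.cov())
matches_ddof0 = np.allclose(manual_ddof0, df.cov(ddof=0))
print(matches_default, matches_ddof0)
```

With ddof=0 the variance of dogs is 2 / 4 = 0.5 instead of 2 / 3.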

Examples

>>> df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
...                   columns=['dogs', 'cats'])
>>> df.cov()
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667
>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(1000, 5),
...                   columns=['a', 'b', 'c', 'd', 'e'])
>>> df.cov()
          a         b         c         d         e
a  0.998438 -0.020161  0.059277 -0.008943  0.014144
b -0.020161  1.059352 -0.008543 -0.024738  0.009826
c  0.059277 -0.008543  1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486  0.921297 -0.013692
e  0.014144  0.009826 -0.000271 -0.013692  0.977795

Minimum number of periods

This method also supports an optional min_periods keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:

>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(20, 3),
...                   columns=['a', 'b', 'c'])
>>> df.loc[df.index[:5], 'a'] = np.nan
>>> df.loc[df.index[5:10], 'b'] = np.nan
>>> df.cov(min_periods=12)
          a         b         c
a  0.316741       NaN -0.150812
b       NaN  1.248003  0.191417
c -0.150812  0.191417  0.895202

covered_by(other, align=None)

Return a Series of dtype('bool') with value True for each aligned geometry that is entirely covered by other.

An object A is said to cover another object B if no points of B lie in the exterior of A.

The operation works in a 1-to-1, row-wise manner.

See https://lin-ear-th-inking.blogspot.com/2007/06/subtleties-of-ogc-covers-spatial.html for reference.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test as the covering geometry.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         LineString([(1, 1), (1.5, 1.5)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         Point(0, 0),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
1                  POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
2                            LINESTRING (1 1, 1.5 1.5)
3                                          POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
2         POLYGON ((0 0, 2 2, 0 2, 0 0))
3                  LINESTRING (0 0, 2 2)
4                            POINT (0 0)
dtype: geometry

We can check if each geometry of GeoSeries is covered by a single geometry:

>>> poly = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
>>> s.covered_by(poly)
0    True
1    True
2    True
3    True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.covered_by(s2, align=True)
0    False
1     True
2     True
3     True
4    False
dtype: bool
>>> s.covered_by(s2, align=False)
0     True
1    False
2     True
3     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is covered_by any element of the other one.

See also

GeoSeries.covers, GeoSeries.overlaps

covers(other, align=None)

Return a Series of dtype('bool') with value True for each aligned geometry that entirely covers other.

An object A is said to cover another object B if no points of B lie in the exterior of A. If either object is empty, this operation returns False.
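
A point on the boundary shows how covers differs from contains. The sketch uses single Shapely geometries chosen for the illustration.

```python
from shapely.geometry import Point, Polygon

square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
corner = Point(0, 0)  # lies on the square's boundary

covers_corner = square.covers(corner)         # no point of corner is outside
contains_corner = square.contains(corner)     # needs a point in the interior
corner_covered = corner.covered_by(square)    # the same test, other direction
print(covers_corner, contains_corner, corner_covered)
```

The corner is covered but not contained, because it never reaches the interior of the square.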

The operation works in a 1-to-1, row-wise manner.

See https://lin-ear-th-inking.blogspot.com/2007/06/subtleties-of-ogc-covers-spatial.html for reference.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to check for being covered.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         LineString([(1, 1), (1.5, 1.5)]),
...         Point(0, 0),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
1         POLYGON ((0 0, 2 2, 0 2, 0 0))
2                  LINESTRING (0 0, 2 2)
3                            POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
2                  POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
3                            LINESTRING (1 1, 1.5 1.5)
4                                          POINT (0 0)
dtype: geometry

We can check if each geometry of GeoSeries covers a single geometry:

>>> poly = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
>>> s.covers(poly)
0     True
1    False
2    False
3    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.covers(s2, align=True)
0    False
1    False
2    False
3    False
4    False
dtype: bool
>>> s.covers(s2, align=False)
0     True
1    False
2     True
3     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries covers any element of the other one.

See also

GeoSeries.covered_by, GeoSeries.overlaps

crosses(other, align=None)

Return a Series of dtype('bool') with value True for each aligned geometry that crosses other.

An object is said to cross other if its interior intersects the interior of the other but does not contain it, and the dimension of the intersection is less than the dimension of the one or the other.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test whether it is crossed.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1             LINESTRING (0 0, 2 2)
2             LINESTRING (2 0, 0 2)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    LINESTRING (1 0, 1 3)
2    LINESTRING (2 0, 0 2)
3              POINT (1 1)
4              POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries crosses a single geometry:

>>> line = LineString([(-1, 1), (3, 1)])
>>> s.crosses(line)
0     True
1     True
2     True
3    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.crosses(s2, align=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> s.crosses(s2, align=False)
0     True
1     True
2    False
3    False
dtype: bool

Notice that a line does not cross a point that it contains.

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries crosses any element of the other one.
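
The relationship between the two align modes can be sketched as follows: once the index of s2 is reset so that its labels match those of s, align=True should give the same result as the positional comparison with align=False. The data repeats the example above; the variable names positional and realigned are only illustrative.

```python
import geopandas
from shapely.geometry import LineString, Point, Polygon

s = geopandas.GeoSeries(
    [
        Polygon([(0, 0), (2, 2), (0, 2)]),
        LineString([(0, 0), (2, 2)]),
        LineString([(2, 0), (0, 2)]),
        Point(0, 1),
    ],
)
s2 = geopandas.GeoSeries(
    [
        LineString([(1, 0), (1, 3)]),
        LineString([(2, 0), (0, 2)]),
        Point(1, 1),
        Point(0, 1),
    ],
    index=range(1, 5),
)

# Positional comparison: the index of s2 is ignored.
positional = s.crosses(s2, align=False)

# Label-based comparison after giving s2 the same 0..3 labels as s.
realigned = s.crosses(s2.reset_index(drop=True), align=True)
```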

See also

GeoSeries.disjoint, GeoSeries.intersects

property crs: CRS

The Coordinate Reference System (CRS) represented as a pyproj.CRS object.

Returns a pyproj.CRS, or None if the CRS is not set. When setting, the value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.

Examples

>>> gdf.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
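
As a rough illustration of the kinds of input pyproj.CRS.from_user_input() accepts, the sketch below builds the same CRS from an authority string, an integer EPSG code, and a WKT string. It uses pyproj directly rather than a GeoDataFrame, and the variable names are illustrative.

```python
from pyproj import CRS

# Three different spellings of WGS 84.
from_string = CRS.from_user_input("EPSG:4326")
from_int = CRS.from_user_input(4326)
from_wkt = CRS.from_user_input(from_string.to_wkt())

# Each input should resolve to the same EPSG code.
codes = [from_string.to_epsg(), from_int.to_epsg(), from_wkt.to_epsg()]
```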

See also

GeoDataFrame.set_crs

assign CRS

GeoDataFrame.to_crs

re-project to another CRS

cummax(axis=None, skipna=True, *args, **kwargs)

Return cumulative maximum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative maximum.

Parameters:
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • *args – Additional keywords have no effect but might be accepted for compatibility with NumPy.

  • **kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Return cumulative maximum of Series or DataFrame.

Return type:

Series or DataFrame

See also

core.window.expanding.Expanding.max

Similar functionality but ignores NaN values.

DataFrame.max

Return the maximum over DataFrame axis.

DataFrame.cummax

Return cumulative maximum over DataFrame axis.

DataFrame.cummin

Return cumulative minimum over DataFrame axis.

DataFrame.cumsum

Return cumulative sum over DataFrame axis.

DataFrame.cumprod

Return cumulative product over DataFrame axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummax()
0    2.0
1    NaN
2    5.0
3    5.0
4    5.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cummax(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the maximum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummax()
     A    B
0  2.0  1.0
1  3.0  NaN
2  3.0  1.0

To iterate over columns and find the maximum in each row, use axis=1

>>> df.cummax(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  1.0
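
The difference from Expanding.max mentioned under "See also" can be sketched with the Series from the example above: cummax keeps NaN at the position where it occurs, while the expanding maximum carries the previous maximum forward. This assumes the default expanding window settings.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, np.nan, 5, -1, 0])

# Cumulative maximum: the NaN at position 1 stays NaN in the result.
running = s.cummax()

# Expanding maximum: the NaN is skipped and the previous maximum (2.0) is reported.
expanding = s.expanding().max()
```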

cummin(axis=None, skipna=True, *args, **kwargs)

Return cumulative minimum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative minimum.

Parameters:
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • *args – Additional keywords have no effect but might be accepted for compatibility with NumPy.

  • **kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Return cumulative minimum of Series or DataFrame.

Return type:

Series or DataFrame

See also

core.window.expanding.Expanding.min

Similar functionality but ignores NaN values.

DataFrame.min

Return the minimum over DataFrame axis.

DataFrame.cummax

Return cumulative maximum over DataFrame axis.

DataFrame.cummin

Return cumulative minimum over DataFrame axis.

DataFrame.cumsum

Return cumulative sum over DataFrame axis.

DataFrame.cumprod

Return cumulative product over DataFrame axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummin()
0    2.0
1    NaN
2    2.0
3   -1.0
4   -1.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cummin(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the minimum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummin()
     A    B
0  2.0  1.0
1  2.0  NaN
2  1.0  0.0

To iterate over columns and find the minimum in each row, use axis=1

>>> df.cummin(axis=1)
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

cumprod(axis=None, skipna=True, *args, **kwargs)

Return cumulative product over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative product.

Parameters:
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • *args – Additional keywords have no effect but might be accepted for compatibility with NumPy.

  • **kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Return cumulative product of Series or DataFrame.

Return type:

Series or DataFrame

See also

core.window.expanding.Expanding.prod

Similar functionality but ignores NaN values.

DataFrame.prod

Return the product over DataFrame axis.

DataFrame.cummax

Return cumulative maximum over DataFrame axis.

DataFrame.cummin

Return cumulative minimum over DataFrame axis.

DataFrame.cumsum

Return cumulative sum over DataFrame axis.

DataFrame.cumprod

Return cumulative product over DataFrame axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumprod()
0     2.0
1     NaN
2    10.0
3   -10.0
4    -0.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cumprod(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the product in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumprod()
     A    B
0  2.0  1.0
1  6.0  NaN
2  6.0  0.0

To iterate over columns and find the product in each row, use axis=1

>>> df.cumprod(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  0.0
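
A common use of a cumulative product is compounding a sequence of growth rates. The sketch below uses made-up periodic returns (the values and variable names are purely illustrative) and turns them into a cumulative growth factor.

```python
import pandas as pd

# Hypothetical periodic returns: +10%, -5%, +2%.
returns = pd.Series([0.10, -0.05, 0.02])

# Multiply the per-period growth factors together, period by period.
growth = (1 + returns).cumprod()
```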

cumsum(axis=None, skipna=True, *args, **kwargs)

Return cumulative sum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative sum.

Parameters:
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • *args – Additional keywords have no effect but might be accepted for compatibility with NumPy.

  • **kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Return cumulative sum of Series or DataFrame.

Return type:

Series or DataFrame

See also

core.window.expanding.Expanding.sum

Similar functionality but ignores NaN values.

DataFrame.sum

Return the sum over DataFrame axis.

DataFrame.cummax

Return cumulative maximum over DataFrame axis.

DataFrame.cummin

Return cumulative minimum over DataFrame axis.

DataFrame.cumsum

Return cumulative sum over DataFrame axis.

DataFrame.cumprod

Return cumulative product over DataFrame axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumsum()
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cumsum(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the sum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumsum()
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0

To iterate over columns and find the sum in each row, use axis=1

>>> df.cumsum(axis=1)
     A    B
0  2.0  3.0
1  3.0  NaN
2  1.0  1.0
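
One way a cumulative sum is often used is to express a running total as a share of the overall total. The following sketch uses illustrative values that do not come from the examples above.

```python
import pandas as pd

# Hypothetical amounts, already sorted from largest to smallest.
amounts = pd.Series([5, 3, 2])

# Running total and its share of the grand total.
running_total = amounts.cumsum()
running_share = running_total / amounts.sum()
```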

property cx

Coordinate based indexer to select by intersection with bounding box.

Format of input should be .cx[xmin:xmax, ymin:ymax]. Any of xmin, xmax, ymin, and ymax can be provided, but input must include a comma separating x and y slices. That is, .cx[:, :] will return the full series/frame, but .cx[:] is not implemented.

Examples

>>> from shapely.geometry import LineString, Point
>>> s = geopandas.GeoSeries(
...     [Point(0, 0), Point(1, 2), Point(3, 3), LineString([(0, 0), (3, 3)])]
... )
>>> s
0              POINT (0 0)
1              POINT (1 2)
2              POINT (3 3)
3    LINESTRING (0 0, 3 3)
dtype: geometry
>>> s.cx[0:1, 0:1]
0              POINT (0 0)
3    LINESTRING (0 0, 3 3)
dtype: geometry
>>> s.cx[:, 1:]
1              POINT (1 2)
2              POINT (3 3)
3    LINESTRING (0 0, 3 3)
dtype: geometry
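
The selection made by .cx can be compared with an explicit bounding-box test. This sketch, which reuses the GeoSeries from the example above, assumes that .cx[xmin:xmax, ymin:ymax] selects the same rows as an intersects check against the corresponding shapely box; the names via_cx and via_box are illustrative.

```python
import geopandas
from shapely.geometry import LineString, Point, box

s = geopandas.GeoSeries(
    [Point(0, 0), Point(1, 2), Point(3, 3), LineString([(0, 0), (3, 3)])]
)

# Coordinate-based indexer.
via_cx = s.cx[0:1, 0:1]

# Explicit test against the same bounding box (xmin, ymin, xmax, ymax).
via_box = s[s.intersects(box(0, 0, 1, 1))]
```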

delaunay_triangles(tolerance=0.0, only_edges=False)

Return a GeoSeries consisting of objects representing the computed Delaunay triangulation between the vertices of an input geometry.

All geometries within the GeoSeries are considered together within a single Delaunay triangulation. The resulting geometries therefore do not map 1:1 to input geometries. Note that each vertex of a geometry is considered a site for the triangulation, so the triangles will be constructed between the vertices of each geometry.

Notes

If you want to generate Delaunay triangles for each geometry separately, use shapely.delaunay_triangles() instead.

Parameters:
  • tolerance (float, default 0.0) – Snap input vertices together if their distance is less than this value.

  • only_edges (bool (optional, default False)) – If set to True, the triangulation will return linestrings instead of polygons.

Examples

>>> from shapely import LineString, MultiPoint, Point, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Point(1, 1),
...         Point(2, 2),
...         Point(1, 3),
...         Point(0, 2),
...     ]
... )
>>> s
0    POINT (1 1)
1    POINT (2 2)
2    POINT (1 3)
3    POINT (0 2)
dtype: geometry
>>> s.delaunay_triangles()
0    POLYGON ((0 2, 1 1, 1 3, 0 2))
1    POLYGON ((1 3, 1 1, 2 2, 1 3))
dtype: geometry
>>> s.delaunay_triangles(only_edges=True)
0    LINESTRING (1 3, 2 2)
1    LINESTRING (0 2, 1 3)
2    LINESTRING (0 2, 1 1)
3    LINESTRING (1 1, 2 2)
4    LINESTRING (1 1, 1 3)
dtype: geometry

The method supports any geometry type but keep in mind that the underlying algorithm is based on the vertices of the input geometries only and does not consider edge segments between vertices.

>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (2, 1), (1, 2)]),
...         MultiPoint([(2, 3), (2, 0), (3, 1)]),
...     ]
... )
>>> s2
0      POLYGON ((0 0, 1 1, 0 1, 0 0))
1          LINESTRING (1 0, 2 1, 1 2)
2    MULTIPOINT ((2 3), (2 0), (3 1))
dtype: geometry
>>> s2.delaunay_triangles()
0    POLYGON ((0 1, 0 0, 1 0, 0 1))
1    POLYGON ((0 1, 1 0, 1 1, 0 1))
2    POLYGON ((0 1, 1 1, 1 2, 0 1))
3    POLYGON ((1 2, 1 1, 2 1, 1 2))
4    POLYGON ((1 2, 2 1, 2 3, 1 2))
5    POLYGON ((2 3, 2 1, 3 1, 2 3))
6    POLYGON ((3 1, 2 1, 2 0, 3 1))
7    POLYGON ((2 0, 2 1, 1 1, 2 0))
8    POLYGON ((2 0, 1 1, 1 0, 2 0))
dtype: geometry
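
For contrast with the combined triangulation above, the per-geometry variant mentioned in the Notes can be sketched with shapely.delaunay_triangles(), which triangulates each input separately and returns one geometry collection per input. The geometries are the same as in s2; the variable names are illustrative.

```python
import shapely
from shapely import LineString, MultiPoint, Polygon

geoms = [
    Polygon([(0, 0), (1, 1), (0, 1)]),
    LineString([(1, 0), (2, 1), (1, 2)]),
    MultiPoint([(2, 3), (2, 0), (3, 1)]),
]

# One GeometryCollection of triangles per input geometry.
per_geometry = shapely.delaunay_triangles(geoms)

# Each input has three distinct, non-collinear vertices, hence one triangle each.
triangle_counts = [len(collection.geoms) for collection in per_geometry]
```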

See also

GeoSeries.voronoi_polygons

Voronoi diagram around vertices

GeoSeries.constrained_delaunay_triangles

constrained Delaunay triangulation

describe(percentiles=None, include=None, exclude=None)

Generate descriptive statistics.

Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

Parameters:
  • percentiles (list-like of numbers, optional) – The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.

  • include ('all', list-like of dtypes or None (default), optional) –

    A white list of data types to include in the result. Ignored for Series. Here are the options:

    • ’all’ : All columns of the input will be included in the output.

    • A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types submit numpy.number. To limit it instead to object columns submit the object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'

    • None (default) : The result will include all numeric columns.

  • exclude (list-like of dtypes or None (default), optional,) –

    A black list of data types to omit from the result. Ignored for Series. Here are the options:

    • A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types submit numpy.number. To exclude object columns submit the object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])). To exclude pandas categorical columns, use 'category'

    • None (default) : The result will exclude nothing.

Returns:

Summary statistics of the Series or Dataframe provided.

Return type:

Series or DataFrame

See also

DataFrame.count

Count number of non-NA/null observations.

DataFrame.max

Maximum of the values in the object.

DataFrame.min

Minimum of the values in the object.

DataFrame.mean

Mean of the values.

DataFrame.std

Standard deviation of the observations.

DataFrame.select_dtypes

Subset of a DataFrame including/excluding columns based on their dtype.

Notes

For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.

For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.

If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the dataframe consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.

Examples

Describing a numeric Series.

>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

Describing a categorical Series.

>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object

Describing a timestamp Series.

>>> s = pd.Series([
...     np.datetime64("2000-01-01"),
...     np.datetime64("2010-01-01"),
...     np.datetime64("2010-01-01")
... ])
>>> s.describe()
count                      3
mean     2006-09-01 08:00:00
min      2000-01-01 00:00:00
25%      2004-12-31 12:00:00
50%      2010-01-01 00:00:00
75%      2010-01-01 00:00:00
max      2010-01-01 00:00:00
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame({'categorical': pd.Categorical(['d', 'e', 'f']),
...                    'numeric': [1, 2, 3],
...                    'object': ['a', 'b', 'c']
...                    })
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      a
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[object])
       object
count       3
unique      3
top         a
freq        1

Including only categorical columns from a DataFrame description.

>>> df.describe(include=['category'])
       categorical
count            3
unique           3
top              d
freq             1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])
       categorical object
count            3      3
unique           3      3
top              f      a
freq             1      1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[object])
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
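
The percentiles parameter is not exercised in the examples above. A minimal sketch, using an illustrative Series of the integers 1 to 10, requests the 10th and 90th percentiles instead of the defaults.

```python
import pandas as pd

s = pd.Series(range(1, 11))

# Ask for the 10th and 90th percentiles; values must lie between 0 and 1.
summary = s.describe(percentiles=[0.1, 0.9])

low = summary["10%"]
high = summary["90%"]
```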

diff(periods=1, axis=0)

First discrete difference of element.

Calculates the difference of a DataFrame element compared with another element in the DataFrame (default is element in previous row).

Parameters:
  • periods (int, default 1) – Periods to shift for calculating difference, accepts negative values.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Take difference over rows (0) or columns (1).

Returns:

First differences of the DataFrame.

Return type:

DataFrame

See also

DataFrame.pct_change

Percent change over given number of periods.

DataFrame.shift

Shift index by desired number of periods with an optional time freq.

Series.diff

First discrete difference of object.

Notes

For boolean dtypes, this uses operator.xor() rather than operator.sub(). The result is calculated according to current dtype in DataFrame, however dtype of the result is always float64.

Examples

Difference with previous row

>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
...                    'b': [1, 1, 2, 3, 5, 8],
...                    'c': [1, 4, 9, 16, 25, 36]})
>>> df
   a  b   c
0  1  1   1
1  2  1   4
2  3  2   9
3  4  3  16
4  5  5  25
5  6  8  36
>>> df.diff()
     a    b     c
0  NaN  NaN   NaN
1  1.0  0.0   3.0
2  1.0  1.0   5.0
3  1.0  1.0   7.0
4  1.0  2.0   9.0
5  1.0  3.0  11.0

Difference with previous column

>>> df.diff(axis=1)
    a  b   c
0 NaN  0   0
1 NaN -1   3
2 NaN -1   7
3 NaN -1  13
4 NaN  0  20
5 NaN  2  28

Difference with 3rd previous row

>>> df.diff(periods=3)
     a    b     c
0  NaN  NaN   NaN
1  NaN  NaN   NaN
2  NaN  NaN   NaN
3  3.0  2.0  15.0
4  3.0  4.0  21.0
5  3.0  6.0  27.0

Difference with following row

>>> df.diff(periods=-1)
     a    b     c
0 -1.0  0.0  -3.0
1 -1.0 -1.0  -5.0
2 -1.0 -1.0  -7.0
3 -1.0 -2.0  -9.0
4 -1.0 -3.0 -11.0
5  NaN  NaN   NaN

Overflow in input dtype

>>> df = pd.DataFrame({'a': [1, 0]}, dtype=np.uint8)
>>> df.diff()
       a
0    NaN
1  255.0
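
Two points from the Notes and the overflow example can be sketched together: for boolean data the difference is an element-wise XOR with the previous value, and casting an unsigned column to a signed dtype before calling diff is one possible way to avoid the wrap-around. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Boolean data: each entry is XOR-ed with the previous one, the first entry is missing.
flags = pd.Series([True, False, False, True])
changed = flags.diff()

# Unsigned data: cast to a signed dtype first so that 0 - 1 yields -1 instead of wrapping.
small = pd.DataFrame({'a': [1, 0]}, dtype=np.uint8)
safe = small.astype('int64').diff()
```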

difference(other, align=None)

Return a GeoSeries of the points in each aligned geometry that are not in other.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the difference to.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

GeoSeries

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5                       POINT (0 1)
dtype: geometry

We can compute the difference between each geometry and a single shapely geometry:

>>> s.difference(Polygon([(0, 0), (1, 1), (0, 1)]))
0       POLYGON ((0 2, 2 2, 1 1, 0 1, 0 2))
1         POLYGON ((0 2, 2 2, 1 1, 0 1, 0 2))
2                       LINESTRING (1 1, 2 2)
3    MULTILINESTRING ((2 0, 1 1), (1 1, 0 2))
4                                 POINT EMPTY
dtype: geometry

We can also compute the difference between two GeoSeries, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and process elements with the same index using align=True or ignore the index and process elements based on their matching order using align=False:

>>> s.difference(s2, align=True)
0                                        None
1         POLYGON ((0 2, 2 2, 1 1, 0 1, 0 2))
2    MULTILINESTRING ((0 0, 1 1), (1 1, 2 2))
3                            LINESTRING EMPTY
4                                 POINT (0 1)
5                                        None
dtype: geometry
>>> s.difference(s2, align=False)
0         POLYGON ((0 2, 2 2, 1 1, 0 1, 0 2))
1    POLYGON ((0 0, 0 2, 1 2, 2 2, 1 1, 0 0))
2    MULTILINESTRING ((0 0, 1 1), (1 1, 2 2))
3                       LINESTRING (2 0, 0 2)
4                                 POINT EMPTY
dtype: geometry
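
As the aligned output above shows, the result can contain None (no partner at that index) and empty geometries (nothing left after subtraction). One possible way to keep only the usable rows is sketched below with a smaller, illustrative pair of GeoSeries; the name usable is hypothetical.

```python
import geopandas
from shapely.geometry import LineString, Point, Polygon

s = geopandas.GeoSeries(
    [
        Polygon([(0, 0), (2, 2), (0, 2)]),
        LineString([(2, 0), (0, 2)]),
        Point(0, 1),
    ],
)
s2 = geopandas.GeoSeries(
    [
        LineString([(2, 0), (0, 2)]),
        Point(1, 1),
        Point(5, 5),
    ],
    index=range(1, 4),
)

result = s.difference(s2, align=True)

# Drop rows without a partner (None) and rows where nothing remains (empty geometry).
usable = result[~(result.isna() | result.is_empty)]
```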

See also

GeoSeries.symmetric_difference, GeoSeries.union, GeoSeries.intersection

disjoint(other, align=None)

Return a Series of dtype('bool') with value True for each aligned geometry disjoint to other.

An object is said to be disjoint from other if its boundary and interior do not intersect at all with those of the other.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test whether it is disjoint.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(-1, 0), (-1, 2), (0, -2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(1, 1),
...         Point(0, 0),
...     ],
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1             LINESTRING (0 0, 2 2)
2             LINESTRING (2 0, 0 2)
3                       POINT (0 1)
dtype: geometry
>>> s2
0    POLYGON ((-1 0, -1 2, 0 -2, -1 0))
1                 LINESTRING (0 0, 0 1)
2                           POINT (1 1)
3                           POINT (0 0)
dtype: geometry

We can check each geometry of GeoSeries against a single geometry:

>>> line = LineString([(0, 0), (2, 0)])
>>> s.disjoint(line)
0    False
1    False
2    False
3     True
dtype: bool

We can also check two GeoSeries against each other, row by row. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.disjoint(s2)
0     True
1    False
2    False
3     True
dtype: bool
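
Since two geometries are disjoint exactly when they do not intersect, the result above can be cross-checked against the negation of intersects. The sketch reuses s and s2 from the example; the variable names are illustrative.

```python
import geopandas
from shapely.geometry import LineString, Point, Polygon

s = geopandas.GeoSeries(
    [
        Polygon([(0, 0), (2, 2), (0, 2)]),
        LineString([(0, 0), (2, 2)]),
        LineString([(2, 0), (0, 2)]),
        Point(0, 1),
    ],
)
s2 = geopandas.GeoSeries(
    [
        Polygon([(-1, 0), (-1, 2), (0, -2)]),
        LineString([(0, 0), (0, 1)]),
        Point(1, 1),
        Point(0, 0),
    ],
)

is_disjoint = s.disjoint(s2)

# Row by row, disjoint should be the logical negation of intersects.
not_intersecting = ~s.intersects(s2)
```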

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is disjoint from any element of the other one.

See also

GeoSeries.intersects, GeoSeries.touches

dissolve(by=None, aggfunc='first', as_index=True, level=None, sort=True, observed=False, dropna=True, method='unary', grid_size=None, **kwargs)[source]

Dissolve geometries within groupby into a single observation. This is accomplished by applying the union_all method to all geometries within a group.

Observations associated with each groupby group will be aggregated using the aggfunc.

Parameters:
  • by (str or list-like, default None) – Column(s) whose values define the groups to be dissolved. If None, the entire GeoDataFrame is considered as a single group. If a list-like object is provided, the values in the list are treated as categorical labels, and polygons will be combined based on the equality of these categorical labels.

  • aggfunc (function or string, default "first") –

    Aggregation function for manipulation of data associated with each group. Passed to pandas groupby.agg method. Accepted combinations are:

    • function

    • string function name

    • list of functions and/or function names, e.g. [np.sum, ‘mean’]

    • dict of axis labels -> functions, function names or list of such.

  • as_index (boolean, default True) – If true, groupby columns become index of result.

  • level (int or str or sequence of int or sequence of str, default None) – If the axis is a MultiIndex (hierarchical), group by a particular level or levels.

  • sort (bool, default True) – Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.

  • observed (bool, default False) – This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

  • dropna (bool, default True) – If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.

  • method (str (default "unary")) –

    The method to use for the union. Options are:

    • "unary": use the unary union algorithm. This option is the most robust but can be slow for large numbers of geometries (default).

    • "coverage": use the coverage union algorithm. This option is optimized for non-overlapping polygons and can be significantly faster than the unary union algorithm. However, it can produce invalid geometries if the polygons overlap.

    • "disjoint_subset": use the disjoint subset union algorithm. This option is optimized for inputs that can be divided into subsets that do not intersect. If there is only one such subset, performance can be expected to be worse than "unary". Requires Shapely >= 2.1.

  • grid_size (float, default None) –

    When grid size is specified, a fixed-precision space is used to perform the union operations. This can be useful when unioning geometries that are not perfectly snapped or to avoid geometries not being unioned because of robustness issues. The inputs are first snapped to a grid of the given size. When a line segment of a geometry is within tolerance of a vertex of another geometry, this vertex will be inserted in the line segment. Finally, the result vertices are computed on the same grid. This is only supported for method "unary". If None, the highest precision of the inputs will be used.

    Added in version 1.1.0.

  • **kwargs –

    Keyword arguments to be passed to the pandas DataFrameGroupBy.agg method, which is used by dissolve. In particular, numeric_only may be supplied, which will be required in pandas 2.0 for certain aggfuncs.

    Added in version 0.13.0.

Return type:

GeoDataFrame

Parameters:
  • by (str | None)

  • as_index (bool)

  • sort (bool)

  • observed (bool)

  • dropna (bool)

  • method (Literal['unary', 'coverage', 'disjoint_subset'])

  • grid_size (float | None)

Examples

>>> from shapely.geometry import Point
>>> d = {
...     "col1": ["name1", "name2", "name1"],
...     "geometry": [Point(1, 2), Point(2, 1), Point(0, 1)],
... }
>>> gdf = geopandas.GeoDataFrame(d, crs=4326)
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
2  name1  POINT (0 1)
>>> dissolved = gdf.dissolve('col1')
>>> dissolved
                        geometry
col1
name1  MULTIPOINT ((0 1), (1 2))
name2                POINT (2 1)
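
The aggfunc and as_index parameters described above can be combined, as in the following minimal sketch. The "value" column and its numbers are assumptions added for illustration; they are not part of the original example.

```python
import geopandas
from shapely.geometry import Point

# Hypothetical attribute column "value" added for illustration.
gdf = geopandas.GeoDataFrame(
    {
        "col1": ["name1", "name2", "name1"],
        "value": [1, 2, 3],
        "geometry": [Point(1, 2), Point(2, 1), Point(0, 1)],
    },
    crs=4326,
)

# Geometries of each group are unioned; "value" is summed per group.
summed = gdf.dissolve(by="col1", aggfunc="sum")

# With as_index=False the group key stays a regular column.
flat = gdf.dissolve(by="col1", aggfunc="sum", as_index=False)
```

Here summed is indexed by col1, while flat keeps a default integer index and carries col1 as an ordinary column.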

See also

GeoDataFrame.explode

explode multi-part geometries into single geometries

distance(other, align=None)

Return a Series containing the distance to aligned other.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the distance to.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (float)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 0), (1, 1)]),
...         Polygon([(0, 0), (-1, 0), (-1, 1)]),
...         LineString([(1, 1), (0, 0)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Point(3, 1),
...         LineString([(1, 0), (2, 0)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0      POLYGON ((0 0, 1 0, 1 1, 0 0))
1    POLYGON ((0 0, -1 0, -1 1, 0 0))
2               LINESTRING (1 1, 0 0)
3                         POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
2                                          POINT (3 1)
3                                LINESTRING (1 0, 2 0)
4                                          POINT (0 1)
dtype: geometry

We can check the distance of each geometry of GeoSeries to a single geometry:

>>> point = Point(-1, 0)
>>> s.distance(point)
0    1.0
1    0.0
2    1.0
3    1.0
dtype: float64

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and use elements with the same index using align=True or ignore index and use elements based on their matching order using align=False:

>>> s.distance(s2, align=True)
0         NaN
1    0.707107
2    2.000000
3    1.000000
4         NaN
dtype: float64
>>> s.distance(s2, align=False)
0    0.000000
1    3.162278
2    0.707107
3    1.000000
dtype: float64
div(other, axis='columns', level=None, fill_value=None)

Get floating division of DataFrame and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. The reverse version is rtruediv.

One of the flexible wrappers (add, sub, mul, div, floordiv, mod, pow) around the arithmetic operators: +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Adding a scalar with the operator version returns the same result as the method.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
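
The fill_value rule stated above (a value missing on one side is filled, a value missing on both sides stays missing) can be checked with a small sketch. The frames a and b are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Row 0: both present; row 1: missing on the left only; row 2: missing on both sides.
a = pd.DataFrame({"x": [1.0, np.nan, np.nan]})
b = pd.DataFrame({"x": [2.0, 4.0, np.nan]})

# The left-hand NaN in row 1 is replaced by 1 before dividing;
# row 2 stays NaN because both inputs are missing there.
result = a.div(b, fill_value=1)
```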
divide(other, axis='columns', level=None, fill_value=None)

Get floating division of DataFrame and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. The reverse version is rtruediv.

One of the flexible wrappers (add, sub, mul, div, floordiv, mod, pow) around the arithmetic operators: +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Adding a scalar with the operator version returns the same result as the method.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
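
Since divide shares its documentation with div, a short sketch can confirm that the two produce the same result, and show division by a Series along the index. The Series name per_row is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# divide, div and the / operator give the same result for a scalar.
same = df.divide(10).equals(df.div(10)) and df.divide(10).equals(df / 10)

# Divide each row by its own divisor by matching the Series on the index.
per_row = pd.Series([2, 3, 4], index=['circle', 'triangle', 'rectangle'])
by_row = df.divide(per_row, axis='index')
```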
dot(other)

Compute the matrix multiplication between the DataFrame and other.

This method computes the matrix product between the DataFrame and the values of another Series, DataFrame or numpy array.

It can also be called using self @ other.

Parameters:

other (Series, DataFrame or array-like) – The other object to compute the matrix product with.

Returns:

If other is a Series, return the matrix product between self and other as a Series. If other is a DataFrame or a numpy.array, return the matrix product of self and other as a DataFrame.

Return type:

Series or DataFrame

See also

Series.dot

Similar method for Series.

Notes

The dimensions of DataFrame and other must be compatible in order to compute the matrix multiplication. In addition, the column names of DataFrame and the index of other must contain the same values, as they will be aligned prior to the multiplication.

The dot method for Series computes the inner product, instead of the matrix product here.

Examples

Here we multiply a DataFrame with a Series.

>>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> s = pd.Series([1, 1, 2, 1])
>>> df.dot(s)
0    -4
1     5
dtype: int64

Here we multiply a DataFrame with another DataFrame.

>>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(other)
    0   1
0   1   4
1   2   2

Note that the dot method gives the same result as @.

>>> df @ other
    0   1
0   1   4
1   2   2

The dot method works also if other is an np.array.

>>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(arr)
    0   1
0   1   4
1   2   2

Note how shuffling of the objects does not change the result.

>>> s2 = s.reindex([1, 0, 2, 3])
>>> df.dot(s2)
0    -4
1     5
dtype: int64
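
The alignment requirement from the Notes can be illustrated with a sketch in which the labels do not match. The column and index labels are assumptions chosen for the example; passing the raw values as a numpy array skips label alignment and only checks the shapes.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]],
                  columns=['a', 'b', 'c', 'd'])
# The index labels of s deliberately differ from the columns of df.
s = pd.Series([1, 1, 2, 1], index=['w', 'x', 'y', 'z'])

try:
    df.dot(s)
    aligned = True
except ValueError:
    # pandas refuses to multiply because the labels cannot be aligned.
    aligned = False

# A numpy array carries no labels, so only the dimensions must be compatible.
product = df.dot(s.to_numpy())
```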
drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by directly specifying index or column names. When using a multi-index, labels on different levels can be removed by specifying the level. See the user guide for more information about the now unused levels.

Parameters:
  • labels (single label or list-like) – Index or column labels to drop. A tuple will be used as a single label and not treated as a list-like.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).

  • index (single label or list-like) – Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).

  • columns (single label or list-like) – Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).

  • level (int or level name, optional) – For MultiIndex, level from which the labels will be removed.

  • inplace (bool, default False) – If False, return a copy. Otherwise, do operation in place and return None.

  • errors ({'ignore', 'raise'}, default 'raise') – If ‘ignore’, suppress error and only existing labels are dropped.

Returns:

DataFrame with the specified index or column labels removed, or None if inplace=True.

Return type:

DataFrame or None

Raises:

KeyError – If any of the labels is not found in the selected axis.

See also

DataFrame.loc

Label-location based indexer for selection by label.

DataFrame.dropna

Return DataFrame with labels on given axis omitted where (all or any) data are missing.

DataFrame.drop_duplicates

Return DataFrame with duplicate rows removed, optionally only considering certain columns.

Series.drop

Return Series with specified index labels removed.

Examples

>>> df = pd.DataFrame(np.arange(12).reshape(3, 4),
...                   columns=['A', 'B', 'C', 'D'])
>>> df
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Drop columns

>>> df.drop(['B', 'C'], axis=1)
   A   D
0  0   3
1  4   7
2  8  11
>>> df.drop(columns=['B', 'C'])
   A   D
0  0   3
1  4   7
2  8  11

Drop rows by index

>>> df.drop([0, 1])
   A  B   C   D
2  8  9  10  11

Drop columns and/or rows of MultiIndex DataFrame

>>> midx = pd.MultiIndex(levels=[['llama', 'cow', 'falcon'],
...                              ['speed', 'weight', 'length']],
...                      codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                             [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> df = pd.DataFrame(index=midx, columns=['big', 'small'],
...                   data=[[45, 30], [200, 100], [1.5, 1], [30, 20],
...                         [250, 150], [1.5, 0.8], [320, 250],
...                         [1, 0.8], [0.3, 0.2]])
>>> df
                big     small
llama   speed   45.0    30.0
        weight  200.0   100.0
        length  1.5     1.0
cow     speed   30.0    20.0
        weight  250.0   150.0
        length  1.5     0.8
falcon  speed   320.0   250.0
        weight  1.0     0.8
        length  0.3     0.2

Drop a specific index combination from the MultiIndex DataFrame, i.e., drop the combination 'falcon' and 'weight', which deletes only the corresponding row

>>> df.drop(index=('falcon', 'weight'))
                big     small
llama   speed   45.0    30.0
        weight  200.0   100.0
        length  1.5     1.0
cow     speed   30.0    20.0
        weight  250.0   150.0
        length  1.5     0.8
falcon  speed   320.0   250.0
        length  0.3     0.2
>>> df.drop(index='cow', columns='small')
                big
llama   speed   45.0
        weight  200.0
        length  1.5
falcon  speed   320.0
        weight  1.0
        length  0.3
>>> df.drop(index='length', level=1)
                big     small
llama   speed   45.0    30.0
        weight  200.0   100.0
cow     speed   30.0    20.0
        weight  250.0   150.0
falcon  speed   320.0   250.0
        weight  1.0     0.8
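
The errors parameter can be sketched as follows; the label 'Z' is a deliberately missing column chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  columns=['A', 'B', 'C', 'D'])

try:
    df.drop(columns=['B', 'Z'])
    raised = False
except KeyError:
    # With the default errors='raise', a missing label is an error.
    raised = True

# With errors='ignore', only the labels that exist are dropped.
remaining = df.drop(columns=['B', 'Z'], errors='ignore')
```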
drop_duplicates(subset=None, *, keep='first', inplace=False, ignore_index=False)

Return DataFrame with duplicate rows removed.

Considering certain columns is optional. Indexes, including time indexes, are ignored.

Parameters:
  • subset (column label or sequence of labels, optional) – Only consider certain columns for identifying duplicates, by default use all of the columns.

  • keep ({'first', 'last', False}, default 'first') –

    Determines which duplicates (if any) to keep.

    • ‘first’ : Drop duplicates except for the first occurrence.

    • ‘last’ : Drop duplicates except for the last occurrence.

    • False : Drop all duplicates.

  • inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.

  • ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.

Returns:

DataFrame with duplicates removed or None if inplace=True.

Return type:

DataFrame or None

See also

DataFrame.value_counts

Count unique combinations of columns.

Examples

Consider a dataset containing ramen ratings.

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
    brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, it removes duplicate rows based on all columns.

>>> df.drop_duplicates()
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

To remove duplicates on specific column(s), use subset.

>>> df.drop_duplicates(subset=['brand'])
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5

To remove duplicates and keep last occurrences, use keep.

>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
    brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0
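
The remaining options, keep=False and ignore_index=True, can be sketched on the same data.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

# keep=False drops every row that has a duplicate, so rows 0 and 1 both go.
unique_only = df.drop_duplicates(keep=False)

# ignore_index=True relabels the surviving rows 0, 1, ..., n - 1.
relabeled = df.drop_duplicates(keep=False, ignore_index=True)
```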
droplevel(level, axis=0)

Return Series/DataFrame with requested index / column level(s) removed.

Parameters:
  • level (int, str, or list-like) – If a string is given, it must be the name of a level. If list-like, elements must be names or positional indexes of levels.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

    Axis along which the level(s) is removed:

    • 0 or ‘index’: remove level(s) from the index (row labels).

    • 1 or ‘columns’: remove level(s) from the columns.

    For Series this parameter is unused and defaults to 0.

Returns:

Series/DataFrame with requested index / column level(s) removed.

Return type:

Series/DataFrame

Examples

>>> df = pd.DataFrame([
...     [1, 2, 3, 4],
...     [5, 6, 7, 8],
...     [9, 10, 11, 12]
... ]).set_index([0, 1]).rename_axis(['a', 'b'])
>>> df.columns = pd.MultiIndex.from_tuples([
...     ('c', 'e'), ('d', 'f')
... ], names=['level_1', 'level_2'])
>>> df
level_1   c   d
level_2   e   f
a b
1 2      3   4
5 6      7   8
9 10    11  12
>>> df.droplevel('a')
level_1   c   d
level_2   e   f
b
2        3   4
6        7   8
10      11  12
>>> df.droplevel('level_2', axis=1)
level_1   c   d
a b
1 2      3   4
5 6      7   8
9 10    11  12
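
A list-like level removes several levels at once. The following is a minimal sketch with a three-level index; the level names and values are assumptions for illustration.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 2, 3), (4, 5, 6)],
                                names=['a', 'b', 'c'])
df = pd.DataFrame({'v': [10, 20]}, index=idx)

# Remove levels 'a' and 'c' in one call; only level 'b' remains.
result = df.droplevel(['a', 'c'])
```

At least one level has to remain, so not every level of an index can be dropped in this way.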
dropna(*, axis=0, how=<no_default>, thresh=<no_default>, subset=None, inplace=False, ignore_index=False)

Remove missing values.

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters:
  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

    Determine if rows or columns which contain missing values are removed.

    • 0, or ‘index’ : Drop rows which contain missing values.

    • 1, or ‘columns’ : Drop columns which contain missing values.

    Only a single axis is allowed.

  • how ({'any', 'all'}, default 'any') –

    Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

    • ‘any’ : If any NA values are present, drop that row or column.

    • ‘all’ : If all values are NA, drop that row or column.

  • thresh (int, optional) – Require that many non-NA values. Cannot be combined with how.

  • subset (column label or sequence of labels, optional) – Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

  • inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.

  • ignore_index (bool, default False) –

    If True, the resulting axis will be labeled 0, 1, …, n - 1.

    Added in version 2.0.0.

Returns:

DataFrame with NA entries dropped from it or None if inplace=True.

Return type:

DataFrame or None

See also

DataFrame.isna

Indicate missing values.

DataFrame.notna

Indicate existing (non-missing) values.

DataFrame.fillna

Replace missing values.

Series.dropna

Drop missing values.

Index.dropna

Drop missing indices.

Examples

>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Drop the rows where at least one element is missing.

>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25

Drop the columns where at least one element is missing.

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman

Drop the rows where all elements are missing.

>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Keep only the rows with at least 2 non-NA values.

>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Define in which columns to look for missing values.

>>> df.dropna(subset=['name', 'toy'])
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
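
The parameters can also be combined. As a sketch on the same data: subset with how='all' drops a row only when every listed column is missing, and thresh works along columns as well.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})

# Drop rows only if both 'toy' and 'born' are missing (row 0 here).
rows = df.dropna(subset=['toy', 'born'], how='all')

# Keep only the columns with at least 2 non-NA values.
cols = df.dropna(axis='columns', thresh=2)
```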
property dtypes

Return the dtypes in the DataFrame.

This returns a Series with the data type of each column. The result’s index is the original DataFrame’s columns. Columns with mixed types are stored with the object dtype. See the User Guide for more.

Returns:

The data type of each column.

Return type:

pandas.Series

Examples

>>> df = pd.DataFrame({'float': [1.0],
...                    'int': [1],
...                    'datetime': [pd.Timestamp('20180310')],
...                    'string': ['foo']})
>>> df.dtypes
float              float64
int                  int64
datetime    datetime64[ns]
string              object
dtype: object
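
The remark about mixed-type columns can be illustrated with a short sketch; the column names are assumptions. Because dtypes is an ordinary Series, it can be indexed by column name, and select_dtypes filters columns by the same information.

```python
import pandas as pd

df = pd.DataFrame({'mixed': [1, 'a', 2.5], 'ints': [1, 2, 3]})

# A column holding ints, strings and floats falls back to the object dtype.
mixed_dtype = df.dtypes['mixed']

# Select only the numeric columns based on their dtypes.
numeric_cols = df.select_dtypes(include='number').columns.tolist()
```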
duplicated(subset=None, keep='first')

Return boolean Series denoting duplicate rows.

Considering certain columns is optional.

Parameters:
  • subset (column label or sequence of labels, optional) – Only consider certain columns for identifying duplicates, by default use all of the columns.

  • keep ({'first', 'last', False}, default 'first') –

    Determines which duplicates (if any) to mark.

    • first : Mark duplicates as True except for the first occurrence.

    • last : Mark duplicates as True except for the last occurrence.

    • False : Mark all duplicates as True.

Returns:

Boolean Series indicating which rows are duplicates.

Return type:

Series

See also

Index.duplicated

Equivalent method on index.

Series.duplicated

Equivalent method on Series.

Series.drop_duplicates

Remove duplicate values from Series.

DataFrame.drop_duplicates

Remove duplicate values from DataFrame.

Examples

Consider a dataset containing ramen ratings.

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
    brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, for each set of duplicated values, the first occurrence is set to False and all others to True.

>>> df.duplicated()
0    False
1     True
2    False
3    False
4    False
dtype: bool

By using ‘last’, the last occurrence of each set of duplicated values is set to False and all others to True.

>>> df.duplicated(keep='last')
0     True
1    False
2    False
3    False
4    False
dtype: bool

By setting keep to False, all duplicates are True.

>>> df.duplicated(keep=False)
0     True
1     True
2    False
3    False
4    False
dtype: bool

To find duplicates on specific column(s), use subset.

>>> df.duplicated(subset=['brand'])
0    False
1     True
2    False
3     True
4     True
dtype: bool
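
The returned boolean Series is typically used as a mask. As a sketch on the same data, the following selects every row that shares its brand and style with another row, and the rows that do not.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

# keep=False marks every member of a duplicate group as True.
mask = df.duplicated(subset=['brand', 'style'], keep=False)

repeated = df[mask]    # rows whose brand/style pair occurs more than once
singular = df[~mask]   # rows whose brand/style pair occurs exactly once
```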
dwithin(other, distance, align=None)

Return a Series of dtype('bool') with value True for each aligned geometry that is within a set distance from other.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to measure the distance to.

  • distance (float, np.array, pd.Series) – Distance(s) to test if each geometry is within. A scalar distance will be applied to all geometries. An array or Series will be applied elementwise. If np.array or pd.Series are used then it must have same length as the GeoSeries.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(0, 4),
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(1, 0), (4, 2), (2, 2)]),
...         Polygon([(2, 0), (3, 2), (2, 2)]),
...         LineString([(2, 0), (2, 2)]),
...         Point(1, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1             LINESTRING (0 0, 0 2)
2             LINESTRING (0 0, 0 1)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((1 0, 4 2, 2 2, 1 0))
2    POLYGON ((2 0, 3 2, 2 2, 2 0))
3             LINESTRING (2 0, 2 2)
4                       POINT (1 1)
dtype: geometry

We can check if each geometry of a GeoSeries is within a set distance of a single geometry:

>>> point = Point(0, 1)
>>> s2.dwithin(point, 1.8)
1     True
2    False
3    False
4     True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.dwithin(s2, distance=1, align=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> s.dwithin(s2, distance=1, align=False)
0     True
1    False
2    False
3     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is within the set distance of any element of the other one.
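
If the question is whether each geometry lies within a distance of any element of the other series, one possible workaround is to collapse the other series into a single geometry first. The sketch below is an assumption about how this could be done, not a documented recipe. It uses distance with a threshold, which expresses the same test as dwithin against the combined geometry; the points and the threshold of 1.5 are illustrative.

```python
import geopandas
from shapely.geometry import MultiPoint, Point

s = geopandas.GeoSeries([Point(0, 0), Point(5, 5)])
others = geopandas.GeoSeries([Point(0, 1), Point(10, 10)])

# Collapse the second series into one multi-part geometry.
combined = MultiPoint([(p.x, p.y) for p in others])

# The distance to a multi-part geometry is the distance to its nearest part,
# so this checks proximity to any element of `others`.
near_any = s.distance(combined) <= 1.5
```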

See also

GeoSeries.within

property empty: bool

Indicator whether Series/DataFrame is empty.

True if Series/DataFrame is entirely empty (no items), meaning any of the axes are of length 0.

Returns:

If Series/DataFrame is empty, return True, if not return False.

Return type:

bool

See also

Series.dropna

Return series without null values.

DataFrame.dropna

Return DataFrame with labels on given axis omitted where (all or any) data are missing.

Notes

If Series/DataFrame contains only NaNs, it is still not considered empty. See the example below.

Examples

An example of an actual empty DataFrame. Notice the index is empty:

>>> df_empty = pd.DataFrame({'A' : []})
>>> df_empty
Empty DataFrame
Columns: [A]
Index: []
>>> df_empty.empty
True

If we only have NaNs in our DataFrame, it is not considered empty! We will need to drop the NaNs to make the DataFrame empty:

>>> df = pd.DataFrame({'A' : [np.nan]})
>>> df
    A
0 NaN
>>> df.empty
False
>>> df.dropna().empty
True
>>> ser_empty = pd.Series({'A' : []})
>>> ser_empty
A    []
dtype: object
>>> ser_empty.empty
False
>>> ser_empty = pd.Series()
>>> ser_empty.empty
True
property envelope

Return a GeoSeries of geometries representing the envelope of each geometry.

The envelope of a geometry is the bounding rectangle. That is, the point or smallest rectangular polygon (with sides parallel to the coordinate axes) that contains the geometry.

Examples

>>> from shapely.geometry import Polygon, LineString, Point, MultiPoint
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         MultiPoint([(0, 0), (1, 1)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2         MULTIPOINT ((0 0), (1 1))
3                       POINT (0 0)
dtype: geometry
>>> s.envelope
0    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
2    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
3                            POINT (0 0)
dtype: geometry

See also

GeoSeries.convex_hull

convex hull geometry
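
As a rough illustration of what "bounding rectangle" means here, the following plain-Python sketch computes an axis-aligned envelope from a list of (x, y) coordinates. The function name envelope_coords is hypothetical, and the sketch only distinguishes the point and rectangle cases named above; it does not special-case zero-width or zero-height boxes the way a geometry library might.

```python
def envelope_coords(coords):
    """Return the corner ring of the axis-aligned bounding box of coords."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    if min_x == max_x and min_y == max_y:
        # A single location has a point as its envelope.
        return [(min_x, min_y)]
    # Closed ring starting at the lower-left corner, counter-clockwise.
    return [
        (min_x, min_y),
        (max_x, min_y),
        (max_x, max_y),
        (min_x, max_y),
        (min_x, min_y),
    ]


triangle = envelope_coords([(0, 0), (1, 1), (0, 1)])
point = envelope_coords([(0, 0)])
```

For the triangle from the example, the ring is (0 0, 1 0, 1 1, 0 1, 0 0), the same corner order as in the output shown for s.envelope.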

eq(other, axis='columns', level=None)

Return the equal-to comparison of the DataFrame and other, element-wise (binary operator eq).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters:
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns:

Result of the comparison.

Return type:

DataFrame of bool

See also

DataFrame.eq

Compare DataFrames for equality elementwise.

DataFrame.ne

Compare DataFrames for inequality elementwise.

DataFrame.le

Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt

Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge

Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt

Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
equals(other)

Test whether two objects contain the same elements.

This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.

The row/column index do not need to have the same type, as long as the values are considered equal. Corresponding columns and index must be of the same dtype.

Parameters:

other (Series or DataFrame) – The other Series or DataFrame to be compared with the first.

Returns:

True if all elements are the same in both objects, False otherwise.

Return type:

bool

See also

Series.eq

Compare two Series objects of the same length and return a Series where each element is True if the element in each Series is equal, False otherwise.

DataFrame.eq

Compare two DataFrame objects of the same shape and return a DataFrame where each element is True if the respective element in each DataFrame is equal, False otherwise.

testing.assert_series_equal

Raises an AssertionError if left and right are not equal. Provides an easy interface to ignore inequality in dtypes, indexes and precision among others.

testing.assert_frame_equal

Like assert_series_equal, but targets DataFrames.

numpy.array_equal

Return True if two arrays have the same shape and elements, False otherwise.

Examples

>>> df = pd.DataFrame({1: [10], 2: [20]})
>>> df
    1   2
0  10  20

DataFrames df and exactly_equal have the same types and values for their elements and column labels, which will return True.

>>> exactly_equal = pd.DataFrame({1: [10], 2: [20]})
>>> exactly_equal
    1   2
0  10  20
>>> df.equals(exactly_equal)
True

DataFrames df and different_column_type have the same element types and values, but have different types for the column labels, which will still return True.

>>> different_column_type = pd.DataFrame({1.0: [10], 2.0: [20]})
>>> different_column_type
   1.0  2.0
0   10   20
>>> df.equals(different_column_type)
True

DataFrames df and different_data_type have different types for the same values for their elements, and will return False even though their column labels are the same values and types.

>>> different_data_type = pd.DataFrame({1: [10.0], 2: [20.0]})
>>> different_data_type
      1     2
0  10.0  20.0
>>> df.equals(different_data_type)
False
estimate_utm_crs(datum_name='WGS 84')[source]

Return the estimated UTM CRS based on the bounds of the dataset.

Added in version 0.9.

Parameters:

datum_name (str, optional) – The name of the datum to use in the query. Default is WGS 84.

Return type:

pyproj.CRS

Examples

>>> import geodatasets
>>> df = geopandas.read_file(
...     geodatasets.get_path("geoda.chicago_health")
... )
>>> df.estimate_utm_crs()
<Derived Projected CRS: EPSG:32616>
Name: WGS 84 / UTM zone 16N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 90°W and 84°W, northern hemisphere between equator and 84°N...
- bounds: (-90.0, 0.0, -84.0, 84.0)
Coordinate Operation:
- name: UTM zone 16N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
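
For intuition about why the Chicago dataset resolves to EPSG:32616, the standard UTM zone arithmetic can be sketched as below. This is a simplified assumption-laden sketch, not the method's implementation: the helper name utm_epsg_wgs84 is hypothetical, it takes a single longitude/latitude pair (for example the centre of the dataset bounds), it covers only the WGS 84 datum, and it ignores the special zones around Norway and Svalbard. The method itself returns a pyproj.CRS and may resolve the zone differently.

```python
import math


def utm_epsg_wgs84(lon, lat):
    """Return the EPSG code of the WGS 84 / UTM zone containing (lon, lat)."""
    # UTM zones are 6 degrees wide and numbered 1..60 starting at 180 degrees W.
    zone = int(math.floor((lon + 180) / 6)) % 60 + 1
    # EPSG 326xx covers the northern hemisphere, 327xx the southern one.
    base = 32600 if lat >= 0 else 32700
    return base + zone


# Approximate centre of Chicago (assumed coordinates for illustration).
chicago_epsg = utm_epsg_wgs84(-87.6, 41.8)
```

With these inputs chicago_epsg is 32616, i.e. UTM zone 16N, consistent with the output above.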
eval(expr, *, inplace=False, **kwargs)

Evaluate a string describing operations on DataFrame columns.

Operates on columns only, not specific rows or elements. This allows eval to run arbitrary code, which can make you vulnerable to code injection if you pass user input to this function.

Parameters:
  • expr (str) – The expression string to evaluate.

  • inplace (bool, default False) – If the expression contains an assignment, whether to perform the operation inplace and mutate the existing DataFrame. Otherwise, a new DataFrame is returned.

  • **kwargs – See the documentation for eval() for complete details on the accepted keyword arguments.

Returns:

The result of the evaluation or None if inplace=True.

Return type:

ndarray, scalar, pandas object, or None

See also

DataFrame.query

Evaluates a boolean expression to query the columns of a frame.

DataFrame.assign

Can evaluate an expression or function to create new values for a column.

eval

Evaluate a Python expression as a string using various backends.

Notes

For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.

Examples

>>> df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
>>> df
   A   B
0  1  10
1  2   8
2  3   6
3  4   4
4  5   2
>>> df.eval('A + B')
0    11
1    10
2     9
3     8
4     7
dtype: int64

Assignment is allowed though by default the original DataFrame is not modified.

>>> df.eval('C = A + B')
   A   B   C
0  1  10  11
1  2   8  10
2  3   6   9
3  4   4   8
4  5   2   7
>>> df
   A   B
0  1  10
1  2   8
2  3   6
3  4   4
4  5   2

Multiple columns can be assigned to using multi-line expressions:

>>> df.eval(
...     '''
... C = A + B
... D = A - B
... '''
... )
   A   B   C  D
0  1  10  11 -9
1  2   8  10 -6
2  3   6   9 -3
3  4   4   8  0
4  5   2   7  3
ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=<no_default>, times=None, method='single')

Provide exponentially weighted (EW) calculations.

Exactly one of com, span, halflife, or alpha must be provided if times is not provided. If times is provided, halflife and one of com, span or alpha may be provided.

Parameters:
  • com (float, optional) –

    Specify decay in terms of center of mass

    \(\alpha = 1 / (1 + com)\), for \(com \geq 0\).

  • span (float, optional) –

    Specify decay in terms of span

    \(\alpha = 2 / (span + 1)\), for \(span \geq 1\).

  • halflife (float, str, timedelta, optional) –

    Specify decay in terms of half-life

    \(\alpha = 1 - \exp\left(-\ln(2) / halflife\right)\), for \(halflife > 0\).

    If times is specified, a timedelta convertible unit over which an observation decays to half its value. Only applicable to mean(), and halflife value will not apply to the other functions.

  • alpha (float, optional) –

    Specify smoothing factor \(\alpha\) directly

    \(0 < \alpha \leq 1\).

  • min_periods (int, default 0) – Minimum number of observations in window required to have a value; otherwise, result is np.nan.

  • adjust (bool, default True) –

    Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings (viewing EWMA as a moving average).

    • When adjust=True (default), the EW function is calculated using weights \(w_i = (1 - \alpha)^i\). For example, the EW moving average of the series [\(x_0, x_1, ..., x_t\)] would be:

    \[y_t = \frac{x_t + (1 - \alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + ... + (1 - \alpha)^t x_0}{1 + (1 - \alpha) + (1 - \alpha)^2 + ... + (1 - \alpha)^t}\]
    • When adjust=False, the exponentially weighted function is calculated recursively:

    \[\begin{split} y_0 &= x_0\\ y_t &= (1 - \alpha) y_{t-1} + \alpha x_t. \end{split}\]

  • ignore_na (bool, default False) –

    Ignore missing values when calculating weights.

    • When ignore_na=False (default), weights are based on absolute positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \((1-\alpha)^2\) and \(1\) if adjust=True, and \((1-\alpha)^2\) and \(\alpha\) if adjust=False.

    • When ignore_na=True, weights are based on relative positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.

  • axis ({0, 1}, default 0) –

    If 0 or 'index', calculate across the rows.

    If 1 or 'columns', calculate across the columns.

    For Series this parameter is unused and defaults to 0.

  • times (np.ndarray, Series, default None) –

    Only applicable to mean().

    Times corresponding to the observations. Must be monotonically increasing and datetime64[ns] dtype.

    If 1-D array like, a sequence with the same shape as the observations.

  • method (str {'single', 'table'}, default 'single') –

    Added in version 1.4.0.

    Execute the rolling operation per single column or row ('single') or over the entire object ('table').

    This argument is only implemented when specifying engine='numba' in the method call.

    Only applicable to mean().

Return type:

pandas.api.typing.ExponentialMovingWindow

See also

rolling

Provides rolling window calculations.

expanding

Provides expanding transformations.

Notes

See Windowing Operations for further usage details and examples.
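
The two formulas given for adjust=True and adjust=False can be checked by hand with a small plain-Python sketch. It assumes a series without missing values, so the ignore_na handling is deliberately left out, and the function names are hypothetical rather than part of pandas.

```python
def ewm_mean_adjusted(xs, alpha):
    """adjust=True: weighted average with weights w_i = (1 - alpha)**i."""
    out = []
    for t in range(len(xs)):
        # weights[i] applies to the observation i steps in the past, xs[t - i].
        weights = [(1 - alpha) ** i for i in range(t + 1)]
        numerator = sum(w * xs[t - i] for i, w in enumerate(weights))
        out.append(numerator / sum(weights))
    return out


def ewm_mean_recursive(xs, alpha):
    """adjust=False: y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t."""
    out = [xs[0]]
    for x in xs[1:]:
        out.append((1 - alpha) * out[-1] + alpha * x)
    return out


com = 0.5
alpha = 1 / (1 + com)  # decay given as centre of mass: alpha = 2/3

adjusted = ewm_mean_adjusted([0.0, 1.0, 2.0], alpha)
recursive = ewm_mean_recursive([0.0, 1.0, 2.0], alpha)
```

For the first three observations of the example data, adjusted is approximately [0.0, 0.75, 1.615385] and recursive is approximately [0.0, 0.666667, 1.555556], which agrees with the first three rows of the adjust=True and adjust=False outputs in the examples.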

Examples

>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0
>>> df.ewm(com=0.5).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
>>> df.ewm(alpha=2 / 3).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213

adjust

>>> df.ewm(com=0.5, adjust=True).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
>>> df.ewm(com=0.5, adjust=False).mean()
          B
0  0.000000
1  0.666667
2  1.555556
3  1.555556
4  3.650794

ignore_na

>>> df.ewm(com=0.5, ignore_na=True).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.225000
>>> df.ewm(com=0.5, ignore_na=False).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213

times

Exponentially weighted mean with weights calculated with a timedelta halflife relative to times.

>>> times = ['2020-01-01', '2020-01-03', '2020-01-10', '2020-01-15', '2020-01-17']
>>> df.ewm(halflife='4 days', times=pd.DatetimeIndex(times)).mean()
          B
0  0.000000
1  0.585786
2  1.523889
3  1.523889
4  3.233686
expanding(min_periods=1, axis=<no_default>, method='single')

Provide expanding window calculations.

Parameters:
  • min_periods (int, default 1) – Minimum number of observations in window required to have a value; otherwise, result is np.nan.

  • axis (int or str, default 0) –

    If 0 or 'index', roll across the rows.

    If 1 or 'columns', roll across the columns.

    For Series this parameter is unused and defaults to 0.

  • method (str {'single', 'table'}, default 'single') –

    Execute the rolling operation per single column or row ('single') or over the entire object ('table').

    This argument is only implemented when specifying engine='numba' in the method call.

    Added in version 1.3.0.

Return type:

pandas.api.typing.Expanding

See also

rolling

Provides rolling window calculations.

ewm

Provides exponential weighted functions.

Notes

See Windowing Operations for further usage details and examples.

Examples

>>> df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0

min_periods

Expanding sum with 1 vs 3 observations needed to calculate a value.

>>> df.expanding(1).sum()
     B
0  0.0
1  1.0
2  3.0
3  3.0
4  7.0
>>> df.expanding(3).sum()
     B
0  NaN
1  NaN
2  3.0
3  3.0
4  7.0
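
The effect of min_periods on an expanding sum can be reproduced with a short plain-Python sketch. The helper name expanding_sum is hypothetical; it assumes, as the outputs above suggest, that NaN observations are skipped in the sum and do not count toward min_periods.

```python
import math


def expanding_sum(values, min_periods=1):
    """Running sum over all values seen so far, NaN until enough observations."""
    out = []
    total = 0.0
    count = 0  # number of non-NaN observations seen so far
    for value in values:
        if not math.isnan(value):
            total += value
            count += 1
        out.append(total if count >= min_periods else math.nan)
    return out


data = [0.0, 1.0, 2.0, math.nan, 4.0]
sum_min_1 = expanding_sum(data, min_periods=1)
sum_min_3 = expanding_sum(data, min_periods=3)
```

With min_periods=1 the result is [0.0, 1.0, 3.0, 3.0, 7.0]; with min_periods=3 the first two entries are NaN because fewer than three observations are available at that point.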
explode(column=None, ignore_index=False, index_parts=False, **kwargs)[source]

Explode multi-part geometries into multiple single geometries.

Each row containing a multi-part geometry will be split into multiple rows with single geometries, thereby increasing the vertical size of the GeoDataFrame.

Parameters:
  • column (string, default None) – Column to explode. In the case of a geometry column, multi-part geometries are converted to single-part. If None, the active geometry column is used.

  • ignore_index (bool, default False) – If True, the resulting index will be labelled 0, 1, …, n - 1, ignoring index_parts.

  • index_parts (boolean, default False) – If True, the resulting index will be a multi-index (original index with an additional level indicating the multiple geometries: a new zero-based index for each single part geometry per multi-part geometry).

Returns:

Exploded geodataframe with each single geometry as a separate entry in the geodataframe.

Return type:

GeoDataFrame

Examples

>>> from shapely.geometry import MultiPoint
>>> d = {
...     "col1": ["name1", "name2"],
...     "geometry": [
...         MultiPoint([(1, 2), (3, 4)]),
...         MultiPoint([(2, 1), (0, 0)]),
...     ],
... }
>>> gdf = geopandas.GeoDataFrame(d, crs=4326)
>>> gdf
    col1               geometry
0  name1  MULTIPOINT ((1 2), (3 4))
1  name2  MULTIPOINT ((2 1), (0 0))
>>> exploded = gdf.explode(index_parts=True)
>>> exploded
      col1     geometry
0 0  name1  POINT (1 2)
  1  name1  POINT (3 4)
1 0  name2  POINT (2 1)
  1  name2  POINT (0 0)
>>> exploded = gdf.explode(index_parts=False)
>>> exploded
    col1     geometry
0  name1  POINT (1 2)
0  name1  POINT (3 4)
1  name2  POINT (2 1)
1  name2  POINT (0 0)
>>> exploded = gdf.explode(ignore_index=True)
>>> exploded
    col1     geometry
0  name1  POINT (1 2)
1  name1  POINT (3 4)
2  name2  POINT (2 1)
3  name2  POINT (0 0)

See also

GeoDataFrame.dissolve

dissolve geometries into a single observation.
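
How index_parts and ignore_index shape the resulting index can be sketched without GeoPandas. In this illustration each multi-part geometry is simply a list of coordinate tuples; the function explode_rows and its data layout are hypothetical and only mimic the indexing behaviour shown in the examples.

```python
def explode_rows(rows, index_parts=False, ignore_index=False):
    """rows maps an index label to (attribute, list of single-part geometries)."""
    exploded = []
    for label, (attribute, parts) in rows.items():
        for part_number, part in enumerate(parts):
            # index_parts=True adds a zero-based part counter as a second level.
            index = (label, part_number) if index_parts else label
            exploded.append((index, attribute, part))
    if ignore_index:
        # ignore_index=True relabels rows 0..n-1, regardless of index_parts.
        exploded = [
            (i, attribute, part)
            for i, (_, attribute, part) in enumerate(exploded)
        ]
    return exploded


rows = {
    0: ("name1", [(1, 2), (3, 4)]),
    1: ("name2", [(2, 1), (0, 0)]),
}
with_parts = explode_rows(rows, index_parts=True)
repeated = explode_rows(rows)
relabelled = explode_rows(rows, ignore_index=True)
```

The three results carry the indices (0, 0), (0, 1), (1, 0), (1, 1); then 0, 0, 1, 1; then 0, 1, 2, 3, mirroring the three exploded frames above.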

explore(*args, **kwargs)[source]

Explore data in interactive map based on GeoPandas and folium/leaflet.js.

Generate an interactive Leaflet map from the GeoDataFrame.

Parameters:
  • column (str, np.array, pd.Series (default None)) – The name of the dataframe column, numpy.array, or pandas.Series to be plotted. If numpy.array or pandas.Series are used then it must have same length as dataframe.

  • cmap (str, matplotlib.Colormap, branca.colormap or function (default None)) –

    The name of a colormap recognized by matplotlib, a list-like of colors, matplotlib.colors.Colormap, a branca.colormap.ColorMap or function that returns a named color or hex based on the column value, e.g.:

    def my_colormap(value):  # scalar value defined in 'column'
        if value > 1:
            return "green"
        return "red"
    

  • color (str, array-like (default None)) – Named color or a list-like of colors (named or hex).

  • m (folium.Map (default None)) – Existing map instance on which to draw the plot.

  • tiles (str, xyzservices.TileProvider (default 'OpenStreetMap Mapnik')) –

    Map tileset to use. Can choose from the list supported by folium, query a xyzservices.TileProvider by a name from xyzservices.providers, pass a xyzservices.TileProvider object, or pass a custom XYZ URL. The current list of built-in providers (when xyzservices is not available):

    ["OpenStreetMap", "CartoDB positron", "CartoDB dark_matter"]

    You can pass a custom tileset to Folium by passing a Leaflet-style URL to the tiles parameter: http://{s}.yourtiles.com/{z}/{x}/{y}.png. Be sure to check their terms and conditions and to provide attribution with the attr keyword.

  • attr (str (default None)) – Map tile attribution; only required if passing custom tile URL.

  • tooltip (bool, str, int, list (default True)) – Display GeoDataFrame attributes when hovering over the object. True includes all columns. False removes tooltip. Pass string or list of strings to specify a column(s). Integer specifies first n columns to be included. Defaults to True.

  • popup (bool, str, int, list (default False)) – Input GeoDataFrame attributes for object displayed when clicking. True includes all columns. False removes popup. Pass string or list of strings to specify a column(s). Integer specifies first n columns to be included. Defaults to False.

  • highlight (bool (default True)) – Enable highlight functionality when hovering over a geometry.

  • categorical (bool (default False)) – If False, cmap will reflect numerical values of the column being plotted. For non-numerical columns, this will be set to True.

  • legend (bool (default True)) – Plot a legend in choropleth plots. Ignored if no column is given.

  • scheme (str (default None)) – Name of a choropleth classification scheme (requires mapclassify >= 2.4.0). A mapclassify.classify() will be used under the hood. Supported are all schemes provided by mapclassify (e.g. 'BoxPlot', 'EqualInterval', 'FisherJenks', 'FisherJenksSampled', 'HeadTailBreaks', 'JenksCaspall', 'JenksCaspallForced', 'JenksCaspallSampled', 'MaxP', 'MaximumBreaks', 'NaturalBreaks', 'Quantiles', 'Percentiles', 'StdMean', 'UserDefined'). Arguments can be passed in classification_kwds.

  • k (int (default 5)) – Number of classes

  • vmin (None or float (default None)) – Minimum value of cmap. If None, the minimum data value in the column to be plotted is used.

  • vmax (None or float (default None)) – Maximum value of cmap. If None, the maximum data value in the column to be plotted is used.

  • width (pixel int or percentage string (default: '100%')) – Width of the folium Map. If the argument m is given explicitly, width is ignored.

  • height (pixel int or percentage string (default: '100%')) – Height of the folium Map. If the argument m is given explicitly, height is ignored.

  • categories (list-like) – Ordered list-like object of categories to be used for categorical plot.

  • classification_kwds (dict (default None)) – Keyword arguments to pass to mapclassify

  • control_scale (bool (default True)) – Whether to add a control scale on the map.

  • marker_type (str, folium.Circle, folium.CircleMarker, folium.Marker (default None)) – Allowed string options are (‘marker’, ‘circle’, ‘circle_marker’). Defaults to folium.CircleMarker.

  • marker_kwds (dict (default {})) –

    Additional keywords to be passed to the selected marker_type, e.g.:

    radius : float (default 2 for circle_marker and 50 for circle)

    Radius of the circle, in meters (for circle) or pixels (for circle_marker).

    fill : bool (default True)

    Whether to fill the circle or circle_marker with color.

    icon : folium.map.Icon

    The folium.map.Icon object to use to render the marker.

    draggable : bool (default False)

    Set to True to be able to drag the marker around the map.

  • style_kwds (dict (default {})) –

    Additional style to be passed to folium style_function:

    stroke : bool (default True)

    Whether to draw stroke along the path. Set it to False to disable borders on polygons or circles.

    color : str

    Stroke color.

    weight : int

    Stroke width in pixels.

    opacity : float (default 1.0)

    Stroke opacity.

    fill : boolean (default True)

    Whether to fill the path with color. Set it to False to disable filling on polygons or circles.

    fillColor : str

    Fill color. Defaults to the value of the color option.

    fillOpacity : float (default 0.5)

    Fill opacity.

    style_function : callable

    Function mapping a GeoJson Feature to a style dict.

    • Style properties folium.vector_layers.path_options()

    • GeoJson features GeoDataFrame.__geo_interface__

    e.g.:

    lambda x: {"color":"red" if x["properties"]["gdp_md_est"]<10**6
                                 else "blue"}
    

    Plus all supported by folium.vector_layers.path_options(). See the documentation of folium.features.GeoJson for details.

  • highlight_kwds (dict (default {})) – Style to be passed to folium highlight_function. Uses the same keywords as style_kwds. When empty, defaults to {"fillOpacity": 0.75}.

  • missing_kwds (dict (default {})) –

    Additional style for missing values:

    color : str

    Color of missing values. Defaults to None, which uses Folium’s default.

    label : str (default “NaN”)

    Legend entry for missing values.

  • tooltip_kwds (dict (default {})) – Additional keywords to be passed to folium.features.GeoJsonTooltip, e.g. aliases, labels, or sticky.

  • popup_kwds (dict (default {})) – Additional keywords to be passed to folium.features.GeoJsonPopup, e.g. aliases or labels.

  • legend_kwds (dict (default {})) –

    Additional keywords to be passed to the legend.

    Currently supported customisation:

    caption : string

    Custom caption of the legend. Defaults to the column name.

    Additional accepted keywords when scheme is specified:

    colorbar : bool (default True)

    An option to control the style of the legend. If True, continuous colorbar will be used. If False, categorical legend will be used for bins.

    scale : bool (default True)

    Scale bins along the colorbar axis according to the bin edges (True) or use the equal length for each bin (False).

    fmt : string (default “{:.2f}”)

    A formatting specification for the bin edges of the classes in the legend. For example, to have no decimals: {"fmt": "{:.0f}"}. Applies if colorbar=False.

    labels : list-like

    A list of legend labels to override the auto-generated labels. Needs to have the same number of elements as the number of classes (k). Applies if colorbar=False.

    interval : boolean (default False)

    An option to control brackets from mapclassify legend. If True, open/closed interval brackets are shown in the legend. Applies if colorbar=False.

    max_labels : int (default 10)

    Maximum number of colorbar tick labels (requires branca>=0.5.0)

  • map_kwds (dict (default {})) – Additional keywords to be passed to folium Map, e.g. dragging, or scrollWheelZoom.

  • **kwargs (dict) – Additional options to be passed on to the folium object.

Returns:

m – folium Map instance

Return type:

folium.Map

Examples

>>> import geodatasets
>>> df = geopandas.read_file(
...     geodatasets.get_path("geoda.chicago_health")
... )
>>> df.head(2)
   ComAreaID  ...                                           geometry
0         35  ...  POLYGON ((-87.60914 41.84469, -87.60915 41.844...
1         36  ...  POLYGON ((-87.59215 41.81693, -87.59231 41.816...

[2 rows x 87 columns]

>>> df.explore("Pop2012", cmap="Blues")
property exterior

Return a GeoSeries of LinearRings representing the outer boundary of each polygon in the GeoSeries.

Applies to GeoSeries containing only Polygons. Returns None for other geometry types.

Examples

>>> from shapely.geometry import Polygon, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         Polygon([(1, 0), (2, 1), (0, 0)]),
...         Point(0, 1)
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    POLYGON ((1 0, 2 1, 0 0, 1 0))
2                       POINT (0 1)
dtype: geometry
>>> s.exterior
0    LINEARRING (0 0, 1 1, 0 1, 0 0)
1    LINEARRING (1 0, 2 1, 0 0, 1 0)
2                               None
dtype: geometry

See also

GeoSeries.boundary

complete set-theoretic boundary

GeoSeries.interiors

list of inner rings of each polygon

extract_unique_points()

Return a GeoSeries of MultiPoints representing all distinct vertices of an input geometry.

Examples

>>> from shapely import LineString, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         LineString([(0, 0), (0, 0), (1, 1), (1, 1)]),
...         Polygon([(0, 0), (0, 0), (1, 1), (1, 1)])
...     ],
... )
>>> s
0        LINESTRING (0 0, 0 0, 1 1, 1 1)
1    POLYGON ((0 0, 0 0, 1 1, 1 1, 0 0))
dtype: geometry
>>> s.extract_unique_points()
0    MULTIPOINT ((0 0), (1 1))
1    MULTIPOINT ((0 0), (1 1))
dtype: geometry

See also

GeoSeries.get_coordinates

extract coordinates as a DataFrame

ffill(*, axis=None, inplace=False, limit=None, limit_area=None, downcast=<no_default>)

Fill NA/NaN values by propagating the last valid observation to next valid.

Parameters:
  • axis ({0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame) – Axis along which to fill missing values. For Series this parameter is unused and defaults to 0.

  • inplace (bool, default False) – If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).

  • limit (int, default None) – If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

  • limit_area ({None, 'inside', 'outside'}, default None) –

    If limit is specified, consecutive NaNs will be filled with this restriction.

    • None: No fill restriction.

    • 'inside': Only fill NaNs surrounded by valid values (interpolate).

    • 'outside': Only fill NaNs outside valid values (extrapolate).

    Added in version 2.2.0.

  • downcast (dict, default is None) –

    A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).

    Deprecated since version 2.2.0.

Returns:

Object with missing values filled or None if inplace=True.

Return type:

Series/DataFrame or None

Examples

>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, np.nan],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list("ABCD"))
>>> df
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  NaN NaN  NaN
3  NaN  3.0 NaN  4.0
>>> df.ffill()
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  3.0  4.0 NaN  1.0
3  3.0  3.0 NaN  4.0
>>> ser = pd.Series([1, np.nan, 2, 3])
>>> ser.ffill()
0   1.0
1   1.0
2   2.0
3   3.0
dtype: float64
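
The propagation rule and the limit parameter can be sketched in plain Python for a single list of floats. The helper name forward_fill is hypothetical; it assumes that limit caps the number of consecutive NaN values filled after each valid observation, and that leading NaNs stay NaN because there is nothing to propagate.

```python
import math


def forward_fill(values, limit=None):
    """Propagate the last valid value forward, filling at most limit NaNs in a row."""
    out = []
    last_valid = math.nan
    gap = 0  # length of the current run of NaNs
    for value in values:
        if math.isnan(value):
            gap += 1
            can_fill = not math.isnan(last_valid) and (limit is None or gap <= limit)
            out.append(last_valid if can_fill else value)
        else:
            last_valid = value
            gap = 0
            out.append(value)
    return out


filled = forward_fill([1.0, math.nan, 2.0, 3.0])
partially_filled = forward_fill([1.0, math.nan, math.nan, 2.0], limit=1)
```

Here filled is [1.0, 1.0, 2.0, 3.0], matching the Series example above, while partially_filled keeps its second NaN because the gap is longer than limit.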
fillna(value=None, *, method=None, axis=None, inplace=False, limit=None, downcast=<no_default>)

Fill NA/NaN values using the specified method.

Parameters:
  • value (scalar, dict, Series, or DataFrame) – Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.

  • method ({'backfill', 'bfill', 'ffill', None}, default None) –

    Method to use for filling holes in reindexed Series:

    • ffill: propagate last valid observation forward to next valid.

    • backfill / bfill: use next valid observation to fill gap.

    Deprecated since version 2.1.0: Use ffill or bfill instead.

  • axis ({0 or 'index'} for Series, {0 or 'index', 1 or 'columns'} for DataFrame) – Axis along which to fill missing values. For Series this parameter is unused and defaults to 0.

  • inplace (bool, default False) – If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).

  • limit (int, default None) – If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

  • downcast (dict, default is None) –

    A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).

    Deprecated since version 2.2.0.

Returns:

Object with missing values filled or None if inplace=True.

Return type:

Series/DataFrame or None

See also

ffill

Fill values by propagating the last valid observation to next valid.

bfill

Fill values by using the next valid observation to fill the gap.

interpolate

Fill NaN values using interpolation.

reindex

Conform object to new index.

asfreq

Convert TimeSeries to specified frequency.

Examples

>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, np.nan],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list("ABCD"))
>>> df
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  NaN NaN  NaN
3  NaN  3.0 NaN  4.0

Replace all NaN elements with 0s.

>>> df.fillna(0)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  0.0
3  0.0  3.0  0.0  4.0

Replace all NaN elements in columns ‘A’, ‘B’, ‘C’, and ‘D’ with 0, 1, 2, and 3 respectively.

>>> values = {"A": 0, "B": 1, "C": 2, "D": 3}
>>> df.fillna(value=values)
     A    B    C    D
0  0.0  2.0  2.0  0.0
1  3.0  4.0  2.0  1.0
2  0.0  1.0  2.0  3.0
3  0.0  3.0  2.0  4.0

Only replace the first NaN element in each column.

>>> df.fillna(value=values, limit=1)
     A    B    C    D
0  0.0  2.0  2.0  0.0
1  3.0  4.0  NaN  1.0
2  NaN  1.0  NaN  3.0
3  NaN  3.0  NaN  4.0

When filling using a DataFrame, replacement happens along the same column names and same indices:

>>> df2 = pd.DataFrame(np.zeros((4, 4)), columns=list("ABCE"))
>>> df.fillna(df2)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  NaN
3  0.0  3.0  0.0  4.0

Note that column D is not affected since it is not present in df2.
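
Because the method argument is deprecated, forward and backward filling is usually written with ffill and bfill directly. The following is a minimal sketch, assuming pandas 2.x; the small frame is made up for illustration and is not the one used above.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not taken from the examples above).
df = pd.DataFrame(
    {"A": [1.0, np.nan, np.nan, 4.0], "B": [np.nan, 2.0, np.nan, np.nan]}
)

# Stands in for fillna(method="ffill"): carry the last valid value forward.
forward = df.ffill()

# Stands in for fillna(method="bfill"): pull the next valid value backward.
backward = df.bfill()

# limit caps how many consecutive NaNs are filled within each gap.
limited = df.ffill(limit=1)

print(forward, backward, limited, sep="\n\n")
```

Leading NaNs survive a forward fill and trailing NaNs survive a backward fill, since there is no valid observation to propagate from.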

filter(items=None, like=None, regex=None, axis=None)

Subset the dataframe rows or columns according to the specified index labels.

Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.

Parameters:
  • items (list-like) – Keep labels from axis which are in items.

  • like (str) – Keep labels from axis for which “like in label == True”.

  • regex (str (regular expression)) – Keep labels from axis for which re.search(regex, label) == True.

  • axis ({0 or 'index', 1 or 'columns', None}, default None) – The axis to filter on, expressed either as an index (int) or axis name (str). By default this is the info axis, ‘columns’ for DataFrame. For Series this parameter is unused and defaults to None.

Return type:

same type as input object

See also

DataFrame.loc

Access a group of rows and columns by label(s) or a boolean array.

Notes

The items, like, and regex parameters are enforced to be mutually exclusive.

axis defaults to the info axis that is used when indexing with [].

Examples

>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> df
        one  two  three
mouse     1    2      3
rabbit    4    5      6
>>> # select columns by name
>>> df.filter(items=['one', 'three'])
        one  three
mouse     1      3
rabbit    4      6
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
        one  three
mouse     1      3
rabbit    4      6
>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
        one  two  three
rabbit    4    5      6
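
Since filter only inspects labels, selecting on cell contents needs a boolean mask instead. The sketch below contrasts the two and also shows the mutual-exclusivity check mentioned in the Notes; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["mouse", "rabbit"],
    columns=["one", "two", "three"],
)

# filter() looks only at labels: keep rows whose index label contains "bbi".
by_label = df.filter(like="bbi", axis=0)

# To select on cell contents, use a boolean mask with .loc instead.
by_content = df.loc[df["two"] < 5]

# items, like and regex are mutually exclusive; combining them raises TypeError.
try:
    df.filter(items=["one"], like="t")
except TypeError as exc:
    error_message = str(exc)

print(by_label.index.tolist(), by_content.index.tolist())
```
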

first(offset)

Select initial periods of time series data based on a date offset.

Deprecated since version 2.1: first() is deprecated and will be removed in a future version. Please create a mask and filter using .loc instead.

For a DataFrame with a sorted DatetimeIndex, this function can select the first few rows based on a date offset.

Parameters:

offset (str, DateOffset or dateutil.relativedelta) – The offset length of the data that will be selected. For instance, ‘1ME’ will display all the rows having their index within the first month.

Returns:

A subset of the caller.

Return type:

Series or DataFrame

Raises:

TypeError – If the index is not a DatetimeIndex

See also

last

Select final periods of time series based on a date offset.

at_time

Select values at a particular time of the day.

between_time

Select values between particular times of the day.

Examples

>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
            A
2018-04-09  1
2018-04-11  2
2018-04-13  3
2018-04-15  4

Get the rows for the first 3 days:

>>> ts.first('3D')
            A
2018-04-09  1
2018-04-11  2

Notice that the data for the first 3 calendar days were returned, not the first 3 days observed in the dataset, and therefore data for 2018-04-13 was not returned.
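
As the deprecation note suggests, the same selection can be written with a mask and .loc. This is a minimal sketch for a fixed-length offset such as '3D'; the cutoff computation is an assumption about how to reproduce that case, and calendar-based offsets (month ends, for instance) may need different handling.

```python
import pandas as pd

i = pd.date_range("2018-04-09", periods=4, freq="2D")
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=i)

# Everything strictly before "first timestamp + 3 days" counts as the
# first 3 calendar days.
cutoff = ts.index[0] + pd.Timedelta(days=3)
first_three_days = ts.loc[ts.index < cutoff]

print(first_three_days)
```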

first_valid_index()

Return index for first non-NA value or None, if no non-NA value is found.

Return type:

type of index

Examples

For Series:

>>> s = pd.Series([None, 3, 4])
>>> s.first_valid_index()
1
>>> s.last_valid_index()
2
>>> s = pd.Series([None, None])
>>> print(s.first_valid_index())
None
>>> print(s.last_valid_index())
None

If all elements in Series are NA/null, returns None.

>>> s = pd.Series()
>>> print(s.first_valid_index())
None
>>> print(s.last_valid_index())
None

If Series is empty, returns None.

For DataFrame:

>>> df = pd.DataFrame({'A': [None, None, 2], 'B': [None, 3, 4]})
>>> df
     A      B
0  NaN    NaN
1  NaN    3.0
2  2.0    4.0
>>> df.first_valid_index()
1
>>> df.last_valid_index()
2
>>> df = pd.DataFrame({'A': [None, None, None], 'B': [None, None, None]})
>>> df
     A      B
0  None   None
1  None   None
2  None   None
>>> print(df.first_valid_index())
None
>>> print(df.last_valid_index())
None

If all elements in DataFrame are NA/null, returns None.

>>> df = pd.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> print(df.first_valid_index())
None
>>> print(df.last_valid_index())
None

If DataFrame is empty, returns None.
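
One common use of first_valid_index together with last_valid_index is trimming leading and trailing missing values. The following sketch assumes a Series with a default integer index; the guard covers the None case described above.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 3.0, np.nan, 4.0, np.nan])

start = s.first_valid_index()
stop = s.last_valid_index()

# Label-based slicing with .loc includes both endpoints.
# Both values are None for an empty or all-NA Series, so guard that case.
trimmed = s.loc[start:stop] if start is not None else s.iloc[0:0]

print(trimmed)
```

Missing values between the first and last valid entries are kept; only the outer ones are dropped.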

property flags: Flags

Get the properties associated with this pandas object.

The available flags are

  • Flags.allows_duplicate_labels

See also

Flags

Flags that apply to pandas objects.

DataFrame.attrs

Global metadata applying to this dataset.

Notes

“Flags” differ from “metadata”. Flags reflect properties of the pandas object (the Series or DataFrame). Metadata refer to properties of the dataset, and should be stored in DataFrame.attrs.

Examples

>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags
<Flags(allows_duplicate_labels=True)>

Flags can be read or set using attribute access (.):

>>> df.flags.allows_duplicate_labels
True
>>> df.flags.allows_duplicate_labels = False

Or by slicing with a key

>>> df.flags["allows_duplicate_labels"]
False
>>> df.flags["allows_duplicate_labels"] = True
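
The practical effect of allows_duplicate_labels=False is that duplicate labels are rejected. A minimal sketch using set_flags, which returns a new object instead of mutating in place; the frames are illustrative.

```python
import pandas as pd
from pandas.errors import DuplicateLabelError

unique = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])
duplicated = pd.DataFrame({"A": [1, 2]}, index=["x", "x"])

# set_flags returns a new object and leaves the original's flags untouched.
strict = unique.set_flags(allows_duplicate_labels=False)

# Disallowing duplicates on data that already has them fails immediately.
try:
    duplicated.set_flags(allows_duplicate_labels=False)
    rejected = False
except DuplicateLabelError:
    rejected = True

print(strict.flags, rejected)
```
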

floordiv(other, axis='columns', level=None, fill_value=None)

Get integer division of dataframe and other, element-wise (binary operator floordiv).

Equivalent to dataframe // other, but with support to substitute a fill_value for missing data in one of the inputs. The reverse version is rfloordiv.

One of the flexible wrappers (add, sub, mul, div, floordiv, mod, pow) around the arithmetic operators: +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Adding a scalar with the operator version returns the same result as the method.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
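
The examples above are shared across the arithmetic wrappers and do not call floordiv itself. The following sketch applies it to the same angles/degrees frame; the divisor values and the other frame are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"angles": [0, 3, 4], "degrees": [360, 180, 360]},
    index=["circle", "triangle", "rectangle"],
)

# Operator and method forms give the same result.
by_operator = df // 7
by_method = df.floordiv(7)

# Reverse version: 720 // degrees instead of degrees // 720.
reverse = df["degrees"].rfloordiv(720)

# 'other' lacks the angles column; fill_value=1 stands in for the
# missing entries before the division is carried out.
other = pd.DataFrame({"degrees": [100, 100, 100]}, index=df.index)
filled = df.floordiv(other, fill_value=1)

print(by_method, reverse, filled, sep="\n\n")
```
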

force_2d()

Force the dimensionality of a geometry to 2D.

Removes the additional Z coordinate dimension from all geometries.

Return type:

GeoSeries

Examples

>>> from shapely import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Point(0.5, 2.5, 0),
...         LineString([(1, 1, 1), (0, 1, 3), (1, 0, 2)]),
...         Polygon([(0, 0, 0), (0, 10, 0), (10, 10, 0)]),
...     ],
... )
>>> s
0                            POINT Z (0.5 2.5 0)
1             LINESTRING Z (1 1 1, 0 1 3, 1 0 2)
2    POLYGON Z ((0 0 0, 0 10 0, 10 10 0, 0 0 0))
dtype: geometry
>>> s.force_2d()
0                      POINT (0.5 2.5)
1           LINESTRING (1 1, 0 1, 1 0)
2    POLYGON ((0 0, 0 10, 10 10, 0 0))
dtype: geometry

force_3d(z=0)

Force the dimensionality of a geometry to 3D.

2D geometries will get the provided Z coordinate; 3D geometries are unchanged (unless their Z coordinate is np.nan).

Note that for empty geometries, 3D is only supported since GEOS 3.9 and then still only for simple geometries (non-collections).

Parameters:

z (float | array_like (default 0)) – Z coordinate to be assigned

Return type:

GeoSeries

Examples

>>> from shapely import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Point(1, 2),
...         Point(0.5, 2.5, 2),
...         LineString([(1, 1), (0, 1), (1, 0)]),
...         Polygon([(0, 0), (0, 10), (10, 10)]),
...     ],
... )
>>> s
0                          POINT (1 2)
1                  POINT Z (0.5 2.5 2)
2           LINESTRING (1 1, 0 1, 1 0)
3    POLYGON ((0 0, 0 10, 10 10, 0 0))
dtype: geometry
>>> s.force_3d()
0                                POINT Z (1 2 0)
1                            POINT Z (0.5 2.5 2)
2             LINESTRING Z (1 1 0, 0 1 0, 1 0 0)
3    POLYGON Z ((0 0 0, 0 10 0, 10 10 0, 0 0 0))
dtype: geometry

Z coordinate can be specified as scalar:

>>> s.force_3d(4)
0                                POINT Z (1 2 4)
1                            POINT Z (0.5 2.5 2)
2             LINESTRING Z (1 1 4, 0 1 4, 1 0 4)
3    POLYGON Z ((0 0 4, 0 10 4, 10 10 4, 0 0 4))
dtype: geometry

Or as an array-like (one value per geometry):

>>> s.force_3d(range(4))
0                                POINT Z (1 2 0)
1                            POINT Z (0.5 2.5 2)
2             LINESTRING Z (1 1 2, 0 1 2, 1 0 2)
3    POLYGON Z ((0 0 3, 0 10 3, 10 10 3, 0 0 3))
dtype: geometry
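
The rule that 2D geometries receive the supplied Z while 3D geometries keep their own can be sketched on plain coordinate tuples. The helpers force_3d_coords and force_2d_coords are hypothetical and not part of geopandas or shapely; the special case of a NaN Z coordinate is not handled here.

```python
def force_3d_coords(coords, z=0.0):
    """Append z to 2D (x, y) tuples; leave 3D (x, y, z) tuples untouched."""
    return [c if len(c) == 3 else (c[0], c[1], z) for c in coords]


def force_2d_coords(coords):
    """Drop any Z value, keeping only (x, y)."""
    return [(c[0], c[1]) for c in coords]


line_2d = [(1, 1), (0, 1), (1, 0)]
point_3d = [(0.5, 2.5, 2)]

print(force_3d_coords(line_2d, z=4))   # [(1, 1, 4), (0, 1, 4), (1, 0, 4)]
print(force_3d_coords(point_3d, z=4))  # [(0.5, 2.5, 2)]
print(force_2d_coords(point_3d))       # [(0.5, 2.5)]
```
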

frechet_distance(other, align=None, densify=None)

Return a Series containing the Fréchet distance to aligned other.

The Fréchet distance is a measure of similarity between curves that takes into account both the location and the ordering of points along each curve. The discrete distance is an approximation of this metric: only vertices are considered. The parameter densify makes this approximation less coarse by splitting the line segments between vertices before computing the distance.

The Fréchet distance sweeps continuously along both curves, so the direction of each curve is significant. This makes it a better measure of similarity than the Hausdorff distance for curve or surface matching.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the distance to.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

  • densify (float (default None)) – A value between 0 and 1, that splits each subsegment of a line string into equal length segments, making the approximation less coarse. A densify value of 0.5 will add a point halfway between each pair of points. A densify value of 0.25 will add a point every quarter of the way between each pair of points.

Return type:

Series (float)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 0), (1, 1)]),
...         Polygon([(0, 0), (-1, 0), (-1, 1)]),
...         LineString([(1, 1), (0, 0)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Point(3, 1),
...         LineString([(1, 0), (2, 0)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0      POLYGON ((0 0, 1 0, 1 1, 0 0))
1    POLYGON ((0 0, -1 0, -1 1, 0 0))
2               LINESTRING (1 1, 0 0)
3                         POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
2                                          POINT (3 1)
3                                LINESTRING (1 0, 2 0)
4                                          POINT (0 1)
dtype: geometry

We can check the Fréchet distance of each geometry of GeoSeries to a single geometry:

>>> point = Point(-1, 0)
>>> s.frechet_distance(point)
0    2.236068
1    1.000000
2    2.236068
3    1.000000
dtype: float64

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and use elements with the same index using align=True or ignore index and use elements based on their matching order using align=False:

>>> s.frechet_distance(s2, align=True)
0         NaN
1    2.121320
2    3.162278
3    2.000000
4         NaN
dtype: float64
>>> s.frechet_distance(s2, align=False)
0    0.707107
1    4.123106
2    2.000000
3    1.000000
dtype: float64

We can also set a densify value, a float between 0 and 1 that gives the fraction of each segment's length used as the spacing between the points added when densifying.

>>> l1 = geopandas.GeoSeries([LineString([(0, 0), (10, 0), (0, 15)])])
>>> l2 = geopandas.GeoSeries([LineString([(0, 0), (20, 15), (9, 11)])])
>>> l1.frechet_distance(l2)
0    18.027756
dtype: float64
>>> l1.frechet_distance(l2, densify=0.25)
0    16.77051
dtype: float64
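
To make the vertex-only approximation concrete, the discrete Fréchet distance can be sketched in pure Python with the standard dynamic-programming recurrence. The function discrete_frechet is a hypothetical illustration, not the GEOS implementation that geopandas calls, and it does no densification.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two vertex sequences."""
    n, m = len(p), len(q)
    # ca[i][j]: coupling distance for the prefixes p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance along p, along q, or along both; keep the best.
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]


l1 = [(0, 0), (10, 0), (0, 15)]
l2 = [(0, 0), (20, 15), (9, 11)]
print(round(discrete_frechet(l1, l2), 6))  # 18.027756
```

For the two line strings of the last example this gives the same value as the call without densify. Vertices are visited in order and never revisited, which is why the direction of each curve matters.
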

classmethod from_arrow(table, geometry=None, to_pandas_kwargs=None)[source]

Construct a GeoDataFrame from an Arrow table object based on GeoArrow extension types.

See https://geoarrow.org/ for details on the GeoArrow specification.

This function accepts any tabular Arrow object implementing the Arrow PyCapsule Protocol (i.e. having an __arrow_c_array__ or __arrow_c_stream__ method).

Added in version 1.0.

Parameters:
  • table (pyarrow.Table or Arrow-compatible table) – Any tabular object implementing the Arrow PyCapsule Protocol (i.e. has an __arrow_c_array__ or __arrow_c_stream__ method). This table should have at least one column with a geoarrow geometry type.

  • geometry (str, default None) – The name of the geometry column to set as the active geometry column. If None, the first geometry column found will be used.

  • to_pandas_kwargs (dict, optional) – Arguments passed to the pa.Table.to_pandas method for non-geometry columns. This can be used to control the behavior of the conversion of the non-geometry columns to a pandas DataFrame. For example, you can use this to control the dtype conversion of the columns. By default, the to_pandas method is called with no additional arguments.

Return type:

GeoDataFrame

classmethod from_dict(data, geometry=None, crs=None, **kwargs)[source]

Construct GeoDataFrame from dict of array-like or dicts by overriding DataFrame.from_dict method with geometry and crs.

Parameters:
  • data (dict) – Of the form {field : array-like} or {field : dict}.

  • geometry (str or array (optional)) – If str, column to use as geometry. If array, will be set as ‘geometry’ column on GeoDataFrame.

  • crs (str or dict (optional)) – Coordinate reference system to set on the resulting frame.

  • kwargs (keyword arguments) – These arguments are passed to DataFrame.from_dict

Return type:

GeoDataFrame

classmethod from_features(features, crs=None, columns=None)[source]

Alternate constructor to create GeoDataFrame from an iterable of features or a feature collection.

Parameters:
  • features

    • Iterable of features, where each element must be a feature dictionary or implement the __geo_interface__.

    • Feature collection, where the ‘features’ key contains an iterable of features.

    • Object holding a feature collection that implements the __geo_interface__.

  • crs (str or dict (optional)) – Coordinate reference system to set on the resulting frame.

  • columns (list of column names, optional) – Optionally specify the column names to include in the output frame. This does not overwrite the property names of the input, but can ensure a consistent output format.

Return type:

GeoDataFrame

Notes

For more information about the __geo_interface__, see https://gist.github.com/sgillies/2217756

Examples

>>> feature_coll = {
...     "type": "FeatureCollection",
...     "features": [
...         {
...             "id": "0",
...             "type": "Feature",
...             "properties": {"col1": "name1"},
...             "geometry": {"type": "Point", "coordinates": (1.0, 2.0)},
...             "bbox": (1.0, 2.0, 1.0, 2.0),
...         },
...         {
...             "id": "1",
...             "type": "Feature",
...             "properties": {"col1": "name2"},
...             "geometry": {"type": "Point", "coordinates": (2.0, 1.0)},
...             "bbox": (2.0, 1.0, 2.0, 1.0),
...         },
...     ],
...     "bbox": (1.0, 1.0, 2.0, 2.0),
... }
>>> df = geopandas.GeoDataFrame.from_features(feature_coll)
>>> df
      geometry   col1
0  POINT (1 2)  name1
1  POINT (2 1)  name2
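
When the source data are plain rows and not GeoJSON, a feature collection of the shape shown above can be assembled first. The helper rows_to_feature_collection is hypothetical and handles point geometries only; it uses nothing beyond the standard library.

```python
def rows_to_feature_collection(rows):
    """Build a GeoJSON-like FeatureCollection from (x, y, properties) rows."""
    features = [
        {
            "id": str(i),
            "type": "Feature",
            "properties": dict(props),
            "geometry": {"type": "Point", "coordinates": (x, y)},
        }
        for i, (x, y, props) in enumerate(rows)
    ]
    return {"type": "FeatureCollection", "features": features}


feature_coll = rows_to_feature_collection(
    [(1.0, 2.0, {"col1": "name1"}), (2.0, 1.0, {"col1": "name2"})]
)
print(feature_coll["features"][0])
```

The resulting dictionary can then be passed to geopandas.GeoDataFrame.from_features in the same way as feature_coll in the example above.
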

classmethod from_file(filename, **kwargs)[source]

Alternate constructor to create a GeoDataFrame from a file.

It is recommended to use geopandas.read_file() instead.

Can load a GeoDataFrame from a file in any format recognized by pyogrio. See http://pyogrio.readthedocs.io/ for details.

Parameters:
  • filename (str) – File path or file handle to read from. Depending on which kwargs are included, the content of filename may vary. See pyogrio.read_dataframe() for usage details.

  • kwargs (keyword arguments) – These arguments are passed to pyogrio.read_dataframe(), and can be used to access multi-layer data, data stored within archives (zip files), etc.

Return type:

GeoDataFrame

Examples

>>> import geodatasets
>>> path = geodatasets.get_path('nybb')
>>> gdf = geopandas.GeoDataFrame.from_file(path)
>>> gdf
   BoroCode       BoroName     Shape_Leng    Shape_Area                                           geometry
0         5  Staten Island  330470.010332  1.623820e+09  MULTIPOLYGON (((970217.022 145643.332, 970227....
1         4         Queens  896344.047763  3.045213e+09  MULTIPOLYGON (((1029606.077 156073.814, 102957...
2         3       Brooklyn  741080.523166  1.937479e+09  MULTIPOLYGON (((1021176.479 151374.797, 102100...
3         1      Manhattan  359299.096471  6.364715e+08  MULTIPOLYGON (((981219.056 188655.316, 980940....
4         2          Bronx  464392.991824  1.186925e+09  MULTIPOLYGON (((1012821.806 229228.265, 101278...

The recommended method of reading files is geopandas.read_file():

>>> gdf = geopandas.read_file(path)

See also

read_file

read file to GeoDataFrame

GeoDataFrame.to_file

write GeoDataFrame to file

classmethod from_postgis(sql, con, geom_col='geom', crs=None, index_col=None, coerce_float=True, parse_dates=None, params=None, chunksize=None)[source]

Alternate constructor to create a GeoDataFrame from a SQL query containing a geometry column in WKB representation.

Parameters:
  • sql (string)

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine)

  • geom_col (string, default 'geom') – column name to convert to shapely geometries

  • crs (optional) – Coordinate reference system to use for the returned GeoDataFrame

  • index_col (string or list of strings, optional, default: None) – Column(s) to set as index (MultiIndex)

  • coerce_float (boolean, default True) – Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets

  • parse_dates (list or dict, default None) –

    • List of column names to parse as dates.

    • Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.

    • Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.

  • params (list, tuple or dict, optional, default None) – List of parameters to pass to execute method.

  • chunksize (int, default None) – If specified, return an iterator where chunksize is the number of rows to include in each chunk.

Return type:

GeoDataFrame

Examples

PostGIS

>>> from sqlalchemy import create_engine
>>> db_connection_url = "postgresql://myusername:mypassword@myhost:5432/mydb"
>>> con = create_engine(db_connection_url)
>>> sql = "SELECT geom, highway FROM roads"
>>> df = geopandas.GeoDataFrame.from_postgis(sql, con)

SpatiaLite

>>> sql = "SELECT ST_Binary(geom) AS geom, highway FROM roads"
>>> df = geopandas.GeoDataFrame.from_postgis(sql, con)

The recommended method of reading from PostGIS is geopandas.read_postgis():

>>> df = geopandas.read_postgis(sql, con)

See also

geopandas.read_postgis

read PostGIS database to GeoDataFrame

classmethod from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)

Convert structured or record ndarray to DataFrame.

Creates a DataFrame object from a structured ndarray, sequence of tuples or dicts, or DataFrame.

Parameters:
  • data (structured ndarray, sequence of tuples or dicts, or DataFrame) –

    Structured input data.

    Deprecated since version 2.1.0: Passing a DataFrame is deprecated.

  • index (str, list of fields, array-like) – Field of array to use as the index, alternately a specific set of input labels to use.

  • exclude (sequence, default None) – Columns or fields to exclude.

  • columns (sequence, default None) – Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns).

  • coerce_float (bool, default False) – Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.

  • nrows (int, default None) – Number of rows to read if data is an iterator.

Return type:

DataFrame

See also

DataFrame.from_dict

DataFrame from dict of array-like or dicts.

DataFrame

DataFrame object creation using constructor.

Examples

Data can be provided as a structured ndarray:

>>> data = np.array([(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')],
...                 dtype=[('col_1', 'i4'), ('col_2', 'U1')])
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

Data can be provided as a list of dicts:

>>> data = [{'col_1': 3, 'col_2': 'a'},
...         {'col_1': 2, 'col_2': 'b'},
...         {'col_1': 1, 'col_2': 'c'},
...         {'col_1': 0, 'col_2': 'd'}]
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

Data can be provided as a list of tuples with corresponding columns:

>>> data = [(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')]
>>> pd.DataFrame.from_records(data, columns=['col_1', 'col_2'])
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
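
The index and coerce_float parameters are not exercised above. The sketch below uses made-up rows holding decimal.Decimal values, as a SQL driver might return them, and compares the resulting dtypes.

```python
from decimal import Decimal

import pandas as pd

# Illustrative rows (an assumption): Decimal values as a SQL driver may return.
rows = [(Decimal("1.5"), "a"), (Decimal("2.25"), "b")]

# Without coercion the Decimal objects are kept as Python objects.
as_objects = pd.DataFrame.from_records(
    rows, columns=["price", "label"], index="label"
)

# coerce_float=True converts them to floating point.
as_floats = pd.DataFrame.from_records(
    rows, columns=["price", "label"], index="label", coerce_float=True
)

print(as_objects.dtypes, as_floats.dtypes, sep="\n")
```

Passing index="label" moves that field out of the columns and into the index.
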

ge(other, axis='columns', level=None)

Get greater than or equal to of dataframe and other, element-wise (binary operator ge).

One of the flexible wrappers (eq, ne, le, lt, ge, gt) around the comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters:
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns:

Result of the comparison.

Return type:

DataFrame of bool

See also

DataFrame.eq

Compare DataFrames for equality elementwise.

DataFrame.ne

Compare DataFrames for inequality elementwise.

DataFrame.le

Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt

Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge

Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt

Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
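
The shared examples above cover eq, ne, gt and le but not ge itself. A short sketch on the same cost/revenue frame follows; the threshold values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"cost": [250, 150, 100], "revenue": [100, 250, 300]},
    index=["A", "B", "C"],
)

# Scalar comparison: the method and the >= operator agree.
at_least_150 = df.ge(150)
same_as_operator = (df >= 150).equals(at_least_150)

# Per-row thresholds: align the Series with the row index via axis="index".
thresholds = pd.Series([200, 200, 50], index=["A", "B", "C"])
per_row = df.ge(thresholds, axis="index")

print(at_least_150, per_row, sep="\n\n")
```
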

geom_equals(other, align=None)

Return a Series of dtype('bool') with value True for each aligned geometry equal to other.

An object is said to be equal to other if its set-theoretic boundary, interior, and exterior coincide with those of the other.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test for equality.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (1, 2), (0, 2)]),
...         LineString([(0, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (1, 2), (0, 2)]),
...         Point(0, 1),
...         LineString([(0, 0), (0, 2)]),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 1 2, 0 2, 0 0))
2             LINESTRING (0 0, 0 2)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    POLYGON ((0 0, 1 2, 0 2, 0 0))
3                       POINT (0 1)
4             LINESTRING (0 0, 0 2)
dtype: geometry

We can check whether each geometry of the GeoSeries equals a single geometry:

>>> polygon = Polygon([(0, 0), (2, 2), (0, 2)])
>>> s.geom_equals(polygon)
0     True
1    False
2    False
3    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.geom_equals(s2)
0    False
1    False
2    False
3     True
4    False
dtype: bool
>>> s.geom_equals(s2, align=False)
0     True
1     True
2    False
3    False
dtype: bool
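
The difference between align=True and align=False can be sketched without GeoPandas. The snippet below is only an illustration under stated assumptions: each series is modelled as a plain dict from index label to a WKT string, string equality stands in for geometric equality, and the helper names compare_aligned and compare_positional are hypothetical.

```python
# Illustrative stand-ins: index label -> WKT text (not real geometries).
s = {
    0: "POLYGON ((0 0, 2 2, 0 2, 0 0))",
    1: "POLYGON ((0 0, 1 2, 0 2, 0 0))",
    2: "LINESTRING (0 0, 0 2)",
    3: "POINT (0 1)",
}
s2 = {
    1: "POLYGON ((0 0, 2 2, 0 2, 0 0))",
    2: "POLYGON ((0 0, 1 2, 0 2, 0 0))",
    3: "POINT (0 1)",
    4: "LINESTRING (0 0, 0 2)",
}


def compare_aligned(left, right):
    """Compare values that share an index label (models align=True)."""
    labels = sorted(set(left) | set(right))
    # A label missing on either side has nothing to compare against -> False.
    return {k: k in left and k in right and left[k] == right[k] for k in labels}


def compare_positional(left, right):
    """Compare values by position, keeping the left labels (models align=False)."""
    return {k: a == b for (k, a), b in zip(left.items(), right.values())}


aligned = compare_aligned(s, s2)
positional = compare_positional(s, s2)
print(aligned)
print(positional)
```

With these stand-ins, the aligned result covers the union of both indices (labels 0 to 4), while the positional result keeps only the four labels of the left-hand series.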

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is equal to any element of the other one.

See also

GeoSeries.geom_equals_exact, GeoSeries.geom_equals_identical

geom_equals_exact(other, tolerance, align=None)

Return True for all geometries that equal the aligned other within a given tolerance, else False.

The operation works in a 1-to-1, row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to compare to.

  • tolerance (float) – Maximum difference allowed between corresponding coordinates, in coordinate units, when testing for approximate equality.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Point
>>> s = geopandas.GeoSeries(
...     [
...         Point(0, 1.1),
...         Point(0, 1.0),
...         Point(0, 1.2),
...     ]
... )
>>> s
0    POINT (0 1.1)
1      POINT (0 1)
2    POINT (0 1.2)
dtype: geometry
>>> s.geom_equals_exact(Point(0, 1), tolerance=0.1)
0    False
1     True
2    False
dtype: bool
>>> s.geom_equals_exact(Point(0, 1), tolerance=0.15)
0     True
1     True
2    False
dtype: bool
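
As a rough illustration of how a tolerance of this kind behaves, the sketch below treats it as an absolute distance threshold between corresponding points. This is an assumption made for the example; the helper points_equal_exact is hypothetical and is not the library's implementation.

```python
import math


def points_equal_exact(p, q, tolerance):
    """True when p and q are at most `tolerance` apart (hypothetical helper)."""
    return math.dist(p, q) <= tolerance


points = [(0.0, 1.1), (0.0, 1.0), (0.0, 1.2)]
target = (0.0, 1.0)

# One verdict per point, for a looser and a tighter tolerance.
loose = [points_equal_exact(p, target, 0.15) for p in points]
tight = [points_equal_exact(p, target, 0.05) for p in points]
print(loose)
print(tight)
```

Note that 1.1 - 1.0 evaluates to slightly more than 0.1 in binary floating point, which is consistent with the False reported for tolerance=0.1 above.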

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is equal to any element of the other one.

See also

GeoSeries.geom_equals, GeoSeries.geom_equals_identical

geom_equals_identical(other, align=None)

Return True for all geometries that are identical to the aligned other, else False.

This function verifies whether geometries are pointwise equivalent by checking that the structure, ordering, and values of all vertices are identical in all dimensions.

Like geom_equals_exact(), this function uses exact coordinate equality and requires coordinates to be in the same order for all components (vertices, rings, or parts) of a geometry. Unlike geom_equals_exact(), it does not accept a tolerance, and it additionally requires all dimensions to be the same (geom_equals_exact() ignores the Z and M dimensions). NaN values are considered equal to other NaN values.

This function is the vectorized equivalent of scalar equality of geometry objects (a == b, i.e. __eq__).

The operation works in a 1-to-1, row-wise manner.

Requires Shapely >= 2.1.

Added in version 1.1.0.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to compare to.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Point
>>> s = geopandas.GeoSeries(
...     [
...         Point(0, 1.1),
...         Point(0, 1.0),
...         Point(0, 1.2),
...     ]
... )
>>> s
0    POINT (0 1.1)
1      POINT (0 1)
2    POINT (0 1.2)
dtype: geometry
>>> s.geom_equals_identical(Point(0, 1))
0    False
1     True
2    False
dtype: bool
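
The pointwise rules described above (same vertex count, same order, same dimensions, NaN equal to NaN) can be sketched on plain coordinate tuples. The helper coords_identical is hypothetical and only models those rules; it does not call Shapely.

```python
import math


def coords_identical(a, b):
    """Pointwise comparison of two coordinate sequences (hypothetical helper).

    Vertices must match in count, order, and dimension; NaN matches NaN.
    """
    if len(a) != len(b):
        return False
    for p, q in zip(a, b):
        if len(p) != len(q):  # e.g. XY versus XYZ
            return False
        for u, v in zip(p, q):
            if not (u == v or (math.isnan(u) and math.isnan(v))):
                return False
    return True


ring = [(0, 0), (2, 2), (0, 2), (0, 0)]
rotated = [(2, 2), (0, 2), (0, 0), (2, 2)]  # same triangle, different start vertex

same = coords_identical(ring, ring)
reordered = coords_identical(ring, rotated)
mixed_dims = coords_identical([(0, 1)], [(0, 1, 2)])
nan_match = coords_identical([(0, 1, math.nan)], [(0, 1, math.nan)])
print(same, reordered, mixed_dims, nan_match)
```

The rotated ring describes the same triangle, so a topological test such as geom_equals would be expected to accept it, whereas a pointwise comparison does not.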

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is equal to any element of the other one.

See also

GeoSeries.geom_equals, GeoSeries.geom_equals_exact

property geom_type

Returns a Series of strings specifying the Geometry Type of each object.

Examples

>>> from shapely.geometry import Point, Polygon, LineString
>>> d = {'geometry': [Point(2, 1), Polygon([(0, 0), (1, 1), (1, 0)]),
... LineString([(0, 0), (1, 1)])]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf.geom_type
0         Point
1       Polygon
2    LineString
dtype: object
property geometry: GeoSeries

Geometry data for GeoDataFrame

get(key, default=None)

Get item from object for given key (e.g. a DataFrame column).

Returns default value if not found.

Parameters:

key (object)

Return type:

same type as items contained in object

Examples

>>> df = pd.DataFrame(
...     [
...         [24.3, 75.7, "high"],
...         [31, 87.8, "high"],
...         [22, 71.6, "medium"],
...         [35, 95, "medium"],
...     ],
...     columns=["temp_celsius", "temp_fahrenheit", "windspeed"],
...     index=pd.date_range(start="2014-02-12", end="2014-02-15", freq="D"),
... )
>>> df
            temp_celsius  temp_fahrenheit windspeed
2014-02-12          24.3             75.7      high
2014-02-13          31.0             87.8      high
2014-02-14          22.0             71.6    medium
2014-02-15          35.0             95.0    medium
>>> df.get(["temp_celsius", "windspeed"])
            temp_celsius windspeed
2014-02-12          24.3      high
2014-02-13          31.0      high
2014-02-14          22.0    medium
2014-02-15          35.0    medium
>>> ser = df['windspeed']
>>> ser.get('2014-02-13')
'high'

If the key isn’t found, the default value will be used.

>>> df.get(["temp_celsius", "temp_kelvin"], default="default_value")
'default_value'
>>> ser.get('2014-02-10', '[unknown]')
'[unknown]'
get_coordinates(include_z=False, ignore_index=False, index_parts=False, *, include_m=False)

Get coordinates from a GeoSeries as a DataFrame of floats.

The shape of the returned DataFrame is (N, 2), with N being the number of coordinate pairs. With the default of include_z=False, three-dimensional data is ignored. When specifying include_z=True, the shape of the returned DataFrame is (N, 3).

Parameters:
  • include_z (bool, default False) – Include Z coordinates

  • ignore_index (bool, default False) – If True, the resulting index will be labelled 0, 1, …, n - 1, ignoring index_parts.

  • index_parts (bool, default False) – If True, the resulting index will be a MultiIndex (original index with an additional level indicating the ordering of the coordinate pairs: a new zero-based index for each geometry in the original GeoSeries).

  • include_m (bool, default False) – Include M coordinates. Requires shapely >= 2.1.

Return type:

pandas.DataFrame

Examples

>>> from shapely.geometry import Point, LineString, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Point(1, 1),
...         LineString([(1, -1), (1, 0)]),
...         Polygon([(3, -1), (4, 0), (3, 1)]),
...     ]
... )
>>> s
0                         POINT (1 1)
1              LINESTRING (1 -1, 1 0)
2    POLYGON ((3 -1, 4 0, 3 1, 3 -1))
dtype: geometry
>>> s.get_coordinates()
     x    y
0  1.0  1.0
1  1.0 -1.0
1  1.0  0.0
2  3.0 -1.0
2  4.0  0.0
2  3.0  1.0
2  3.0 -1.0
>>> s.get_coordinates(ignore_index=True)
     x    y
0  1.0  1.0
1  1.0 -1.0
2  1.0  0.0
3  3.0 -1.0
4  4.0  0.0
5  3.0  1.0
6  3.0 -1.0
>>> s.get_coordinates(index_parts=True)
       x    y
0 0  1.0  1.0
1 0  1.0 -1.0
  1  1.0  0.0
2 0  3.0 -1.0
  1  4.0  0.0
  2  3.0  1.0
  3  3.0 -1.0
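
The three index layouts shown above can be modelled in plain Python. This is a sketch under stated assumptions: the series is a dict from index label to a list of (x, y) vertices, and the helper coordinate_rows is hypothetical.

```python
# Stand-in for a GeoSeries: index label -> list of (x, y) vertices.
geoms = {
    0: [(1.0, 1.0)],
    1: [(1.0, -1.0), (1.0, 0.0)],
    2: [(3.0, -1.0), (4.0, 0.0), (3.0, 1.0), (3.0, -1.0)],
}


def coordinate_rows(geoms, ignore_index=False, index_parts=False):
    """Flatten vertices into (index, x, y) rows (hypothetical helper)."""
    rows = []
    for label, coords in geoms.items():
        for part, (x, y) in enumerate(coords):
            rows.append(((label, part) if index_parts else label, x, y))
    if ignore_index:
        # A fresh 0..n-1 index replaces whatever was built above.
        rows = [(i, x, y) for i, (_, x, y) in enumerate(rows)]
    return rows


default_index = [r[0] for r in coordinate_rows(geoms)]
fresh_index = [r[0] for r in coordinate_rows(geoms, ignore_index=True)]
parts_index = [r[0] for r in coordinate_rows(geoms, index_parts=True)]
print(default_index)
print(fresh_index)
print(parts_index)
```
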
get_geometry(index)

Return the n-th geometry from a collection of geometries.

Parameters:

index (int or array_like) – Position of a geometry to be retrieved within its collection

Return type:

GeoSeries

Notes

Simple geometries act as collections of length 1. Any out-of-range index value returns None.

Examples

>>> from shapely.geometry import Point, MultiPoint, GeometryCollection
>>> s = geopandas.GeoSeries(
...     [
...         Point(0, 0),
...         MultiPoint([(0, 0), (1, 1), (0, 1), (1, 0)]),
...         GeometryCollection(
...             [MultiPoint([(0, 0), (1, 1), (0, 1), (1, 0)]), Point(0, 1)]
...         ),
...     ]
... )
>>> s
0                                          POINT (0 0)
1              MULTIPOINT ((0 0), (1 1), (0 1), (1 0))
2    GEOMETRYCOLLECTION (MULTIPOINT ((0 0), (1 1), ...
dtype: geometry
>>> s.get_geometry(0)
0                                POINT (0 0)
1                                POINT (0 0)
2    MULTIPOINT ((0 0), (1 1), (0 1), (1 0))
dtype: geometry
>>> s.get_geometry(1)
0           None
1    POINT (1 1)
2    POINT (0 1)
dtype: geometry
>>> s.get_geometry(-1)
0    POINT (0 0)
1    POINT (1 0)
2    POINT (0 1)
dtype: geometry
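
The rules in the Notes (simple geometries act as collections of length 1, out-of-range positions give None) can be sketched as follows. Geometries are modelled as WKT strings and collections as lists; the helper get_part is hypothetical.

```python
def get_part(geometry, index):
    """Return the part at `index`, or None when out of range (hypothetical helper).

    A collection is modelled as a list of parts; anything else is a simple
    geometry and behaves like a collection of length 1.
    """
    parts = geometry if isinstance(geometry, list) else [geometry]
    try:
        return parts[index]  # negative positions count from the end
    except IndexError:
        return None


point = "POINT (0 0)"
multipoint = ["POINT (0 0)", "POINT (1 1)", "POINT (0 1)", "POINT (1 0)"]
collection = [multipoint, "POINT (0 1)"]
series = [point, multipoint, collection]

first = [get_part(g, 0) for g in series]
second = [get_part(g, 1) for g in series]
last = [get_part(g, -1) for g in series]
print(second)
print(last)
```
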
get_precision()

Return a Series of the precision of each geometry.

If a precision has not been previously set, it will be 0, indicating regular double precision coordinates are in use. Otherwise, it will return the precision grid size that was set on a geometry.

Returns NaN for not-a-geometry values.

Examples

>>> from shapely.geometry import Point
>>> s = geopandas.GeoSeries(
...     [
...         Point(0, 1),
...         Point(0, 1, 2),
...         Point(0, 1.5, 2),
...     ]
... )
>>> s
0          POINT (0 1)
1      POINT Z (0 1 2)
2    POINT Z (0 1.5 2)
dtype: geometry
>>> s.get_precision()
0    0.0
1    0.0
2    0.0
dtype: float64
>>> s1 = s.set_precision(1)
>>> s1
0        POINT (0 1)
1    POINT Z (0 1 2)
2    POINT Z (0 2 2)
dtype: geometry
>>> s1.get_precision()
0    1.0
1    1.0
2    1.0
dtype: float64

See also

GeoSeries.set_precision

set precision grid size

groupby(by=None, axis=<no_default>, level=None, as_index=True, sort=True, group_keys=True, observed=<no_default>, dropna=True)

Group DataFrame using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters:
  • by (mapping, function, label, pd.Grouper or list of such) – Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

    Split along rows (0) or columns (1). For Series this parameter is unused and defaults to 0.

    Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.

  • level (int, level name, or sequence of such, default None) – If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.

  • as_index (bool, default True) – Return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

  • sort (bool, default True) –

    Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original DataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

    Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.

  • group_keys (bool, default True) –

    When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.

    Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed Series or DataFrame. Specify group_keys explicitly to include the group keys or not.

    Changed in version 2.0.0: group_keys now defaults to True.

  • observed (bool, default False) –

    This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

    Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.

  • dropna (bool, default True) – If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.

Returns:

Returns a groupby object that contains information about the groups.

Return type:

pandas.api.typing.DataFrameGroupBy

See also

resample

Convenience method for frequency conversion and resampling of time series.

Notes

See the user guide for more detailed usage and examples, including splitting an object into groups, iterating through groups, selecting a group, aggregation, and more.

Examples

>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
...                               'Parrot', 'Parrot'],
...                    'Max Speed': [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0

Hierarchical Indexes

We can group by different levels of a hierarchical index using the level parameter:

>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...           ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]},
...                   index=index)
>>> df
                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0
>>> df.groupby(level=0).mean()
        Max Speed
Animal
Falcon      370.0
Parrot       25.0
>>> df.groupby(level="Type").mean()
         Max Speed
Type
Captive      210.0
Wild         185.0

We can also choose whether to include NA in group keys by setting the dropna parameter; the default is True.

>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by=["b"]).sum()
    a   c
b
1.0 2   3
2.0 2   5
>>> df.groupby(by=["b"], dropna=False).sum()
    a   c
b
1.0 2   3
2.0 2   5
NaN 1   4
>>> l = [["a", 12, 12], [None, 12.3, 33.], ["b", 12.3, 123], ["a", 1, 1]]
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by="a").sum()
    b     c
a
a   13.0   13.0
b   12.3  123.0
>>> df.groupby(by="a", dropna=False).sum()
    b     c
a
a   13.0   13.0
b   12.3  123.0
NaN 12.3   33.0

When using .apply(), use group_keys to include or exclude the group keys. The group_keys argument defaults to True (include).

>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
...                               'Parrot', 'Parrot'],
...                    'Max Speed': [380., 370., 24., 26.]})
>>> df.groupby("Animal", group_keys=True)[['Max Speed']].apply(lambda x: x)
          Max Speed
Animal
Falcon 0      380.0
       1      370.0
Parrot 2       24.0
       3       26.0
>>> df.groupby("Animal", group_keys=False)[['Max Speed']].apply(lambda x: x)
   Max Speed
0      380.0
1      370.0
2       24.0
3       26.0
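
The split-apply-combine idea and the effect of dropna can be sketched in plain Python. This is an illustration, not how pandas implements groupby: rows are lists, None stands in for NaN, key sorting is not modelled, and the helper group_sum_by_b is hypothetical.

```python
from collections import defaultdict

rows = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]  # columns a, b, c


def group_sum_by_b(rows, dropna=True):
    """Split rows on column b, then sum columns a and c per group."""
    groups = defaultdict(list)
    for a, b, c in rows:
        if b is None and dropna:
            continue  # rows with a missing key are left out
        groups[b].append((a, c))  # split
    # apply + combine: one (sum of a, sum of c) pair per key
    return {
        key: (sum(a for a, _ in members), sum(c for _, c in members))
        for key, members in groups.items()
    }


dropped = group_sum_by_b(rows)
kept = group_sum_by_b(rows, dropna=False)
print(dropped)
print(kept)
```
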
gt(other, axis='columns', level=None)

Return the element-wise greater-than comparison of the DataFrame and other (binary operator gt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters:
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns:

Result of the comparison.

Return type:

DataFrame of bool

See also

DataFrame.eq

Compare DataFrames for equality elementwise.

DataFrame.ne

Compare DataFrames for inequality elementwise.

DataFrame.le

Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt

Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge

Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt

Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
property has_m

Return a Series of dtype('bool') with value True for features that have an m-component.

Requires Shapely >= 2.1.

Added in version 1.1.0.

Examples

>>> from shapely.geometry import Point
>>> s = geopandas.GeoSeries.from_wkt(
...     [
...         "POINT M (2 3 5)",
...         "POINT Z (1 2 3)",
...         "POINT (0 0)",
...     ]
... )
>>> s
0    POINT M (2 3 5)
1    POINT Z (1 2 3)
2        POINT (0 0)
dtype: geometry
>>> s.has_m
0     True
1    False
2    False
dtype: bool
property has_sindex

Check the existence of the spatial index without generating it.

Use the .sindex attribute on a GeoDataFrame or GeoSeries to generate a spatial index if it does not yet exist, which may take considerable time based on the underlying index implementation.

Note that the underlying spatial index may not be fully initialized until the first use.

Examples

>>> from shapely.geometry import Point
>>> d = {'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(d)
>>> gdf.has_sindex
False
>>> index = gdf.sindex
>>> gdf.has_sindex
True
Returns:

True if the spatial index has been generated or False if not.

Return type:

bool
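
The pattern of checking for a cached index without building it can be sketched with a small class. LazyIndexed and its attributes are hypothetical names used for illustration; the dictionary is only a stand-in for an expensive spatial index.

```python
class LazyIndexed:
    """Toy container with a lazily built, cached index (hypothetical class)."""

    def __init__(self, items):
        self.items = list(items)
        self._index = None  # nothing built yet

    @property
    def has_index(self):
        # Only inspects the cache; never triggers a build.
        return self._index is not None

    @property
    def index(self):
        if self._index is None:
            # Stand-in for an expensive spatial-index build.
            self._index = {item: pos for pos, item in enumerate(self.items)}
        return self._index


obj = LazyIndexed(["a", "b"])
before = obj.has_index
_ = obj.index  # first access builds and caches the index
after = obj.has_index
print(before, after)
```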

property has_z

Return a Series of dtype('bool') with value True for features that have a z-component.

Notes

Every operation in GeoPandas is planar, i.e. the potential third dimension is not taken into account.

Examples

>>> from shapely.geometry import Point
>>> s = geopandas.GeoSeries(
...     [
...         Point(0, 1),
...         Point(0, 1, 2),
...     ]
... )
>>> s
0        POINT (0 1)
1    POINT Z (0 1 2)
dtype: geometry
>>> s.has_z
0    False
1     True
dtype: bool
hausdorff_distance(other, align=None, densify=None)

Return a Series containing the Hausdorff distance to aligned other.

The Hausdorff distance is the largest distance from any point in one geometry to the nearest point in the other.

The operation works in a 1-to-1, row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the distance to.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

  • densify (float (default None)) – A value between 0 and 1 that splits each segment of a line string into equal-length subsegments, making the approximation less coarse. A densify value of 0.5 adds a point halfway between each pair of points. A densify value of 0.25 adds a point at each quarter of the way between each pair of points.

Return type:

Series (float)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 0), (1, 1)]),
...         Polygon([(0, 0), (-1, 0), (-1, 1)]),
...         LineString([(1, 1), (0, 0)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Point(3, 1),
...         LineString([(1, 0), (2, 0)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0      POLYGON ((0 0, 1 0, 1 1, 0 0))
1    POLYGON ((0 0, -1 0, -1 1, 0 0))
2               LINESTRING (1 1, 0 0)
3                         POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
2                                          POINT (3 1)
3                                LINESTRING (1 0, 2 0)
4                                          POINT (0 1)
dtype: geometry

We can check the Hausdorff distance of each geometry of GeoSeries to a single geometry:

>>> point = Point(-1, 0)
>>> s.hausdorff_distance(point)
0    2.236068
1    1.000000
2    2.236068
3    1.000000
dtype: float64

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and use elements with the same index using align=True or ignore index and use elements based on their matching order using align=False:

>>> s.hausdorff_distance(s2, align=True)
0         NaN
1    2.121320
2    3.162278
3    2.000000
4         NaN
dtype: float64
>>> s.hausdorff_distance(s2, align=False)
0    0.707107
1    4.123106
2    1.414214
3    1.000000
dtype: float64

We can also set a densify value, a float between 0 and 1 that gives the fraction of each segment's length used as the spacing between the points added when densifying.

>>> l1 = geopandas.GeoSeries([LineString([(130, 0), (0, 0), (0, 150)])])
>>> l2 = geopandas.GeoSeries([LineString([(10, 10), (10, 150), (130, 10)])])
>>> l1.hausdorff_distance(l2)
0    14.142136
dtype: float64
>>> l1.hausdorff_distance(l2, densify=0.25)
0    70.0
dtype: float64
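
To see why densifying changes the result, the following sketch computes a discrete Hausdorff distance for two line strings in plain Python. It is a simplified model written for this illustration, not the GEOS implementation: distances are measured from sample points (the vertices, plus evenly spaced points when densify is given) to the other line, and all helper names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.dist(p, a)
    # Projection parameter, clamped so the nearest point stays on the segment.
    t = max(0.0, min(1.0, ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def densified(line, fraction):
    """Vertices of `line` plus evenly spaced points on every segment."""
    steps = 1 if fraction is None else max(1, round(1 / fraction))
    points = []
    for a, b in zip(line, line[1:]):
        for i in range(steps):
            points.append((a[0] + (b[0] - a[0]) * i / steps,
                           a[1] + (b[1] - a[1]) * i / steps))
    points.append(line[-1])
    return points


def directed(line_a, line_b, fraction):
    """Largest distance from a sample point on line_a to line_b."""
    return max(
        min(point_segment_distance(p, a, b) for a, b in zip(line_b, line_b[1:]))
        for p in densified(line_a, fraction)
    )


def hausdorff(line_a, line_b, densify=None):
    """Symmetric discrete Hausdorff distance between two line strings."""
    return max(directed(line_a, line_b, densify), directed(line_b, line_a, densify))


l1 = [(130, 0), (0, 0), (0, 150)]
l2 = [(10, 10), (10, 150), (130, 10)]
coarse = hausdorff(l1, l2)
fine = hausdorff(l1, l2, densify=0.25)
print(round(coarse, 6), round(fine, 6))
```

Using only the vertices, the largest gap is found at the corner (0, 0). With densify=0.25, the sample point (70, 80) on the diagonal of the second line lies 70 units from the first line, which the vertex-only computation misses.
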
head(n=5)

Return the first n rows.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

For negative values of n, this function returns all rows except the last |n| rows, equivalent to df[:n].

If n is larger than the number of rows, this function returns all rows.

Parameters:

n (int, default 5) – Number of rows to select.

Returns:

The first n rows of the caller object.

Return type:

same type as caller

See also

DataFrame.tail

Returns the last n rows.

Examples

>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
...                    'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra

Viewing the first 5 lines

>>> df.head()
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey

Viewing the first n lines (three in this case)

>>> df.head(3)
      animal
0  alligator
1        bee
2     falcon

For negative values of n

>>> df.head(-3)
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
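
The positional behaviour, including negative n, matches ordinary Python slicing. A minimal sketch on a list, where the function head is a hypothetical stand-in for the method:

```python
def head(rows, n=5):
    """First n rows; for negative n, everything except the last |n| rows."""
    return rows[:n]


animals = ["alligator", "bee", "falcon", "lion", "monkey",
           "parrot", "shark", "whale", "zebra"]

first_five = head(animals)
all_but_last_three = head(animals, -3)
oversized = head(animals, 50)  # n larger than the data returns everything
print(first_five)
print(all_but_last_three)
```
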
hilbert_distance(total_bounds=None, level=16)

Calculate the distance along a Hilbert curve.

The distances are calculated for the midpoints of the geometries in the GeoDataFrame, and using the total bounds of the GeoDataFrame.

The Hilbert distance can be used to spatially sort GeoPandas objects, by mapping two dimensional geometries along the Hilbert curve.

Parameters:
  • total_bounds (4-element array, optional) – The spatial extent in which the curve is constructed (used to rescale the geometry midpoints). By default, the total bounds of the full GeoDataFrame or GeoSeries will be computed. If known, you can pass the total bounds to avoid this extra computation.

  • level (int (1 - 16), default 16) – Determines the precision of the curve (points on the curve will have coordinates in the range [0, 2^level - 1]).

Returns:

Series containing the distance along the curve for each geometry

Return type:

Series
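
As a rough sketch of the idea, the snippet below rescales midpoints to an integer grid using the total bounds and then computes the position of each grid cell along a Hilbert curve with the textbook bit-manipulation algorithm. This is an assumption-laden illustration rather than the GeoPandas implementation; hilbert_index, discretize, and the sample midpoints are hypothetical.

```python
def hilbert_index(level, x, y):
    """Distance of grid cell (x, y) along a Hilbert curve of the given level.

    Textbook bit-manipulation algorithm; x and y are integers in
    [0, 2**level - 1].
    """
    n = 1 << level
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def discretize(value, lower, upper, level):
    """Rescale a coordinate from [lower, upper] to an integer grid position."""
    cells = (1 << level) - 1
    if upper == lower:
        return 0
    return round((value - lower) / (upper - lower) * cells)


# Hypothetical midpoints and their total bounds (minx, miny, maxx, maxy).
midpoints = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0), (10.0, 0.0)]
minx, miny, maxx, maxy = 0.0, 0.0, 10.0, 10.0
level = 1
distances = [
    hilbert_index(level,
                  discretize(x, minx, maxx, level),
                  discretize(y, miny, maxy, level))
    for x, y in midpoints
]
print(distances)
```

Sorting geometries by such a distance keeps spatially close items near each other in the sorted order, which is the property that makes the Hilbert distance useful for spatial sorting.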

hist(column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, bins=10, backend=None, legend=False, **kwargs)

Make a histogram of the DataFrame’s columns.

A histogram is a representation of the distribution of data. This function calls matplotlib.pyplot.hist() on each series in the DataFrame, resulting in one histogram per column.

Parameters:
  • data (DataFrame) – The pandas object holding the data.

  • column (str or sequence, optional) – If passed, will be used to limit data to a subset of columns.

  • by (object, optional) – If passed, then used to form histograms for separate groups.

  • grid (bool, default True) – Whether to show axis grid lines.

  • xlabelsize (int, default None) – If specified changes the x-axis label size.

  • xrot (float, default None) – Rotation of x axis labels. For example, a value of 90 displays the x labels rotated 90 degrees clockwise.

  • ylabelsize (int, default None) – If specified changes the y-axis label size.

  • yrot (float, default None) – Rotation of y axis labels. For example, a value of 90 displays the y labels rotated 90 degrees clockwise.

  • ax (Matplotlib axes object, default None) – The axes to plot the histogram on.

  • sharex (bool, default True if ax is None else False) – In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in. Note that passing in both an ax and sharex=True will alter all x axis labels for all subplots in a figure.

  • sharey (bool, default False) – In case subplots=True, share y axis and set some y axis labels to invisible.

  • figsize (tuple, optional) – The size in inches of the figure to create. Uses the value in matplotlib.rcParams by default.

  • layout (tuple, optional) – Tuple of (rows, columns) for the layout of the histograms.

  • bins (int or sequence, default 10) – Number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated and returned. If bins is a sequence, gives bin edges, including left edge of first bin and right edge of last bin. In this case, bins is returned unmodified.

  • backend (str, default None) – Backend to use instead of the backend specified in the option plotting.backend. For instance, ‘matplotlib’. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.

  • legend (bool, default False) – Whether to show the legend.

  • **kwargs – All other plotting keyword arguments to be passed to matplotlib.pyplot.hist().

Return type:

matplotlib.AxesSubplot or numpy.ndarray of them

See also

matplotlib.pyplot.hist

Plot a histogram using matplotlib.

Examples

This example draws a histogram based on the length and width of some animals, displayed in three bins
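
To illustrate what an integer bins value means (bins + 1 equally spaced edges, as described for the bins parameter), the dependency-free sketch below counts values in three equal-width bins. The data and the helper equal_width_counts are hypothetical, and this is not how pandas or matplotlib compute histograms internally.

```python
def equal_width_counts(values, bins):
    """Count values in `bins` equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)] + [hi]  # bins + 1 edges
    counts = [0] * bins
    for v in values:
        # The last bin is closed on the right so the maximum is counted.
        pos = min(int((v - lo) / width), bins - 1) if width else 0
        counts[pos] += 1
    return edges, counts


lengths = [1.5, 0.5, 1.2, 0.9, 3.0]  # hypothetical animal lengths
edges, counts = equal_width_counts(lengths, bins=3)
print(len(edges), counts)
```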

property iat: _iAtIndexer

Access a single value for a row/column pair by integer position.

Similar to iloc, in that both provide integer-based lookups. Use iat if you only need to get or set a single value in a DataFrame or Series.

Raises:

IndexError – When integer position is out of bounds.

See also

DataFrame.at

Access a single value for a row/column label pair.

DataFrame.loc

Access a group of rows and columns by label(s).

DataFrame.iloc

Access a group of rows and columns by integer position(s).

Examples

>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   columns=['A', 'B', 'C'])
>>> df
    A   B   C
0   0   2   3
1   0   4   1
2  10  20  30

Get value at specified row/column pair

>>> df.iat[1, 2]
1

Set value at specified row/column pair

>>> df.iat[1, 2] = 10
>>> df.iat[1, 2]
10

Get value within a series

>>> df.loc[0].iat[1]
2
idxmax(axis=0, skipna=True, numeric_only=False)

Return index of first occurrence of maximum over requested axis.

NA/null values are excluded.

Parameters:
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • numeric_only (bool, default False) –

    Include only float, int or boolean data.

    Added in version 1.5.0.

Returns:

Indexes of maxima along the specified axis.

Return type:

Series

Raises:

ValueError

  • If the row/column is empty

See also

Series.idxmax

Return index of the maximum element.

Notes

This method is the DataFrame version of ndarray.argmax.

Examples

Consider a dataset containing food consumption in Argentina.

>>> df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
...                     'co2_emissions': [37.2, 19.66, 1712]},
...                   index=['Pork', 'Wheat Products', 'Beef'])
>>> df
                consumption  co2_emissions
Pork                  10.51         37.20
Wheat Products       103.11         19.66
Beef                  55.48       1712.00

By default, it returns the index for the maximum value in each column.

>>> df.idxmax()
consumption     Wheat Products
co2_emissions             Beef
dtype: object

To return the index for the maximum value in each row, use axis="columns".

>>> df.idxmax(axis="columns")
Pork              co2_emissions
Wheat Products     consumption
Beef              co2_emissions
dtype: object
idxmin(axis=0, skipna=True, numeric_only=False)

Return index of first occurrence of minimum over requested axis.

NA/null values are excluded.

Parameters:
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • numeric_only (bool, default False) –

    Include only float, int or boolean data.

    Added in version 1.5.0.

Returns:

Indexes of minima along the specified axis.

Return type:

Series

Raises:

ValueError

  • If the row/column is empty

See also

Series.idxmin

Return index of the minimum element.

Notes

This method is the DataFrame version of ndarray.argmin.

Examples

Consider a dataset containing food consumption in Argentina.

>>> df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
...                     'co2_emissions': [37.2, 19.66, 1712]},
...                   index=['Pork', 'Wheat Products', 'Beef'])
>>> df
                consumption  co2_emissions
Pork                  10.51         37.20
Wheat Products       103.11         19.66
Beef                  55.48       1712.00

By default, it returns the index for the minimum value in each column.

>>> df.idxmin()
consumption                Pork
co2_emissions    Wheat Products
dtype: object

To return the index for the minimum value in each row, use axis="columns".

>>> df.idxmin(axis="columns")
Pork                consumption
Wheat Products    co2_emissions
Beef                consumption
dtype: object
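
A common follow-up is to fetch the whole row that holds the minimum. The sketch below reuses the food-consumption frame from the example above and combines Series.idxmin with DataFrame.loc; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"consumption": [10.51, 103.11, 55.48],
     "co2_emissions": [37.2, 19.66, 1712]},
    index=["Pork", "Wheat Products", "Beef"],
)

# idxmin on a single column returns one label ...
label = df["co2_emissions"].idxmin()

# ... which can be passed to .loc to retrieve the full row.
row = df.loc[label]
print(label)
print(row)
```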
property iloc: _iLocIndexer

Purely integer-location based indexing for selection by position.

Deprecated since version 2.2.0: Returning a tuple from a callable is deprecated.

.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

Allowed inputs are:

  • An integer, e.g. 5.

  • A list or array of integers, e.g. [4, 3, 0].

  • A slice object with ints, e.g. 1:7.

  • A boolean array.

  • A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.

  • A tuple of row and column indexes. The tuple elements consist of one of the above inputs, e.g. (0, 1).

.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).

See more at Selection by Position.

See also

DataFrame.iat

Fast integer location scalar accessor.

DataFrame.loc

Purely label-location based indexer for selection by label.

Series.iloc

Purely integer-location based indexing for selection by position.

Examples

>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
...           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
...           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
>>> df = pd.DataFrame(mydict)
>>> df
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

Indexing just the rows

With a scalar integer.

>>> type(df.iloc[0])
<class 'pandas.core.series.Series'>
>>> df.iloc[0]
a    1
b    2
c    3
d    4
Name: 0, dtype: int64

With a list of integers.

>>> df.iloc[[0]]
   a  b  c  d
0  1  2  3  4
>>> type(df.iloc[[0]])
<class 'pandas.core.frame.DataFrame'>
>>> df.iloc[[0, 1]]
     a    b    c    d
0    1    2    3    4
1  100  200  300  400

With a slice object.

>>> df.iloc[:3]
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

With a boolean mask the same length as the index.

>>> df.iloc[[True, False, True]]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000

With a callable, useful in method chains. The x passed to the lambda is the DataFrame being sliced. This selects the rows whose index label is even.

>>> df.iloc[lambda x: x.index % 2 == 0]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000

Indexing both axes

You can mix the indexer types for the index and columns. Use : to select the entire axis.

With scalar integers.

>>> df.iloc[0, 1]
2

With lists of integers.

>>> df.iloc[[0, 2], [1, 3]]
      b     d
0     2     4
2  2000  4000

With slice objects.

>>> df.iloc[1:3, 0:3]
      a     b     c
1   100   200   300
2  1000  2000  3000

With a boolean array whose length matches the columns.

>>> df.iloc[:, [True, False, True, False]]
      a     c
0     1     3
1   100   300
2  1000  3000

With a callable function that expects the Series or DataFrame.

>>> df.iloc[:, lambda df: [0, 2]]
      a     c
0     1     3
1   100   300
2  1000  3000
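
The out-of-bounds rule described above (slices are clipped, other indexers raise IndexError) can be checked with a small sketch. The frame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 100, 1000], "b": [2, 200, 2000]})

# A slice that runs past the end is clipped, as with Python lists.
clipped = df.iloc[1:10]

# A scalar position past the end raises IndexError instead.
try:
    df.iloc[10]
    raised = False
except IndexError:
    raised = True

print(len(clipped), raised)
```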
index

The index (row labels) of the DataFrame.

The index of a DataFrame is a series of labels that identify each row. The labels can be integers, strings, or any other hashable type. The index is used for label-based access and alignment, and can be accessed or modified using this attribute.

Returns:

The index labels of the DataFrame.

Return type:

pandas.Index

See also

DataFrame.columns

The column labels of the DataFrame.

DataFrame.to_numpy

Convert the DataFrame to a NumPy array.

Examples

>>> df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
...                    'Age': [25, 30, 35],
...                    'Location': ['Seattle', 'New York', 'Kona']},
...                   index=([10, 20, 30]))
>>> df.index
Index([10, 20, 30], dtype='int64')

In this example, we create a DataFrame with 3 rows and 3 columns, including Name, Age, and Location information. We set the index labels to be the integers 10, 20, and 30. We then access the index attribute of the DataFrame, which returns an Index object containing the index labels.

>>> df.index = [100, 200, 300]
>>> df
       Name  Age  Location
100   Alice   25   Seattle
200     Bob   30  New York
300  Aritra   35      Kona

In this example, we modify the index labels of the DataFrame by assigning a new list of labels to the index attribute. The DataFrame is then updated with the new labels, and the output shows the modified DataFrame.
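
The description above mentions that the index is also used for alignment. A minimal sketch of what that means, using two hypothetical Series whose labels are stored in a different order:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=[10, 20, 30])
b = pd.Series([10, 20, 30], index=[30, 20, 10])

# Arithmetic matches values by index label, not by position.
total = a + b

# Label 10 pairs a's 1 with b's 30, label 30 pairs a's 3 with b's 10.
print(total.loc[10], total.loc[20], total.loc[30])
```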

infer_objects(copy=None)

Attempt to infer better dtypes for object columns.

Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.

Parameters:

copy (bool, default True) –

Whether to make a copy for non-object or non-inferable columns or Series.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Return type:

same type as input object

See also

to_datetime

Convert argument to datetime.

to_timedelta

Convert argument to timedelta.

to_numeric

Convert argument to numeric type.

convert_dtypes

Convert argument to best possible dtype.

Examples

>>> df = pd.DataFrame({"A": ["a", 1, 2, 3]})
>>> df = df.iloc[1:]
>>> df
   A
1  1
2  2
3  3
>>> df.dtypes
A    object
dtype: object
>>> df.infer_objects().dtypes
A    int64
dtype: object
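
To illustrate that unconvertible columns are left unchanged, the sketch below builds a hypothetical frame with one object column that holds only integers and one that mixes strings and integers.

```python
import pandas as pd

df = pd.DataFrame({
    "numbers": pd.Series([1, 2, 3], dtype=object),
    "mixed": pd.Series(["a", 1, 2], dtype=object),
})

inferred = df.infer_objects()

# "numbers" can be soft-converted to an integer dtype;
# "mixed" has no better common dtype and stays object.
print(inferred.dtypes)
```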
info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None)

Print a concise summary of a DataFrame.

This method prints information about a DataFrame including the index dtype and columns, non-null values and memory usage.

Parameters:
  • verbose (bool, optional) – Whether to print the full summary. By default, the setting in pandas.options.display.max_info_columns is followed.

  • buf (writable buffer, defaults to sys.stdout) – Where to send the output. By default, the output is printed to sys.stdout. Pass a writable buffer if you need to further process the output.

  • max_cols (int, optional) – When to switch from the verbose to the truncated output. If the DataFrame has more than max_cols columns, the truncated output is used. By default, the setting in pandas.options.display.max_info_columns is used.

  • memory_usage (bool, str, optional) –

    Specifies whether total memory usage of the DataFrame elements (including the index) should be displayed. By default, this follows the pandas.options.display.memory_usage setting.

    True always shows memory usage. False never shows memory usage. A value of ‘deep’ is equivalent to “True with deep introspection”. Memory usage is shown in human-readable units (base-2 representation). Without deep introspection, memory is estimated from the column dtype and the number of rows, assuming values of the same dtype consume the same amount of memory. With deep memory introspection, a real memory usage calculation is performed at the cost of computational resources. See the Frequently Asked Questions for more details.

  • show_counts (bool, optional) – Whether to show the non-null counts. By default, this is shown only if the DataFrame is smaller than pandas.options.display.max_info_rows and pandas.options.display.max_info_columns. A value of True always shows the counts, and False never shows the counts.

Returns:

This method prints a summary of a DataFrame and returns None.

Return type:

None

See also

DataFrame.describe

Generate descriptive statistics of DataFrame columns.

DataFrame.memory_usage

Memory usage of DataFrame columns.

Examples

>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = pd.DataFrame({"int_col": int_values, "text_col": text_values,
...                   "float_col": float_values})
>>> df
    int_col text_col  float_col
0        1    alpha       0.00
1        2     beta       0.25
2        3    gamma       0.50
3        4    delta       0.75
4        5  epsilon       1.00

Prints information of all columns:

>>> df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   int_col    5 non-null      int64
 1   text_col   5 non-null      object
 2   float_col  5 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes

Prints a summary of the column count and dtypes, but no per-column information:

>>> df.info(verbose=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes

Pipe the output of DataFrame.info to a buffer instead of sys.stdout, get the buffer content, and write it to a text file:

>>> import io
>>> buffer = io.StringIO()
>>> df.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w",
...           encoding="utf-8") as f:
...     f.write(s)
260

The memory_usage parameter enables deep introspection mode, which is especially useful for big DataFrames and for fine-tuning memory optimization:

>>> random_strings_array = np.random.choice(['a', 'b', 'c'], 10 ** 6)
>>> df = pd.DataFrame({
...     'column_1': np.random.choice(['a', 'b', 'c'], 10 ** 6),
...     'column_2': np.random.choice(['a', 'b', 'c'], 10 ** 6),
...     'column_3': np.random.choice(['a', 'b', 'c'], 10 ** 6)
... })
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   column_1  1000000 non-null  object
 1   column_2  1000000 non-null  object
 2   column_3  1000000 non-null  object
dtypes: object(3)
memory usage: 22.9+ MB
>>> df.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   column_1  1000000 non-null  object
 1   column_2  1000000 non-null  object
 2   column_3  1000000 non-null  object
dtypes: object(3)
memory usage: 165.9 MB
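
When the numbers are needed programmatically rather than printed, DataFrame.memory_usage (listed under See also) exposes the same shallow/deep distinction. A minimal sketch with a small hypothetical object column:

```python
import pandas as pd

# dtype=object is set explicitly so the column stores Python strings.
df = pd.DataFrame(
    {"text": pd.Series(["alpha", "beta", "gamma"], dtype=object)}
)

# Shallow: counts only the pointers held in the column.
shallow = df.memory_usage(deep=False).sum()

# Deep: also inspects the Python string objects themselves.
deep = df.memory_usage(deep=True).sum()

print(shallow, deep)
```

The deep figure is larger because the shallow one ignores the memory consumed by the string objects.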
insert(loc, column, value, allow_duplicates=<no_default>)

Insert column into DataFrame at specified location.

Raises a ValueError if column is already contained in the DataFrame, unless allow_duplicates is set to True.

Parameters:
  • loc (int) – Insertion index. Must satisfy 0 <= loc <= len(columns).

  • column (str, number, or hashable object) – Label of the inserted column.

  • value (Scalar, Series, or array-like) – Content of the inserted column.

  • allow_duplicates (bool, optional, default lib.no_default) – Allow duplicate column labels to be created.

Return type:

None

See also

Index.insert

Insert new item by index.

Examples

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
   col1  col2
0     1     3
1     2     4
>>> df.insert(1, "newcol", [99, 99])
>>> df
   col1  newcol  col2
0     1      99     3
1     2      99     4
>>> df.insert(0, "col1", [100, 100], allow_duplicates=True)
>>> df
   col1  col1  newcol  col2
0   100     1      99     3
1   100     2      99     4

Note that pandas aligns on the index when value is a Series:

>>> df.insert(0, "col0", pd.Series([5, 6], index=[1, 2]))
>>> df
   col0  col1  col1  newcol  col2
0   NaN   100     1      99     3
1   5.0   100     2      99     4
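
The two constraints stated above (duplicate labels are rejected by default, and loc may equal len(columns)) can be sketched as follows. The column name "last" is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Inserting an existing label without allow_duplicates=True fails.
try:
    df.insert(0, "col1", [9, 9])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# loc == len(df.columns) appends the new column at the end.
df.insert(len(df.columns), "last", [5, 6])

print(duplicate_rejected, list(df.columns))
```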
property interiors

Return a Series of List representing the inner rings of each polygon in the GeoSeries.

Applies to GeoSeries containing only Polygons.

Returns:

inner_rings – Inner rings of each polygon in the GeoSeries.

Return type:

Series of List

Examples

>>> from shapely.geometry import Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Polygon(
...             [(0, 0), (0, 5), (5, 5), (5, 0)],
...             [[(1, 1), (2, 1), (1, 2)], [(1, 4), (2, 4), (2, 3)]],
...         ),
...         Polygon([(1, 0), (2, 1), (0, 0)]),
...     ]
... )
>>> s
0    POLYGON ((0 0, 0 5, 5 5, 5 0, 0 0), (1 1, 2 1,...
1                       POLYGON ((1 0, 2 1, 0 0, 1 0))
dtype: geometry
>>> s.interiors
0    [LINEARRING (1 1, 2 1, 1 2, 1 1), LINEARRING (...
1                                                   []
dtype: object

See also

GeoSeries.exterior

outer boundary

interpolate(distance, normalized=False)

Return a point at the specified distance along each geometry.

Parameters:
  • distance (float or Series of floats) – Distance(s) along the geometries at which a point should be returned. If an np.array or pd.Series is used, it must have the same length as the GeoSeries.

  • normalized (boolean) – If normalized is True, distance will be interpreted as a fraction of the geometric object’s length.

Examples

>>> from shapely.geometry import LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         LineString([(0, 0), (2, 0), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...     ],
... )
>>> s
0    LINESTRING (0 0, 2 0, 0 2)
1         LINESTRING (0 0, 2 2)
2         LINESTRING (2 0, 0 2)
dtype: geometry
>>> s.interpolate(1)
0                POINT (1 0)
1    POINT (0.70711 0.70711)
2    POINT (1.29289 0.70711)
dtype: geometry
>>> s.interpolate([1, 2, 3])
0                POINT (1 0)
1    POINT (1.41421 1.41421)
2                POINT (0 2)
dtype: geometry
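
The examples above use absolute distances only. A minimal sketch of the normalized parameter, assuming two hypothetical straight lines of different length:

```python
import geopandas
from shapely.geometry import LineString

lines = geopandas.GeoSeries(
    [
        LineString([(0, 0), (4, 0)]),   # length 4
        LineString([(0, 0), (0, 10)]),  # length 10
    ]
)

# Absolute: one unit along each line, regardless of its length.
one_unit = lines.interpolate(1)

# Normalized: 0.5 is read as "halfway along each line".
midpoints = lines.interpolate(0.5, normalized=True)

print(one_unit)
print(midpoints)
```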
intersection(other, align=None)

Return a GeoSeries of the intersection of points in each aligned geometry with other.


The operation works in a 1-to-1 row-wise manner.
Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the intersection with.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

GeoSeries

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5                       POINT (0 1)
dtype: geometry

We can also compute the intersection of each geometry with a single shapely geometry:

>>> s.intersection(Polygon([(0, 0), (1, 1), (0, 1)]))
0    POLYGON ((0 0, 0 1, 1 1, 0 0))
1    POLYGON ((0 0, 0 1, 1 1, 0 0))
2             LINESTRING (0 0, 1 1)
3                       POINT (1 1)
4                       POINT (0 1)
dtype: geometry

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.intersection(s2, align=True)
0                              None
1    POLYGON ((0 0, 0 1, 1 1, 0 0))
2                       POINT (1 1)
3             LINESTRING (2 0, 0 2)
4                       POINT EMPTY
5                              None
dtype: geometry
>>> s.intersection(s2, align=False)
0    POLYGON ((0 0, 0 1, 1 1, 0 0))
1             LINESTRING (1 1, 1 2)
2                       POINT (1 1)
3                       POINT (1 1)
4                       POINT (0 1)
dtype: geometry

See also

GeoSeries.difference, GeoSeries.symmetric_difference, GeoSeries.union

intersection_all()

Return a geometry containing the intersection of all geometries in the GeoSeries.

This method ignores None values when other geometries are present. If all elements of the GeoSeries are None, an empty GeometryCollection is returned.

Examples

>>> from shapely.geometry import box
>>> s = geopandas.GeoSeries(
...     [box(0, 0, 2, 2), box(1, 1, 3, 3), box(0, 0, 1.5, 1.5)]
... )
>>> s
0              POLYGON ((2 0, 2 2, 0 2, 0 0, 2 0))
1              POLYGON ((3 1, 3 3, 1 3, 1 1, 3 1))
2    POLYGON ((1.5 0, 1.5 1.5, 0 1.5, 0 0, 1.5 0))
dtype: geometry
>>> s.intersection_all()
<POLYGON ((1 1, 1 1.5, 1.5 1.5, 1.5 1, 1 1))>
intersects(other, align=None)

Return a Series of dtype('bool') with value True for each aligned geometry that intersects other.

An object is said to intersect other if its boundary and interior intersect in any way with those of the other.

The operation works in a 1-to-1 row-wise manner.
Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test for intersection with.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1             LINESTRING (0 0, 2 2)
2             LINESTRING (2 0, 0 2)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    LINESTRING (1 0, 1 3)
2    LINESTRING (2 0, 0 2)
3              POINT (1 1)
4              POINT (0 1)
dtype: geometry

We can check if each geometry of the GeoSeries intersects a single geometry:

>>> line = LineString([(-1, 1), (3, 1)])
>>> s.intersects(line)
0    True
1    True
2    True
3    True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.intersects(s2, align=True)
0    False
1     True
2     True
3    False
4    False
dtype: bool
>>> s.intersects(s2, align=False)
0    True
1    True
2    True
3    True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries intersects any element of the other one.

See also

GeoSeries.disjoint, GeoSeries.crosses, GeoSeries.touches, GeoSeries.intersection
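
Because the comparison is row-wise, an "any element of the other GeoSeries" test has to be built separately. One possible sketch, assuming small inputs where a plain Python loop is acceptable (the data and variable names are hypothetical):

```python
import geopandas
import pandas as pd
from shapely.geometry import LineString, Point

s = geopandas.GeoSeries([LineString([(0, 0), (2, 2)]), Point(5, 5)])
s2 = geopandas.GeoSeries([Point(9, 9), Point(1, 1)])

# Row-wise: row 0 is compared with row 0, row 1 with row 1.
row_wise = s.intersects(s2, align=False)

# Any-to-any: test each geometry of s against all of s2.
any_match = pd.Series(
    [bool(s2.intersects(geom).any()) for geom in s],
    index=s.index,
)

print(row_wise.tolist(), any_match.tolist())
```

The line passes through Point(1, 1), which sits in a different row of s2, so only the any-to-any check reports it. For large inputs a spatial index or spatial join is usually a better fit than this loop.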

invalid_coverage_edges(*, gap_width=0.0)

Return a GeoSeries containing edges causing invalid polygonal coverage.

This method returns (Multi)LineStrings showing the location of edges violating polygonal coverage (if any) in each polygon in the input GeoSeries.

A GeoSeries of valid polygons is considered a coverage if the polygons are:

  • Non-overlapping - polygons do not overlap (their interiors do not intersect)

  • Edge-Matched - vertices along shared edges are identical

A valid coverage may contain holes (regions of no coverage). However, sometimes it might be desirable to detect narrow gaps as invalidities in the coverage. The gap_width parameter specifies the maximum width of gaps to detect. When gaps are detected, the is_valid_coverage() method will return False and this method can be used to find the edges of those gaps.

Geometries that are not Polygon or MultiPolygon are ignored.

Requires Shapely >= 2.1.

Added in version 1.1.0.

Parameters:

gap_width (float, optional) – The maximum width of gaps to detect, by default 0.0

Return type:

GeoSeries

Examples

Violation of edge-matching:

>>> from shapely import Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (1, 0), (0, 0)]),
...         Polygon([(0, 0), (0.5, 0.5), (1, 1), (0, 1), (0, 0)])
...     ]
... )
>>> s
0             POLYGON ((0 0, 1 1, 1 0, 0 0))
1    POLYGON ((0 0, 0.5 0.5, 1 1, 0 1, 0 0))
dtype: geometry
>>> s.invalid_coverage_edges()
0             LINESTRING (0 0, 1 1)
1    LINESTRING (0 0, 0.5 0.5, 1 1)
dtype: geometry

See also

GeoSeries.is_valid_coverage, GeoSeries.simplify_coverage

property is_ccw

Return a Series of dtype('bool') with value True if a LineString or LinearRing is counterclockwise.

Note that there are no checks on whether lines are actually closed and not self-intersecting, while this is a requirement for is_ccw. The recommended usage of this property for LineStrings is GeoSeries.is_ccw & GeoSeries.is_simple and for LinearRings GeoSeries.is_ccw & GeoSeries.is_valid.

This property will return False for non-linear geometries and for lines with fewer than 4 points (including the closing point).

Examples

>>> from shapely.geometry import LineString, LinearRing, Point
>>> s = geopandas.GeoSeries(
...     [
...         LinearRing([(0, 0), (0, 1), (1, 1), (0, 0)]),
...         LinearRing([(0, 0), (1, 1), (0, 1), (0, 0)]),
...         LineString([(0, 0), (1, 1), (0, 1)]),
...         Point(3, 3)
...     ]
... )
>>> s
0    LINEARRING (0 0, 0 1, 1 1, 0 0)
1    LINEARRING (0 0, 1 1, 0 1, 0 0)
2         LINESTRING (0 0, 1 1, 0 1)
3                        POINT (3 3)
dtype: geometry
>>> s.is_ccw
0    False
1     True
2    False
3    False
dtype: bool
property is_closed

Return a Series of dtype('bool') with value True if a LineString’s or LinearRing’s first and last points are equal.

Returns False for any other geometry type.

Examples

>>> from shapely.geometry import LineString, Point, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         LineString([(0, 0), (1, 1), (0, 1), (0, 0)]),
...         LineString([(0, 0), (1, 1), (0, 1)]),
...         Polygon([(0, 0), (0, 1), (1, 1), (0, 0)]),
...         Point(3, 3)
...     ]
... )
>>> s
0    LINESTRING (0 0, 1 1, 0 1, 0 0)
1         LINESTRING (0 0, 1 1, 0 1)
2     POLYGON ((0 0, 0 1, 1 1, 0 0))
3                        POINT (3 3)
dtype: geometry
>>> s.is_closed
0     True
1    False
2    False
3    False
dtype: bool
property is_empty

Returns a Series of dtype('bool') with value True for empty geometries.

Examples

An example of a GeoDataFrame with one empty point, one point and one missing value:

>>> from shapely.geometry import Point
>>> d = {'geometry': [Point(), Point(2, 1), None]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf
    geometry
0  POINT EMPTY
1  POINT (2 1)
2         None
>>> gdf.is_empty
0     True
1    False
2    False
dtype: bool

See also

GeoSeries.isna

detect missing values
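
Since empty and missing geometries are reported by different accessors, a filter that drops both needs to combine them. A minimal sketch, reusing the three values from the example above in a GeoSeries:

```python
import geopandas
from shapely.geometry import Point

s = geopandas.GeoSeries([Point(), Point(2, 1), None])

# is_empty flags the empty point, isna flags the missing value.
unusable = s.is_empty | s.isna()

# Keep only geometries that are neither empty nor missing.
clean = s[~unusable]

print(unusable.tolist(), len(clean))
```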

property is_ring

Return a Series of dtype('bool') with value True for features that are closed and simple.

When constructing a LinearRing, the sequence of coordinates may be explicitly closed by passing identical values in the first and last indices. Otherwise, the sequence will be implicitly closed by copying the first tuple to the last index.

Examples

>>> from shapely.geometry import LineString, LinearRing
>>> s = geopandas.GeoSeries(
...     [
...         LineString([(0, 0), (1, 1), (1, -1)]),
...         LineString([(0, 0), (1, 1), (1, -1), (0, 0)]),
...         LinearRing([(0, 0), (1, 1), (1, -1)]),
...     ]
... )
>>> s
0         LINESTRING (0 0, 1 1, 1 -1)
1    LINESTRING (0 0, 1 1, 1 -1, 0 0)
2    LINEARRING (0 0, 1 1, 1 -1, 0 0)
dtype: geometry
>>> s.is_ring
0    False
1     True
2     True
dtype: bool
property is_simple

Return a Series of dtype('bool') with value True for geometries that do not cross themselves.

This is meaningful only for LineStrings and LinearRings.

Examples

>>> from shapely.geometry import LineString
>>> s = geopandas.GeoSeries(
...     [
...         LineString([(0, 0), (1, 1), (1, -1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, -1)]),
...     ]
... )
>>> s
0    LINESTRING (0 0, 1 1, 1 -1, 0 1)
1         LINESTRING (0 0, 1 1, 1 -1)
dtype: geometry
>>> s.is_simple
0    False
1     True
dtype: bool
property is_valid

Return a Series of dtype('bool') with value True for geometries that are valid.

Examples

An example with one invalid polygon (a bowtie geometry crossing itself) and one missing geometry:

>>> from shapely.geometry import Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         Polygon([(0,0), (1, 1), (1, 0), (0, 1)]),  # bowtie geometry
...         Polygon([(0, 0), (2, 2), (2, 0)]),
...         None
...     ]
... )
>>> s
0         POLYGON ((0 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 1 1, 1 0, 0 1, 0 0))
2         POLYGON ((0 0, 2 2, 2 0, 0 0))
3                                   None
dtype: geometry
>>> s.is_valid
0     True
1    False
2     True
3    False
dtype: bool

See also

GeoSeries.is_valid_reason

reason for invalidity
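
As the example shows, is_valid is False both for invalid geometries and for missing ones. To isolate the geometries that are present but invalid, the mask can be combined with notna(). A minimal sketch using the same polygons:

```python
import geopandas
from shapely.geometry import Polygon

s = geopandas.GeoSeries(
    [
        Polygon([(0, 0), (1, 1), (0, 1)]),
        Polygon([(0, 0), (1, 1), (1, 0), (0, 1)]),  # bowtie geometry
        Polygon([(0, 0), (2, 2), (2, 0)]),
        None,
    ]
)

# Present but invalid: excludes the missing value in row 3.
invalid = s[s.notna() & ~s.is_valid]

print(invalid.index.tolist())
```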

is_valid_coverage(*, gap_width=0.0)

Return a bool indicating whether a GeoSeries forms a valid coverage.

A GeoSeries of valid polygons is considered a coverage if the polygons are:

  • Non-overlapping - polygons do not overlap (their interiors do not intersect)

  • Edge-Matched - vertices along shared edges are identical

A valid coverage may contain holes (regions of no coverage). However, sometimes it might be desirable to detect narrow gaps as invalidities in the coverage. The gap_width parameter specifies the maximum width of gaps to detect. When gaps are detected, this method will return False and the invalid_coverage_edges() method can be used to find the edges of those gaps.

Geometries that are not Polygon or MultiPolygon are ignored.

Requires Shapely >= 2.1.

Added in version 1.1.0.

Parameters:

gap_width (float, optional) – The maximum width of gaps to detect, by default 0.0

Return type:

bool

Examples

>>> from shapely import Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (1, 0), (0, 0)]),
...         Polygon([(0, 0), (1, 1), (0, 1), (0, 0)]),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 1 0, 0 0))
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
dtype: geometry
>>> s.is_valid_coverage()
True

Violation of edge-matching:

>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (1, 0), (0, 0)]),
...         Polygon([(0, 0), (0.5, 0.5), (1, 1), (0, 1), (0, 0)])
...     ]
... )
>>> s2
0             POLYGON ((0 0, 1 1, 1 0, 0 0))
1    POLYGON ((0 0, 0.5 0.5, 1 1, 0 1, 0 0))
dtype: geometry
>>> s2.is_valid_coverage()
False

See also

GeoSeries.invalid_coverage_edges, GeoSeries.simplify_coverage

is_valid_reason()

Return a Series of strings with the reason for invalidity of each geometry.

Examples

An example with one invalid polygon (a bowtie geometry crossing itself) and one missing geometry:

>>> from shapely.geometry import Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         Polygon([(0,0), (1, 1), (1, 0), (0, 1)]),  # bowtie geometry
...         Polygon([(0, 0), (2, 2), (2, 0)]),
...         None
...     ]
... )
>>> s
0         POLYGON ((0 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 1 1, 1 0, 0 1, 0 0))
2         POLYGON ((0 0, 2 2, 2 0, 0 0))
3                                   None
dtype: geometry
>>> s.is_valid_reason()
0    Valid Geometry
1    Self-intersection[0.5 0.5]
2    Valid Geometry
3    None
dtype: object

See also

GeoSeries.is_valid

detect invalid geometries

GeoSeries.make_valid

fix invalid geometries

isetitem(loc, value)

Set the given value in the column with position loc.

This is a positional analogue to __setitem__.

Parameters:
  • loc (int or sequence of ints) – Index position for the column.

  • value (scalar or arraylike) – Value(s) for the column.

Return type:

None

Notes

frame.isetitem(loc, value) is an in-place method as it will modify the DataFrame in place (not returning a new object). In contrast to frame.iloc[:, i] = value which will try to update the existing values in place, frame.isetitem(loc, value) will not update the values of the column itself in place, it will instead insert a new array.

In cases where frame.columns is unique, this is equivalent to frame[frame.columns[i]] = value.
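
The positional behaviour matters most when column labels are not unique, where label-based assignment cannot address a single column. A minimal sketch with a hypothetical frame that has two columns named "a":

```python
import pandas as pd

# Two columns share the label "a".
df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "a"])

# Replace only the column at position 1.
df.isetitem(1, [20, 40])

print(df.iloc[:, 0].tolist(), df.iloc[:, 1].tolist())
```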

isin(values)

Whether each element in the DataFrame is contained in values.

Parameters:

values (iterable, Series, DataFrame or dict) – The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dict, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.

Returns:

DataFrame of booleans showing whether each element in the DataFrame is contained in values.

Return type:

DataFrame

See also

DataFrame.eq

Equality test for DataFrame.

Series.isin

Equivalent method on Series.

Series.str.contains

Test if pattern or regex is contained within a string of a Series or Index.

Examples

>>> df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
...                   index=['falcon', 'dog'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0

When values is a list, check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings):

>>> df.isin([0, 2])
        num_legs  num_wings
falcon      True       True
dog        False       True

To check which elements of the DataFrame are not in values, use the ~ operator:

>>> ~df.isin([0, 2])
        num_legs  num_wings
falcon     False      False
dog         True      False

When values is a dict, we can pass values to check for each column separately:

>>> df.isin({'num_wings': [0, 3]})
        num_legs  num_wings
falcon     False      False
dog        False       True

When values is a Series or DataFrame, the index and column labels must match. Note that ‘falcon’ does not match on the number of legs in other.

>>> other = pd.DataFrame({'num_legs': [8, 3], 'num_wings': [0, 2]},
...                      index=['spider', 'falcon'])
>>> df.isin(other)
        num_legs  num_wings
falcon     False       True
dog        False      False
isna()

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Values such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns:

Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

Return type:

DataFrame

See also

DataFrame.isnull

Alias of isna.

DataFrame.notna

Boolean inverse of isna.

DataFrame.dropna

Omit axes labels with missing values.

isna

Top-level isna.

Examples

Show which entries in a DataFrame are NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = pd.Series([5, 6, np.nan])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
dtype: bool
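
As a small sketch of the rule stated in the description, the following checks which values count as NA under the default options. The Series values is hypothetical and the option use_inf_as_na is left untouched.

```python
import numpy as np
import pandas as pd

# None, NaN and NaT are NA; the empty string and infinity are not.
values = pd.Series([None, np.nan, pd.NaT, "", np.inf], dtype=object)
flags = values.isna().tolist()
```

Here flags marks only the first three entries as missing, so empty strings need an explicit comparison if they should be treated as gaps.
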
isnull()

DataFrame.isnull is an alias for DataFrame.isna.

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Values such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns:

Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

Return type:

DataFrame

See also

DataFrame.isnull

Alias of isna.

DataFrame.notna

Boolean inverse of isna.

DataFrame.dropna

Omit axes labels with missing values.

isna

Top-level isna.

Examples

Show which entries in a DataFrame are NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = pd.Series([5, 6, np.nan])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
dtype: bool
items()

Iterate over (column name, Series) pairs.

Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series.

Yields:
  • label (object) – The column names for the DataFrame being iterated over.

  • content (Series) – The column entries belonging to each label, as a Series.

Return type:

Iterable[tuple[Hashable, Series]]

See also

DataFrame.iterrows

Iterate over DataFrame rows as (index, Series) pairs.

DataFrame.itertuples

Iterate over DataFrame rows as namedtuples of the values.

Examples

>>> df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
...                   'population': [1864, 22000, 80000]},
...                   index=['panda', 'polar', 'koala'])
>>> df
        species   population
panda   bear      1864
polar   bear      22000
koala   marsupial 80000
>>> for label, content in df.items():
...     print(f'label: {label}')
...     print(f'content: {content}', sep='\n')
...
label: species
content:
panda         bear
polar         bear
koala    marsupial
Name: species, dtype: object
label: population
content:
panda     1864
polar    22000
koala    80000
Name: population, dtype: int64

iterfeatures(na='null', show_bbox=False, drop_id=False)[source]

Return an iterator that yields feature dictionaries that comply with __geo_interface__.

Parameters:
  • na (str, optional) –

    Options are {‘null’, ‘drop’, ‘keep’}, default ‘null’. Indicates how to output missing (NaN) values in the GeoDataFrame

    • null: output the missing entries as JSON null

    • drop: remove the property from the feature. This applies to each feature individually so that features may have different properties

    • keep: output the missing entries as NaN

  • show_bbox (bool, optional) – Include bbox (bounds) in the geojson. Default False.

  • drop_id (bool, default: False) – If True, the index of the GeoDataFrame is not written as the id property of the generated GeoJSON features. The default, False, retains it; True may be preferable if the index is just arbitrary row numbers.

Examples

>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
>>> feature = next(gdf.iterfeatures())
>>> feature
{'id': '0', 'type': 'Feature', 'properties': {'col1': 'name1'}, 'geometry': {'type': 'Point', 'coordinates': (1.0, 2.0)}}
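
The effect of the na and drop_id options described above can be sketched like this. The column names name and height are invented for the illustration, and the sketch assumes geopandas and shapely are installed.

```python
import numpy as np
import geopandas
from shapely.geometry import Point

gdf = geopandas.GeoDataFrame(
    {"name": ["a", "b"], "height": [1.5, np.nan]},
    geometry=[Point(1, 2), Point(2, 1)],
    crs="EPSG:4326",
)

# The second feature carries the missing "height" value.
as_null = list(gdf.iterfeatures(na="null"))[1]["properties"]
dropped = list(gdf.iterfeatures(na="drop"))[1]["properties"]
kept = list(gdf.iterfeatures(na="keep"))[1]["properties"]

# drop_id=True leaves the "id" key out of each feature.
no_id = next(gdf.iterfeatures(drop_id=True))
```

With na="drop" the features of one frame can end up with different property sets, which some GeoJSON consumers may not expect.
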
iterrows()

Iterate over DataFrame rows as (index, Series) pairs.

Yields:
  • index (label or tuple of label) – The index of the row. A tuple for a MultiIndex.

  • data (Series) – The data of the row as a Series.

Return type:

Iterable[tuple[Hashable, Series]]

See also

DataFrame.itertuples

Iterate over DataFrame rows as namedtuples of the values.

DataFrame.items

Iterate over (column name, Series) pairs.

Notes

  1. Because iterrows returns a Series for each row, it does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames).

    To preserve dtypes while iterating over the rows, it is better to use itertuples() which returns namedtuples of the values and which is generally faster than iterrows.

  2. You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.

Examples

>>> df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])
>>> row = next(df.iterrows())[1]
>>> row
int      1.0
float    1.5
Name: 0, dtype: float64
>>> print(row['int'].dtype)
float64
>>> print(df['int'].dtype)
int64
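
The first note recommends itertuples() for keeping dtypes. A minimal sketch of that contrast, using a made-up frame with one integer and one float column:

```python
import pandas as pd

df = pd.DataFrame({"n": [1, 2], "x": [1.5, 2.5]})

# iterrows packs each row into a single Series, so the integer
# value is upcast to float to share a dtype with "x".
_, row = next(df.iterrows())
from_iterrows = row["n"]

# itertuples reads each column separately, so the integer stays integral.
record = next(df.itertuples(index=False))
from_itertuples = record.n
```

The value from iterrows is a float, while the value from itertuples is still an integer.
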

itertuples(index=True, name='Pandas')

Iterate over DataFrame rows as namedtuples.

Parameters:
  • index (bool, default True) – If True, return the index as the first element of the tuple.

  • name (str or None, default "Pandas") – The name of the returned namedtuples or None to return regular tuples.

Returns:

An object to iterate over namedtuples for each row in the DataFrame with the first field possibly being the index and following fields being the column values.

Return type:

iterator

See also

DataFrame.iterrows

Iterate over DataFrame rows as (index, Series) pairs.

DataFrame.items

Iterate over (column name, Series) pairs.

Notes

The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore.
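
The renaming rule in this note can be illustrated with a hypothetical frame whose second column contains a space and whose third starts with an underscore:

```python
import pandas as pd

df = pd.DataFrame({"valid_name": [1], "not valid": [2], "_hidden": [3]})

# Fields that cannot be used as namedtuple attributes are replaced
# by an underscore followed by their position.
row = next(df.itertuples(index=False))
fields = row._fields
second_value = row[1]
```

Values in renamed fields remain reachable by position, as in row[1], or through the generated names listed in row._fields.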

Examples

>>> df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
...                   index=['dog', 'hawk'])
>>> df
      num_legs  num_wings
dog          4          0
hawk         2          2
>>> for row in df.itertuples():
...     print(row)
...
Pandas(Index='dog', num_legs=4, num_wings=0)
Pandas(Index='hawk', num_legs=2, num_wings=2)

By setting the index parameter to False we can remove the index as the first element of the tuple:

>>> for row in df.itertuples(index=False):
...     print(row)
...
Pandas(num_legs=4, num_wings=0)
Pandas(num_legs=2, num_wings=2)

With the name parameter set we set a custom name for the yielded namedtuples:

>>> for row in df.itertuples(name='Animal'):
...     print(row)
...
Animal(Index='dog', num_legs=4, num_wings=0)
Animal(Index='hawk', num_legs=2, num_wings=2)
join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, validate=None)

Join columns of another DataFrame.

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

Parameters:
  • other (DataFrame, Series, or a list containing any combination of them) – Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame.

  • on (str, list of str, or array-like, optional) – Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiple values given, the other DataFrame must have a MultiIndex. Can pass an array as the join key if it is not already contained in the calling DataFrame. Like an Excel VLOOKUP operation.

  • how ({'left', 'right', 'outer', 'inner', 'cross'}, default 'left') –

    How to handle the operation of the two objects.

    • left: use calling frame’s index (or column if on is specified)

    • right: use other’s index.

    • outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it lexicographically.

    • inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling frame’s index.

    • cross: creates the cartesian product from both frames, preserves the order of the left keys.

  • lsuffix (str, default '') – Suffix to use from left frame’s overlapping columns.

  • rsuffix (str, default '') – Suffix to use from right frame’s overlapping columns.

  • sort (bool, default False) – Order result DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword).

  • validate (str, optional) –

    If specified, checks if join is of specified type.

    • “one_to_one” or “1:1”: check if join keys are unique in both left and right datasets.

    • “one_to_many” or “1:m”: check if join keys are unique in left dataset.

    • “many_to_one” or “m:1”: check if join keys are unique in right dataset.

    • “many_to_many” or “m:m”: allowed, but does not result in checks.

    Added in version 1.5.0.

Returns:

A dataframe containing columns from both the caller and other.

Return type:

DataFrame

See also

DataFrame.merge

For column(s)-on-column(s) operations.

Notes

Parameters on, lsuffix, and rsuffix are not supported when passing a list of DataFrame objects.
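
As a sketch of how the validate parameter behaves, the following joins a frame with repeated keys against a lookup frame. The names left and right are illustrative; the sketch assumes a failed check is reported as pandas.errors.MergeError.

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({"key": ["K0", "K1", "K1"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame(
    {"B": ["B0", "B1"]}, index=pd.Index(["K0", "K1"], name="key")
)

# "1:1" requires unique keys on both sides; "K1" repeats on the left.
try:
    left.join(right, on="key", validate="1:1")
    outcome = "passed"
except MergeError:
    outcome = "rejected"

# "m:1" only requires unique keys on the right, so this join goes through.
joined = left.join(right, on="key", validate="m:1")
matched = joined["B"].tolist()
```

Passing validate is a cheap way to catch unexpected duplicate keys before they silently multiply rows.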

Examples

>>> df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
...                    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
>>> df
  key   A
0  K0  A0
1  K1  A1
2  K2  A2
3  K3  A3
4  K4  A4
5  K5  A5
>>> other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
...                       'B': ['B0', 'B1', 'B2']})
>>> other
  key   B
0  K0  B0
1  K1  B1
2  K2  B2

Join DataFrames using their indexes.

>>> df.join(other, lsuffix='_caller', rsuffix='_other')
  key_caller   A key_other    B
0         K0  A0        K0   B0
1         K1  A1        K1   B1
2         K2  A2        K2   B2
3         K3  A3       NaN  NaN
4         K4  A4       NaN  NaN
5         K5  A5       NaN  NaN

If we want to join using the key columns, we need to set key to be the index in both df and other. The joined DataFrame will have key as its index.

>>> df.set_index('key').join(other.set_index('key'))
      A    B
key
K0   A0   B0
K1   A1   B1
K2   A2   B2
K3   A3  NaN
K4   A4  NaN
K5   A5  NaN

Another option to join using the key columns is to use the on parameter. DataFrame.join always uses other’s index but we can use any column in df. This method preserves the original DataFrame’s index in the result.

>>> df.join(other.set_index('key'), on='key')
  key   A    B
0  K0  A0   B0
1  K1  A1   B1
2  K2  A2   B2
3  K3  A3  NaN
4  K4  A4  NaN
5  K5  A5  NaN

Using non-unique key values shows how they are matched.

>>> df = pd.DataFrame({'key': ['K0', 'K1', 'K1', 'K3', 'K0', 'K1'],
...                    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
>>> df
  key   A
0  K0  A0
1  K1  A1
2  K1  A2
3  K3  A3
4  K0  A4
5  K1  A5
>>> df.join(other.set_index('key'), on='key', validate='m:1')
  key   A    B
0  K0  A0   B0
1  K1  A1   B1
2  K1  A2   B1
3  K3  A3  NaN
4  K0  A4   B0
5  K1  A5   B1
keys()

Get the ‘info axis’ (see Indexing for more).

This is index for Series, columns for DataFrame.

Returns:

Info axis.

Return type:

Index

Examples

>>> d = pd.DataFrame(data={'A': [1, 2, 3], 'B': [0, 4, 8]},
...                  index=['a', 'b', 'c'])
>>> d
   A  B
a  1  0
b  2  4
c  3  8
>>> d.keys()
Index(['A', 'B'], dtype='object')
kurt(axis=0, skipna=True, numeric_only=False, **kwargs)

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters:
  • axis ({index (0), columns (1)}) –

    Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

    For DataFrames, specifying axis=None will apply the aggregation across both axes.

    Added in version 2.0.0.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

  • **kwargs – Additional keyword arguments to be passed to the function.

Examples

>>> s = pd.Series([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
>>> s
cat    1
dog    2
dog    2
mouse  3
dtype: int64
>>> s.kurt()
1.5

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
...                   index=['cat', 'dog', 'dog', 'mouse'])
>>> df
       a   b
  cat  1   3
  dog  2   4
  dog  2   4
mouse  3   4
>>> df.kurt()
a   1.5
b   4.0
dtype: float64

With axis=None

>>> df.kurt(axis=None).round(6)
-0.988693

Using axis=1

>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
...                   index=['cat', 'dog'])
>>> df.kurt(axis=1)
cat   -6.0
dog   -6.0
dtype: float64

Return type:

Series or scalar
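
To make the description above more concrete, the value 1.5 from the Series example can be reproduced by hand. The sketch assumes that "unbiased" refers to the usual bias-corrected sample excess kurtosis; that formula is an assumption, not something quoted from the pandas source.

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3])
n = len(s)

dev = s - s.mean()
var_unbiased = (dev ** 2).sum() / (n - 1)  # sample variance (N-1 denominator)
fourth = (dev ** 4).sum()                  # sum of fourth powers of deviations

# Bias-corrected excess kurtosis (Fisher definition: normal == 0.0).
manual = (
    n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth / var_unbiased ** 2
    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
)

from_pandas = s.kurt()
```

The denominators contain (n - 2) and (n - 3), so under this formula at least four observations are needed for a finite result.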

kurtosis(axis=0, skipna=True, numeric_only=False, **kwargs)

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters:
  • axis ({index (0), columns (1)}) –

    Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

    For DataFrames, specifying axis=None will apply the aggregation across both axes.

    Added in version 2.0.0.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

  • **kwargs – Additional keyword arguments to be passed to the function.

Examples

>>> s = pd.Series([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
>>> s
cat    1
dog    2
dog    2
mouse  3
dtype: int64
>>> s.kurt()
1.5

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
...                   index=['cat', 'dog', 'dog', 'mouse'])
>>> df
       a   b
  cat  1   3
  dog  2   4
  dog  2   4
mouse  3   4
>>> df.kurt()
a   1.5
b   4.0
dtype: float64

With axis=None

>>> df.kurt(axis=None).round(6)
-0.988693

Using axis=1

>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
...                   index=['cat', 'dog'])
>>> df.kurt(axis=1)
cat   -6.0
dog   -6.0
dtype: float64

Return type:

Series or scalar

last(offset)

Select final periods of time series data based on a date offset.

Deprecated since version 2.1: last() is deprecated and will be removed in a future version. Please create a mask and filter using .loc instead.

For a DataFrame with a sorted DatetimeIndex, this function selects the last few rows based on a date offset.

Parameters:

offset (str, DateOffset, dateutil.relativedelta) – The offset length of the data that will be selected. For instance, ‘3D’ will display all the rows having their index within the last 3 days.

Returns:

A subset of the caller.

Return type:

Series or DataFrame

Raises:

TypeError – If the index is not a DatetimeIndex

See also

first

Select initial periods of time series based on a date offset.

at_time

Select values at a particular time of the day.

between_time

Select values between particular times of the day.

Notes

Deprecated since version 2.1.0: Please create a mask and filter using .loc instead

Examples

>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
            A
2018-04-09  1
2018-04-11  2
2018-04-13  3
2018-04-15  4

Get the rows for the last 3 days:

>>> ts.last('3D')
            A
2018-04-13  3
2018-04-15  4

Notice that the data for the last 3 calendar days was returned, not the last 3 observed days in the dataset, and therefore the data for 2018-04-11 was not returned.
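
Since last() is deprecated in favour of a mask with .loc, one possible replacement for the '3D' example is sketched below. The variable cutoff is introduced here for illustration, and the sketch covers fixed-length offsets such as days only.

```python
import pandas as pd

i = pd.date_range("2018-04-09", periods=4, freq="2D")
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=i)

# Keep rows whose timestamp lies within 3 days of the final timestamp.
cutoff = ts.index.max() - pd.Timedelta("3D")
recent = ts.loc[ts.index > cutoff]
```

This selects the rows for 2018-04-13 and 2018-04-15, the same two rows shown for ts.last('3D').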

last_valid_index()

Return index for last non-NA value or None, if no non-NA value is found.

Return type:

type of index

Examples

For Series:

>>> s = pd.Series([None, 3, 4])
>>> s.first_valid_index()
1
>>> s.last_valid_index()
2
>>> s = pd.Series([None, None])
>>> print(s.first_valid_index())
None
>>> print(s.last_valid_index())
None

If all elements in Series are NA/null, returns None.

>>> s = pd.Series()
>>> print(s.first_valid_index())
None
>>> print(s.last_valid_index())
None

If Series is empty, returns None.

For DataFrame:

>>> df = pd.DataFrame({'A': [None, None, 2], 'B': [None, 3, 4]})
>>> df
     A      B
0  NaN    NaN
1  NaN    3.0
2  2.0    4.0
>>> df.first_valid_index()
1
>>> df.last_valid_index()
2
>>> df = pd.DataFrame({'A': [None, None, None], 'B': [None, None, None]})
>>> df
     A      B
0  None   None
1  None   None
2  None   None
>>> print(df.first_valid_index())
None
>>> print(df.last_valid_index())
None

If all elements in DataFrame are NA/null, returns None.

>>> df = pd.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> print(df.first_valid_index())
None
>>> print(df.last_valid_index())
None

If DataFrame is empty, returns None.

le(other, axis='columns', level=None)

Get the element-wise "less than or equal to" comparison of the DataFrame and other (binary operator le).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters:
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns:

Result of the comparison.

Return type:

DataFrame of bool

See also

DataFrame.eq

Compare DataFrames for equality elementwise.

DataFrame.ne

Compare DataFrames for inequality elementwise.

DataFrame.le

Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt

Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge

Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt

Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
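
The examples above mostly use eq, ne and gt. A short sketch with le itself, reusing the cost and revenue frame and a hypothetical per-row Series named limits:

```python
import pandas as pd

df = pd.DataFrame(
    {"cost": [250, 150, 100], "revenue": [100, 250, 300]},
    index=["A", "B", "C"],
)

# The method and the operator give the same result for a scalar.
by_method = df.le(150)
by_operator = df <= 150

# axis='index' aligns the Series with the row labels instead of the columns.
limits = pd.Series([250, 100, 300], index=["A", "B", "C"])
per_row = df.le(limits, axis="index")
```

The method form is mainly useful when the comparison axis or level has to be chosen explicitly, which the <= operator cannot express.
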
property length

Return a Series containing the length of each geometry expressed in the units of the CRS.

In the case of a (Multi)Polygon it measures the length of its exterior (i.e. perimeter).

Examples

>>> from shapely.geometry import Polygon, LineString, MultiLineString, Point, GeometryCollection
>>> s = geopandas.GeoSeries(
...     [
...         LineString([(0, 0), (1, 1), (0, 1)]),
...         LineString([(10, 0), (10, 5), (0, 0)]),
...         MultiLineString([((0, 0), (1, 0)), ((-1, 0), (1, 0))]),
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         Point(0, 1),
...         GeometryCollection([Point(1, 0), LineString([(10, 0), (10, 5), (0, 0)])])
...     ]
... )
>>> s
0                           LINESTRING (0 0, 1 1, 0 1)
1                         LINESTRING (10 0, 10 5, 0 0)
2            MULTILINESTRING ((0 0, 1 0), (-1 0, 1 0))
3                       POLYGON ((0 0, 1 1, 0 1, 0 0))
4                                          POINT (0 1)
5    GEOMETRYCOLLECTION (POINT (1 0), LINESTRING (1...
dtype: geometry
>>> s.length
0     2.414214
1    16.180340
2     3.000000
3     3.414214
4     0.000000
5    16.180340
dtype: float64

See also

GeoSeries.area

measure area of a polygon

Notes

Length may be invalid for a geographic CRS using degrees as units; use GeoSeries.to_crs() to project geometries to a planar CRS before using this function.

Every operation in GeoPandas is planar, i.e. the potential third dimension is not taken into account.
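
The note about geographic CRSs can be sketched with a short west-to-east line at 50 degrees north. The coordinates and the choice of UTM zone 32N (EPSG:32632) are assumptions made for this illustration.

```python
import warnings

import geopandas
from shapely.geometry import LineString

line = geopandas.GeoSeries(
    [LineString([(8.0, 50.0), (8.1, 50.0)])], crs="EPSG:4326"
)

# In a geographic CRS the result is in degrees; GeoPandas warns about
# this, so the warning is silenced for the demonstration.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    length_degrees = line.length.iloc[0]

# After projecting to a metric CRS the length is in metres.
length_metres = line.to_crs("EPSG:32632").length.iloc[0]
```

The first value is 0.1 (degrees), while the projected length is roughly 7.2 km, which is the figure that matters for routing distances.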

line_merge(directed=False)

Return (Multi)LineStrings formed by combining the lines in a MultiLineString.

Lines are joined at their endpoints when two lines intersect there. Lines are not joined when three or more lines intersect at an endpoint. Line elements that cannot be joined are kept as is in the resulting MultiLineString.

The direction of each merged LineString will be that of the majority of the LineStrings from which it was derived. If directed=True is specified, however, the operation does not change the order of points within lines, so only lines that can be joined with no change in direction are merged.

Non-linear geometries result in an empty GeometryCollection.

Parameters:

directed (bool, default False) – Only combine lines if possible without changing point order. Requires GEOS >= 3.11.0

Return type:

GeoSeries

Examples

>>> from shapely.geometry import MultiLineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         MultiLineString([[(0, 2), (0, 10)], [(0, 10), (5, 10)]]),
...         MultiLineString([[(0, 2), (0, 10)], [(0, 11), (5, 10)]]),
...         MultiLineString(),
...         MultiLineString([[(0, 0), (1, 0)], [(0, 0), (3, 0)]]),
...         Point(0, 0),
...     ]
... )
>>> s
0    MULTILINESTRING ((0 2, 0 10), (0 10, 5 10))
1    MULTILINESTRING ((0 2, 0 10), (0 11, 5 10))
2                          MULTILINESTRING EMPTY
3       MULTILINESTRING ((0 0, 1 0), (0 0, 3 0))
4                                    POINT (0 0)
dtype: geometry
>>> s.line_merge()
0                   LINESTRING (0 2, 0 10, 5 10)
1    MULTILINESTRING ((0 2, 0 10), (0 11, 5 10))
2                       GEOMETRYCOLLECTION EMPTY
3                     LINESTRING (1 0, 0 0, 3 0)
4                       GEOMETRYCOLLECTION EMPTY
dtype: geometry

With directed=True, you can avoid changing the order of points within lines and merge only lines where no change of direction is required:

>>> s.line_merge(directed=True)
0                   LINESTRING (0 2, 0 10, 5 10)
1    MULTILINESTRING ((0 2, 0 10), (0 11, 5 10))
2                       GEOMETRYCOLLECTION EMPTY
3       MULTILINESTRING ((0 0, 1 0), (0 0, 3 0))
4                       GEOMETRYCOLLECTION EMPTY
dtype: geometry
property loc: _LocIndexer

Access a group of rows and columns by label(s) or a boolean array.

.loc[] is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

  • A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).

  • A list or array of labels, e.g. ['a', 'b', 'c'].

  • A slice object with labels, e.g. 'a':'f'.

    Warning

    Note that, contrary to usual Python slices, both the start and the stop are included.

  • A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

  • An alignable boolean Series. The index of the key will be aligned before masking.

  • An alignable Index. The Index of the returned selection will be the input.

  • A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above)

See more at Selection by Label.

Raises:
  • KeyError – If any items are not found.

  • IndexingError – If an indexed key is passed and its index is unalignable to the frame index.

See also

DataFrame.at

Access a single value for a row/column label pair.

DataFrame.iloc

Access group of rows and columns by integer position(s).

DataFrame.xs

Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.

Series.loc

Access group of values using labels.

Examples

Getting values

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])
>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8

Single label. Note this returns the row as a Series.

>>> df.loc['viper']
max_speed    4
shield       5
Name: viper, dtype: int64

List of labels. Note using [[]] returns a DataFrame.

>>> df.loc[['viper', 'sidewinder']]
            max_speed  shield
viper               4       5
sidewinder          7       8

Single label for row and column

>>> df.loc['cobra', 'shield']
2

Slice with labels for row and single label for column. As mentioned above, note that both the start and stop of the slice are included.

>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64

Boolean list with the same length as the row axis

>>> df.loc[[False, False, True]]
            max_speed  shield
sidewinder          7       8

Alignable boolean Series:

>>> df.loc[pd.Series([False, True, False],
...                  index=['viper', 'sidewinder', 'cobra'])]
            max_speed  shield
sidewinder          7       8

Index (same behavior as df.reindex)

>>> df.loc[pd.Index(["cobra", "viper"], name="foo")]
       max_speed  shield
foo
cobra          1       2
viper          4       5

Conditional that returns a boolean Series

>>> df.loc[df['shield'] > 6]
            max_speed  shield
sidewinder          7       8

Conditional that returns a boolean Series with column labels specified

>>> df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7

Multiple conditional using & that returns a boolean Series

>>> df.loc[(df['max_speed'] > 1) & (df['shield'] < 8)]
            max_speed  shield
viper               4       5

Multiple conditional using | that returns a boolean Series

>>> df.loc[(df['max_speed'] > 4) | (df['shield'] < 5)]
            max_speed  shield
cobra               1       2
sidewinder          7       8

Please ensure that each condition is wrapped in parentheses (). See the user guide for more details and explanations of Boolean indexing.

Note

If you find yourself using 3 or more conditionals in .loc[], consider using advanced indexing.

See below for using .loc[] on MultiIndex DataFrames.

Callable that returns a boolean Series

>>> df.loc[lambda df: df['shield'] == 8]
            max_speed  shield
sidewinder          7       8

Setting values

Set value for all items matching the list of labels

>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50

Set value for an entire row

>>> df.loc['cobra'] = 10
>>> df
            max_speed  shield
cobra              10      10
viper               4      50
sidewinder          7      50

Set value for an entire column

>>> df.loc[:, 'max_speed'] = 30
>>> df
            max_speed  shield
cobra              30      10
viper              30      50
sidewinder         30      50

Set value for rows matching a boolean condition

>>> df.loc[df['shield'] > 35] = 0
>>> df
            max_speed  shield
cobra              30      10
viper               0       0
sidewinder          0       0

Add value matching location

>>> df.loc["viper", "shield"] += 5
>>> df
            max_speed  shield
cobra              30      10
viper               0       5
sidewinder          0       0

Setting using a Series or a DataFrame sets the values matching the index labels, not the index positions.

>>> shuffled_df = df.loc[["viper", "cobra", "sidewinder"]]
>>> df.loc[:] += shuffled_df
>>> df
            max_speed  shield
cobra              60      20
viper               0      10
sidewinder          0       0

Getting values on a DataFrame with an index that has integer labels

Another example using integers for the index

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=[7, 8, 9], columns=['max_speed', 'shield'])
>>> df
   max_speed  shield
7          1       2
8          4       5
9          7       8

Slice with integer labels for rows. As mentioned above, note that both the start and stop of the slice are included.

>>> df.loc[7:9]
   max_speed  shield
7          1       2
8          4       5
9          7       8
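
Because the labels here are integers, it is easy to confuse .loc with .iloc. The following sketch contrasts the two on the same frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=[7, 8, 9], columns=['max_speed', 'shield'])

# .loc treats 7 and 8 as labels and includes both endpoints.
by_label = df.loc[7:8]

# .iloc treats the numbers as positions and excludes the stop position.
by_position = df.iloc[0:1]

print(by_label.index.tolist())     # [7, 8]
print(by_position.index.tolist())  # [7]
```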

Getting values with a MultiIndex

A number of examples using a DataFrame with a MultiIndex

>>> tuples = [
...     ('cobra', 'mark i'), ('cobra', 'mark ii'),
...     ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
...     ('viper', 'mark ii'), ('viper', 'mark iii')
... ]
>>> index = pd.MultiIndex.from_tuples(tuples)
>>> values = [[12, 2], [0, 4], [10, 20],
...           [1, 4], [7, 1], [16, 36]]
>>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
>>> df
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

Single label. Note this returns a DataFrame with a single index.

>>> df.loc['cobra']
         max_speed  shield
mark i          12       2
mark ii          0       4

Single index tuple. Note this returns a Series.

>>> df.loc[('cobra', 'mark ii')]
max_speed    0
shield       4
Name: (cobra, mark ii), dtype: int64

Single label for row and column. Similar to passing in a tuple, this returns a Series.

>>> df.loc['cobra', 'mark i']
max_speed    12
shield        2
Name: (cobra, mark i), dtype: int64

Single tuple. Note using [[]] returns a DataFrame.

>>> df.loc[[('cobra', 'mark ii')]]
               max_speed  shield
cobra mark ii          0       4

Single tuple for the index with a single label for the column

>>> df.loc[('cobra', 'mark i'), 'shield']
2

Slice from index tuple to single label

>>> df.loc[('cobra', 'mark i'):'viper']
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

Slice from index tuple to index tuple

>>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
                    max_speed  shield
cobra      mark i          12       2
           mark ii          0       4
sidewinder mark i          10      20
           mark ii          1       4
viper      mark ii          7       1

Please see the user guide for more details and explanations of advanced indexing.
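
As one example of such advanced indexing, the sketch below selects on the second index level with pd.IndexSlice and with xs. It rebuilds the MultiIndex frame from the examples above.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([
    ('cobra', 'mark i'), ('cobra', 'mark ii'),
    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
    ('viper', 'mark ii'), ('viper', 'mark iii'),
])
df = pd.DataFrame([[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]],
                  columns=['max_speed', 'shield'], index=index)

# pd.IndexSlice builds a per-level selector: every first-level label,
# but only 'mark ii' on the second level.
mark_ii = df.loc[pd.IndexSlice[:, 'mark ii'], :]
print(mark_ii['max_speed'].tolist())  # [0, 1, 7]

# xs selects the same cross-section and drops the selected level.
cross_section = df.xs('mark ii', level=1)
print(cross_section.index.tolist())   # ['cobra', 'sidewinder', 'viper']
```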

lt(other, axis='columns', level=None)

Return whether each element of the DataFrame is less than other, element-wise (binary operator lt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters:
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns:

Result of the comparison.

Return type:

DataFrame of bool

See also

DataFrame.eq

Compare DataFrames for equality elementwise.

DataFrame.ne

Compare DataFrames for inequality elementwise.

DataFrame.le

Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt

Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge

Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt

Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
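
The NaN behaviour is easy to overlook, so here is a small sketch of lt itself on a frame containing a missing value. The frame and the limits Series are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, np.nan]},
                  index=['A', 'B', 'C'])

# Method form and operator form give the same result for a scalar.
below_200 = df.lt(200)
assert below_200.equals(df < 200)

# NaN compares as False under lt (and every comparison except ne).
print(below_200.loc['C', 'revenue'])  # False
print(df.ne(df).loc['C', 'revenue'])  # True, because NaN != NaN

# Compare each row against a per-row threshold by broadcasting on the index.
limits = pd.Series([300, 100, 100], index=['A', 'B', 'C'])
per_row = df.lt(limits, axis='index')
print(per_row['cost'].tolist())       # [True, False, False]
```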

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
make_valid(*, method='linework', keep_collapsed=True)

Repairs invalid geometries.

Returns a GeoSeries with valid geometries.

If the input geometry is already valid, then it will be preserved. In many cases, in order to create a valid geometry, the input geometry must be split into multiple parts or multiple geometries. If the geometry must be split into multiple parts of the same type to be made valid, then a multi-part geometry will be returned (e.g. a MultiPolygon). If the geometry must be split into multiple parts of different types to be made valid, then a GeometryCollection will be returned.

Two methods are available:

  • the ‘linework’ algorithm tries to preserve every edge and vertex in the input. It combines all rings into a set of noded lines and then extracts valid polygons from that linework. An alternating even-odd strategy is used to assign areas as interior or exterior. A disadvantage is that for some relatively simple invalid geometries this produces rather complex results.

  • the ‘structure’ algorithm tries to reason from the structure of the input to find the ‘correct’ repair: exterior rings bound area, interior holes exclude area. It first makes all rings valid, then shells are merged and holes are subtracted from the shells to generate valid result. It assumes that holes and shells are correctly categorized in the input geometry.
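
A minimal sketch of the default repair, assuming geopandas and shapely are installed. It uses the self-touching "bow-tie" polygon from the examples below and only checks validity before and after.

```python
import geopandas
from shapely.geometry import Polygon

# A "bow-tie" ring that touches itself at (1, 1), so it is not a valid polygon.
bowtie = Polygon([(0, 0), (0, 2), (1, 1), (2, 2), (2, 0), (1, 1), (0, 0)])
s = geopandas.GeoSeries([bowtie])
print(s.is_valid.tolist())         # [False]

# The default 'linework' method splits the ring into separate valid parts.
repaired = s.make_valid()
print(repaired.is_valid.tolist())  # [True]
print(repaired.geom_type.tolist())
```

Passing method='structure' instead is subject to the version requirements listed under Parameters.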

Parameters:
  • method ({'linework', 'structure'}, default 'linework') –

    Algorithm to use when repairing geometry. ‘structure’ requires GEOS >= 3.10 and shapely >= 2.1.

    Added in version 1.1.0.

  • keep_collapsed (bool, default True) –

    For the ‘structure’ method, True will keep components that have collapsed into a lower dimensionality. For example, a ring collapsing to a line, or a line collapsing to a point. Must be True for the ‘linework’ method.

    Added in version 1.1.0.

Examples

>>> from shapely.geometry import MultiPolygon, Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (0, 2), (1, 1), (2, 2), (2, 0), (1, 1), (0, 0)]),
...         Polygon([(0, 2), (0, 1), (2, 0), (0, 0), (0, 2)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...     ],
... )
>>> s
0    POLYGON ((0 0, 0 2, 1 1, 2 2, 2 0, 1 1, 0 0))
1              POLYGON ((0 2, 0 1, 2 0, 0 0, 0 2))
2                       LINESTRING (0 0, 1 1, 1 0)
dtype: geometry
>>> s.make_valid()
0    MULTIPOLYGON (((1 1, 0 0, 0 2, 1 1)), ((2 0, 1...
1    GEOMETRYCOLLECTION (POLYGON ((2 0, 0 0, 0 1, 2...
2                           LINESTRING (0 0, 1 1, 1 0)
dtype: geometry
map(func, na_action=None, **kwargs)

Apply a function to a DataFrame elementwise.

Added in version 2.1.0: DataFrame.applymap was deprecated and renamed to DataFrame.map.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.

Parameters:
  • func (callable) – Python function, returns a single value from a single value.

  • na_action ({None, 'ignore'}, default None) – If ‘ignore’, propagate NaN values, without passing them to func.

  • **kwargs – Additional keyword arguments to pass to func.

Returns:

Transformed DataFrame.

Return type:

DataFrame

See also

DataFrame.apply

Apply a function along input axis of DataFrame.

DataFrame.replace

Replace values given in to_replace with value.

Series.map

Apply a function elementwise on a Series.

Examples

>>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df
       0      1
0  1.000  2.120
1  3.356  4.567
>>> df.map(lambda x: len(str(x)))
   0  1
0  3  4
1  5  5

Like Series.map, NA values can be ignored:

>>> df_copy = df.copy()
>>> df_copy.iloc[0, 0] = pd.NA
>>> df_copy.map(lambda x: len(str(x)), na_action='ignore')
     0  1
0  NaN  4
1  5.0  5

It is also possible to use map with functions that are not lambda functions:

>>> df.map(round, ndigits=1)
     0    1
0  1.0  2.1
1  3.4  4.6

Note that a vectorized version of func often exists, which will be much faster. You could square each number elementwise.

>>> df.map(lambda x: x**2)
           0          1
0   1.000000   4.494400
1  11.262736  20.857489

But it’s better to avoid map in that case.

>>> df ** 2
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
mask(cond, other=<no_default>, *, inplace=False, axis=None, level=None)

Replace values where the condition is True.

Parameters:
  • cond (bool Series/DataFrame, array-like, or callable) – Where cond is False, keep the original value. Where True, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

  • other (scalar, Series/DataFrame, or callable) – Entries where cond is True are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it). If not specified, entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes).

  • inplace (bool, default False) – Whether to perform the operation in place on the data.

  • axis (int, default None) – Alignment axis if needed. For Series this parameter is unused and defaults to 0.

  • level (int, default None) – Alignment level if needed.
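
The callable forms of cond and other can be sketched as follows. This is a minimal illustration on a small Series, not taken from the pandas documentation.

```python
import pandas as pd

s = pd.Series(range(5))

# cond and other may both be callables; each is called with the Series itself.
result = s.mask(lambda x: x > 2, lambda x: x * 10)
print(result.tolist())  # [0, 1, 2, 30, 40]

# Without other, masked entries become NaN, which upcasts int64 to float64.
masked = s.mask(s > 2)
print(masked.dtype)     # float64
```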

Return type:

Same type as caller or None if inplace=True.

See also

DataFrame.where()

Return an object of same shape as self.

Notes

The mask method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is False the element is used; otherwise the corresponding element from the DataFrame other is used. If the axis of other does not align with axis of cond Series/DataFrame, the misaligned index positions will be filled with True.

The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).

For further details and examples see the mask documentation in indexing.

The dtype of the object takes precedence. The fill value is cast to the object’s dtype, if this can be done losslessly.

Examples

>>> s = pd.Series(range(5))
>>> s.where(s > 0)
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64
>>> s.mask(s > 0)
0    0.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
>>> s = pd.Series(range(5))
>>> t = pd.Series([True, False])
>>> s.where(t, 99)
0     0
1    99
2    99
3    99
4    99
dtype: int64
>>> s.mask(t, 99)
0    99
1     1
2    99
3    99
4    99
dtype: int64
>>> s.where(s > 1, 10)
0    10
1    10
2     2
3     3
4     4
dtype: int64
>>> s.mask(s > 1, 10)
0     0
1     1
2    10
3    10
4    10
dtype: int64
>>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> df
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
>>> m = df % 3 == 0
>>> df.where(m, -df)
   A  B
0  0 -1
1 -2  3
2 -4 -5
3  6 -7
4 -8  9
>>> df.where(m, -df) == np.where(m, df, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
>>> df.where(m, -df) == df.mask(~m, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
max(axis=0, skipna=True, numeric_only=False, **kwargs)

Return the maximum of the values over the requested axis.

If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.

Parameters:
  • axis ({index (0), columns (1)}) –

    Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

    For DataFrames, specifying axis=None will apply the aggregation across both axes.

    Added in version 2.0.0.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

  • **kwargs – Additional keyword arguments to be passed to the function.
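
The effect of axis and skipna, and the difference between max and idxmax, can be sketched on a small DataFrame. The frame is an illustrative assumption, and axis=None assumes pandas 2.0 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 5.0, np.nan], 'b': [3.0, 2.0, 4.0]},
                  index=['x', 'y', 'z'])

col_max = df.max()            # column-wise (axis=0), NaN skipped
row_max = df.max(axis=1)      # row-wise
overall = df.max(axis=None)   # single scalar over both axes

print(col_max.to_dict())      # {'a': 5.0, 'b': 4.0}
print(row_max.to_dict())      # {'x': 3.0, 'y': 5.0, 'z': 4.0}
print(overall)                # 5.0

# With skipna=False, any NaN in a column makes that column's result NaN.
strict = df.max(skipna=False)
print(strict.to_dict())       # {'a': nan, 'b': 4.0}

# idxmax returns the label of the maximum rather than the value.
labels = df.idxmax()
print(labels.to_dict())       # {'a': 'y', 'b': 'z'}
```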

Return type:

Series or scalar

See also

Series.sum

Return the sum.

Series.min

Return the minimum.

Series.max

Return the maximum.

Series.idxmin

Return the index of the minimum.

Series.idxmax

Return the index of the maximum.

DataFrame.sum

Return the sum over the requested axis.

DataFrame.min

Return the minimum over the requested axis.

DataFrame.max

Return the maximum over the requested axis.

DataFrame.idxmin

Return the index of the minimum over the requested axis.

DataFrame.idxmax

Return the index of the maximum over the requested axis.

Examples

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
>>> s.max()
8
maximum_inscribed_circle(*, tolerance=None)

Return a GeoSeries of geometries representing the largest circle that is fully contained within the input geometry.

Constructs the “maximum inscribed circle” (MIC) for a polygonal geometry, up to a specified tolerance. The MIC is determined by a point in the interior of the area which has the farthest distance from the area boundary, along with a boundary point at that distance. In the context of geography the center of the MIC is known as the “pole of inaccessibility”. A cartographic use case is to determine a suitable point to place a map label within a polygon. The radius length of the MIC is a measure of how “narrow” a polygon is. It is the distance at which the negative buffer becomes empty.

The method supports polygons with holes and multipolygons but will raise an error for any other geometry type.

Returns a GeoSeries of two-point LineStrings, with the first point at the center of the inscribed circle and the second on its boundary.

Requires Shapely >= 2.1.

Added in version 1.1.0.
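
Since each result row is a center-to-boundary line, the radius and the circle itself can be recovered from it. The sketch below uses only shapely and hard-codes one result row (the unit right triangle from the examples below) instead of calling maximum_inscribed_circle, so that it does not depend on the installed version.

```python
from shapely.geometry import LineString, Point

# One documented result row: first point = circle center,
# second point = a point on the polygon boundary.
mic_line = LineString([(0.29297, 0.70703), (0.5, 0.5)])

# The radius is the length of that two-point line.
radius = mic_line.length
center = Point(mic_line.coords[0])

# Buffering the center by the radius gives a polygon approximating the circle.
circle = center.buffer(radius)

print(round(radius, 4))  # about 0.2928
```

For a right triangle with unit legs the exact inradius is (2 - sqrt(2)) / 2, roughly 0.2929, which agrees with the value above to within the default tolerance.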

Parameters:

tolerance (float, np.array, pd.Series) – Stop the algorithm when the search area is smaller than this tolerance. When not specified, uses max(width, height) / 1000 per geometry as the default. If an np.array or pd.Series is used, it must have the same length as the GeoSeries.

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1), (0, 0)]),
...         Polygon([(0, 0), (10, 10), (0, 10), (0, 0)]),
...     ]
... )
>>> s
0       POLYGON ((0 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 10 10, 0 10, 0 0))
dtype: geometry
>>> s.maximum_inscribed_circle()
0    LINESTRING (0.29297 0.70703, 0.5 0.5)
1        LINESTRING (2.92969 7.07031, 5 5)
dtype: geometry
>>> s.maximum_inscribed_circle(tolerance=2)
0    LINESTRING (0.25 0.5, 0.375 0.375)
1          LINESTRING (2.5 7.5, 2.5 10)
dtype: geometry
mean(axis=0, skipna=True, numeric_only=False, **kwargs)

Return the mean of the values over the requested axis.

Parameters:
  • axis ({index (0), columns (1)}) –

    Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

    For DataFrames, specifying axis=None will apply the aggregation across both axes.

    Added in version 2.0.0.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

  • **kwargs – Additional keyword arguments to be passed to the function.
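
How skipna and axis=None change the result can be sketched as follows. The frame is an illustrative assumption, and axis=None assumes pandas 2.0 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, np.nan], 'b': [2.0, 3.0, 4.0]})

# skipna=True (default): NaN is left out, so 'a' averages over two values.
default_mean = df.mean()
print(default_mean.to_dict())  # {'a': 1.5, 'b': 3.0}

# skipna=False: a single NaN makes the column mean NaN.
strict_mean = df.mean(skipna=False)
print(strict_mean.to_dict())   # {'a': nan, 'b': 3.0}

# axis=None averages over all non-missing cells at once: 12 / 5.
overall = df.mean(axis=None)
print(overall)
```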

Return type:

Series or scalar

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.mean()
2.0

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a  b
tiger  1  2
zebra  2  3
>>> df.mean()
a    1.5
b    2.5
dtype: float64

Using axis=1

>>> df.mean(axis=1)
tiger    1.5
zebra    2.5
dtype: float64

When the DataFrame contains non-numeric columns, numeric_only should be set to True to avoid getting an error.

>>> df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.mean(numeric_only=True)
a    1.5
dtype: float64

median(axis=0, skipna=True, numeric_only=False, **kwargs)

Return the median of the values over the requested axis.

Parameters:
  • axis ({index (0), columns (1)}) –

    Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

    For DataFrames, specifying axis=None will apply the aggregation across both axes.

    Added in version 2.0.0.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

  • **kwargs – Additional keyword arguments to be passed to the function.
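
A short sketch of two properties that distinguish the median from the mean: the handling of an even number of values, and robustness to outliers. The data are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 100])

# With an even number of values the median is the mean of the two middle ones.
med = s.median()
print(med)  # 2.5

# Unlike the mean, the median is barely affected by the outlier 100.
avg = s.mean()
print(avg)  # 26.5
```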

Return type:

Series or scalar

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.median()
2.0

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a  b
tiger  1  2
zebra  2  3
>>> df.median()
a    1.5
b    2.5
dtype: float64

Using axis=1

>>> df.median(axis=1)
tiger    1.5
zebra    2.5
dtype: float64

When the DataFrame contains non-numeric columns, numeric_only should be set to True to avoid getting an error.

>>> df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.median(numeric_only=True)
a    1.5
dtype: float64

melt(id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.
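
The wide-to-long reshaping, and its reversal with pivot, can be sketched as follows. The column names and values are made up for illustration.

```python
import pandas as pd

wide = pd.DataFrame({'city': ['Bern', 'Oslo'],
                     '2023': [10, 20],
                     '2024': [11, 21]})

# 'city' stays as the identifier; the two year columns become rows.
long = wide.melt(id_vars='city', var_name='year', value_name='load')
print(long.shape)             # (4, 3)
print(long.columns.tolist())  # ['city', 'year', 'load']

# pivot reverses the operation, turning 'year' back into columns.
restored = long.pivot(index='city', columns='year', values='load')
print(restored.loc['Oslo', '2024'])  # 21
```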

Parameters:
  • id_vars (scalar, tuple, list, or ndarray, optional) – Column(s) to use as identifier variables.

  • value_vars (scalar, tuple, list, or ndarray, optional) – Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.

  • var_name (scalar, default None) – Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.

  • value_name (scalar, default 'value') – Name to use for the ‘value’ column, can’t be an existing column label.

  • col_level (scalar, optional) – If columns are a MultiIndex then use this level to melt.

  • ignore_index (bool, default True) – If True, original index is ignored. If False, the original index is retained. Index labels will be repeated as necessary.

Returns:

Unpivoted DataFrame.

Return type:

DataFrame

See also

melt

Identical method.

pivot_table

Create a spreadsheet-style pivot table as a DataFrame.

DataFrame.pivot

Return reshaped DataFrame organized by given index / column values.

DataFrame.explode

Explode a DataFrame from list-like columns to long format.

Notes

Reference the user guide for more examples.

Examples

>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
...                    'B': {0: 1, 1: 3, 2: 5},
...                    'C': {0: 2, 1: 4, 2: 6}})
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6
>>> df.melt(id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
>>> df.melt(id_vars=['A'], value_vars=['B', 'C'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6

The names of ‘variable’ and ‘value’ columns can be customized:

>>> df.melt(id_vars=['A'], value_vars=['B'],
...         var_name='myVarname', value_name='myValname')
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5

Original index values can be kept around:

>>> df.melt(id_vars=['A'], value_vars=['B', 'C'], ignore_index=False)
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
0  a        C      2
1  b        C      4
2  c        C      6

If you have multi-index columns:

>>> df.columns = [list('ABC'), list('DEF')]
>>> df
   A  B  C
   D  E  F
0  a  1  2
1  b  3  4
2  c  5  6
>>> df.melt(col_level=0, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
>>> df.melt(id_vars=[('A', 'D')], value_vars=[('B', 'E')])
  (A, D) variable_0 variable_1  value
0      a          B          E      1
1      b          B          E      3
2      c          B          E      5
memory_usage(index=True, deep=False)

Return the memory usage of each column in bytes.

The memory usage can optionally include the contribution of the index and elements of object dtype.

This value is displayed in DataFrame.info by default. This can be suppressed by setting pandas.options.display.memory_usage to False.

Parameters:
  • index (bool, default True) – Specifies whether to include the memory usage of the DataFrame’s index in returned Series. If index=True, the memory usage of the index is the first item in the output.

  • deep (bool, default False) – If True, introspect the data deeply by interrogating object dtypes for system-level memory consumption, and include it in the returned values.
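
The difference that deep makes for object columns can be sketched as follows. The column name and strings are illustrative assumptions, and the exact byte counts depend on platform and pandas version, so only their ordering is shown.

```python
import pandas as pd

# 1,000 Python strings stored in an object-dtype column.
kinds = pd.Series(['overhead line', 'underground cable'] * 500, dtype=object)
df = pd.DataFrame({'kind': kinds})

# Shallow: only the array of object pointers is counted.
shallow = df.memory_usage(index=False)['kind']

# Deep: the string objects behind the pointers are counted as well.
deep = df.memory_usage(index=False, deep=True)['kind']

# A categorical column stores each distinct string once plus small integer codes.
as_category = df.astype({'kind': 'category'}).memory_usage(
    index=False, deep=True)['kind']

print(shallow, deep, as_category)
print(as_category < shallow < deep)  # True
```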

Returns:

A Series whose index is the original column names and whose values are the memory usage of each column in bytes.

Return type:

Series

See also

numpy.ndarray.nbytes

Total bytes consumed by the elements of an ndarray.

Series.memory_usage

Bytes consumed by a Series.

Categorical

Memory-efficient array for string values with many repeated values.

DataFrame.info

Concise summary of a DataFrame.

Notes

See the Frequently Asked Questions for more details.

Examples

>>> dtypes = ['int64', 'float64', 'complex128', 'object', 'bool']
>>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t))
...              for t in dtypes])
>>> df = pd.DataFrame(data)
>>> df.head()
   int64  float64            complex128  object  bool
0      1      1.0              1.0+0.0j       1  True
1      1      1.0              1.0+0.0j       1  True
2      1      1.0              1.0+0.0j       1  True
3      1      1.0              1.0+0.0j       1  True
4      1      1.0              1.0+0.0j       1  True
>>> df.memory_usage()
Index           128
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64
>>> df.memory_usage(index=False)
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64

The memory footprint of object dtype columns is ignored by default:

>>> df.memory_usage(deep=True)
Index            128
int64          40000
float64        40000
complex128     80000
object        180000
bool            5000
dtype: int64

Use a Categorical for efficient storage of an object-dtype column with many repeated values.

>>> df['object'].astype('category').memory_usage(deep=True)
5244
merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=None, indicator=False, validate=None)

Merge DataFrame or named Series objects with a database-style join.

A named Series object is treated as a DataFrame with a single named column.

The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.

Warning

If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.
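
The null-key behaviour described in this warning can be reproduced with a small sketch. The frames are illustrative, and dropping null keys first is one possible way to obtain SQL-like semantics.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'key': [1.0, np.nan], 'l': ['a', 'b']})
right = pd.DataFrame({'key': [np.nan, 2.0], 'r': ['x', 'y']})

# Unlike SQL, the NaN key on the left matches the NaN key on the right.
merged = left.merge(right, on='key')
print(merged)

# Dropping null keys first gives the SQL-like result (no match here).
sql_like = left.dropna(subset=['key']).merge(
    right.dropna(subset=['key']), on='key')
print(len(sql_like))  # 0
```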

Parameters:
  • right (DataFrame or named Series) – Object to merge with.

  • how ({'left', 'right', 'outer', 'inner', 'cross'}, default 'inner') –

    Type of merge to be performed.

    • left: use only keys from left frame, similar to a SQL left outer join; preserve key order.

    • right: use only keys from right frame, similar to a SQL right outer join; preserve key order.

    • outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.

    • inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.

    • cross: creates the cartesian product from both frames, preserves the order of the left keys.

  • on (label or list) – Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.

  • left_on (label or list, or array-like) – Column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame. These arrays are treated as if they are columns.

  • right_on (label or list, or array-like) – Column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame. These arrays are treated as if they are columns.

  • left_index (bool, default False) – Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels.

  • right_index (bool, default False) – Use the index from the right DataFrame as the join key. Same caveats as left_index.

  • sort (bool, default False) – Sort the join keys lexicographically in the result DataFrame. If False, the order of the join keys depends on the join type (how keyword).

  • suffixes (list-like, default is ("_x", "_y")) – A length-2 sequence where each element is optionally a string indicating the suffix to add to overlapping column names in left and right respectively. Pass a value of None instead of a string to indicate that the column name from left or right should be left as-is, with no suffix. At least one of the values must not be None.

  • copy (bool, default True) –

    If False, avoid copy if possible.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements by enabling Copy-on-Write: pd.options.mode.copy_on_write = True

  • indicator (bool or str, default False) – If True, adds a column to the output DataFrame called “_merge” with information on the source of each row. The column can be given a different name by providing a string argument. The column will have a Categorical type with the value of “left_only” for observations whose merge key only appears in the left DataFrame, “right_only” for observations whose merge key only appears in the right DataFrame, and “both” if the observation’s merge key is found in both DataFrames.

  • validate (str, optional) –

    If specified, checks if merge is of specified type.

    • “one_to_one” or “1:1”: check if merge keys are unique in both left and right datasets.

    • “one_to_many” or “1:m”: check if merge keys are unique in left dataset.

    • “many_to_one” or “m:1”: check if merge keys are unique in right dataset.

    • “many_to_many” or “m:m”: allowed, but does not result in checks.

Returns:

A DataFrame of the two merged objects.

Return type:

DataFrame
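
The indicator and validate parameters can be sketched as follows. The frames mirror the small examples in this entry; dup is a hypothetical frame with a repeated key, added only to trigger the check.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})

# indicator=True adds a categorical '_merge' column naming each row's source.
outer = df1.merge(df2, how='outer', on='a', indicator=True)
sources = outer.set_index('a')['_merge'].astype(str).sort_index().to_dict()
print(sources)  # {'bar': 'left_only', 'baz': 'right_only', 'foo': 'both'}

# validate raises MergeError when the keys violate the declared relationship.
dup = pd.DataFrame({'a': ['foo', 'foo'], 'd': [5, 6]})
try:
    df1.merge(dup, on='a', validate='one_to_one')
    validated = True
except pd.errors.MergeError:
    validated = False
print(validated)  # False
```

Declaring the expected relationship up front turns silent row duplication into an explicit error.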

See also

merge_ordered

Merge with optional filling/interpolation.

merge_asof

Merge on nearest keys.

DataFrame.join

Similar method using indices.

Examples

>>> df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [1, 2, 3, 5]})
>>> df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [5, 6, 7, 8]})
>>> df1
  lkey  value
0  foo      1
1  bar      2
2  baz      3
3  foo      5
>>> df2
  rkey  value
0  foo      5
1  bar      6
2  baz      7
3  foo      8

Merge df1 and df2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.

>>> df1.merge(df2, left_on='lkey', right_on='rkey')
  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  bar        2  bar        6
3  baz        3  baz        7
4  foo        5  foo        5
5  foo        5  foo        8

Merge DataFrames df1 and df2 with specified left and right suffixes appended to any overlapping columns.

>>> df1.merge(df2, left_on='lkey', right_on='rkey',
...           suffixes=('_left', '_right'))
  lkey  value_left rkey  value_right
0  foo           1  foo            5
1  foo           1  foo            8
2  bar           2  bar            6
3  baz           3  baz            7
4  foo           5  foo            5
5  foo           5  foo            8

Merge DataFrames df1 and df2, but raise an exception if the DataFrames have any overlapping columns.

>>> df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False))
Traceback (most recent call last):
...
ValueError: columns overlap but no suffix specified:
    Index(['value'], dtype='object')
>>> df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
>>> df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
>>> df1
      a  b
0   foo  1
1   bar  2
>>> df2
      a  c
0   foo  3
1   baz  4
>>> df1.merge(df2, how='inner', on='a')
      a  b  c
0   foo  1  3
>>> df1.merge(df2, how='left', on='a')
      a  b  c
0   foo  1  3.0
1   bar  2  NaN
>>> df1 = pd.DataFrame({'left': ['foo', 'bar']})
>>> df2 = pd.DataFrame({'right': [7, 8]})
>>> df1
    left
0   foo
1   bar
>>> df2
    right
0   7
1   8
>>> df1.merge(df2, how='cross')
   left  right
0   foo      7
1   foo      8
2   bar      7
3   bar      8
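
The validate option listed in the parameters can be exercised as in the following minimal sketch. The frames orders and customers are made-up illustration data, not part of the original examples; the sketch assumes only DataFrame.merge and pandas.errors.MergeError.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical data: several orders may point at the same customer.
orders = pd.DataFrame({"customer": ["foo", "bar", "foo"], "amount": [1, 2, 5]})
customers = pd.DataFrame({"customer": ["foo", "bar"], "city": ["Bonn", "Kiel"]})

# Keys are unique on the right side, so the many-to-one check passes.
merged = orders.merge(customers, on="customer", validate="many_to_one")

# Duplicating a key on the right side makes the same check fail.
duplicated = pd.concat([customers, customers.iloc[[0]]], ignore_index=True)
try:
    orders.merge(duplicated, on="customer", validate="m:1")
    check_failed = False
except MergeError:
    check_failed = True
```

Here check_failed ends up True, because the duplicated right-hand key violates the many-to-one assumption.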
min(axis=0, skipna=True, numeric_only=False, **kwargs)

Return the minimum of the values over the requested axis.

If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.

Parameters:
  • axis ({index (0), columns (1)}) –

    Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

    For DataFrames, specifying axis=None will apply the aggregation across both axes.

    Added in version 2.0.0.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

  • **kwargs – Additional keyword arguments to be passed to the function.

Return type:

Series or scalar

See also

Series.sum

Return the sum.

Series.min

Return the minimum.

Series.max

Return the maximum.

Series.idxmin

Return the index of the minimum.

Series.idxmax

Return the index of the maximum.

DataFrame.sum

Return the sum over the requested axis.

DataFrame.min

Return the minimum over the requested axis.

DataFrame.max

Return the maximum over the requested axis.

DataFrame.idxmin

Return the index of the minimum over the requested axis.

DataFrame.idxmax

Return the index of the maximum over the requested axis.

Examples

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
>>> s.min()
0
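
The axis and skipna parameters described above can be compared side by side in a small sketch. The frame below is hypothetical illustration data, chosen so that one column contains a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column "a".
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

col_min = df.min()                 # minimum of each column (axis=0)
row_min = df.min(axis=1)           # minimum of each row
strict_min = df.min(skipna=False)  # NaN propagates into column "a"
where_min = df.idxmin()            # index label of each column minimum
```

With skipna=False the result for column "a" is NaN, while idxmin reports the row label at which each column minimum occurs.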
minimum_bounding_circle()

Return a GeoSeries of geometries representing the minimum bounding circle that encloses each geometry.

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1), (0, 0)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
>>> s.minimum_bounding_circle()
0    POLYGON ((1.20711 0.5, 1.19352 0.36205, 1.1532...
1    POLYGON ((1.20711 0.5, 1.19352 0.36205, 1.1532...
2                                          POINT (0 0)
dtype: geometry

See also

GeoSeries.convex_hull

convex hull geometry

GeoSeries.maximum_inscribed_circle

the largest circle within the geometry

minimum_bounding_radius()

Return a Series of the radii of the minimum bounding circles that enclose each geometry.

Examples

>>> from shapely.geometry import Point, LineString, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1), (0, 0)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0,0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
>>> s.minimum_bounding_radius()
0    0.707107
1    0.707107
2    0.000000
dtype: float64

See also

GeoSeries.minimum_bounding_circle

minimum bounding circle (geometry)
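
The radius 0.707107 reported for the first two geometries can be checked by hand. Both contain the points (0, 0) and (1, 1), which are sqrt(2) apart, so no enclosing circle can have a radius below sqrt(2)/2; the circle centred on the midpoint (0.5, 0.5) reaches that bound. The sketch below uses only the standard library and does not call GeoPandas; enclosing_radius is a hypothetical helper introduced for illustration.

```python
import math

def enclosing_radius(center, points):
    """Largest distance from center to any of the points (hypothetical helper)."""
    return max(math.dist(center, p) for p in points)

triangle = [(0, 0), (1, 1), (0, 1)]
line = [(0, 0), (1, 1), (1, 0)]
center = (0.5, 0.5)  # midpoint of the segment (0, 0)-(1, 1)

r_triangle = enclosing_radius(center, triangle)
r_line = enclosing_radius(center, line)

# Rightmost point of that circle; compare with the first vertex
# (1.20711, 0.5) printed by minimum_bounding_circle().
rightmost_x = center[0] + r_triangle
```

Both radii round to 0.707107, in line with the documented output.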

minimum_clearance()

Return a Series containing the minimum clearance distance, which is the smallest distance by which a vertex of the geometry could be moved to produce an invalid geometry.

If no minimum clearance exists for a geometry (for example, a single point, or an empty geometry), infinity is returned.

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1), (0, 0)]),
...         LineString([(0, 0), (1, 1), (3, 2)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 3 2)
2                       POINT (0 0)
dtype: geometry
>>> s.minimum_clearance()
0    0.707107
1    1.414214
2         inf
dtype: float64
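
For these simple inputs the values can be reproduced by hand. In the triangle, the limiting move is vertex (0, 1) onto the opposite edge (0, 0)-(1, 1); the other two vertex-to-edge distances are both 1. In the LineString, the limit is the distance between the closest pair of distinct vertices. The sketch below is a standard-library illustration under those assumptions, not the algorithm GeoPandas or Shapely uses; point_segment_distance is a hypothetical helper.

```python
import math

def point_segment_distance(p, a, b):
    """Return (distance, closest point) from p to segment a-b (hypothetical helper)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    # Projection parameter of p onto the line through a and b,
    # clamped so the closest point stays on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    foot = (ax + t * dx, ay + t * dy)
    return math.dist(p, foot), foot

# Triangle (0, 0), (1, 1), (0, 1): vertex (0, 1) against the opposite edge.
tri_clearance, tri_foot = point_segment_distance((0, 1), (0, 0), (1, 1))

# LineString (0, 0), (1, 1), (3, 2): the two closest distinct vertices.
line_clearance = math.dist((0, 0), (1, 1))
```

The closest point found for the triangle is (0.5, 0.5), and the two distances round to 0.707107 and 1.414214.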
minimum_clearance_line()

Return a GeoSeries of linestrings whose endpoints define the minimum clearance.

A geometry’s “minimum clearance” is the smallest distance by which a vertex of the geometry could be moved to produce an invalid geometry.

If the geometry has no minimum clearance, an empty LineString will be returned.

Requires Shapely >= 2.1.

Added in version 1.1.0.

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1), (0, 0)]),
...         LineString([(0, 0), (1, 1), (3, 2)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 3 2)
2                       POINT (0 0)
dtype: geometry
>>> s.minimum_clearance_line()
0    LINESTRING (0 1, 0.5 0.5)
1        LINESTRING (0 0, 1 1)
2             LINESTRING EMPTY
dtype: geometry
minimum_rotated_rectangle()

Return a GeoSeries of the general minimum bounding rectangle that contains the object.

Unlike envelope, this rectangle is not constrained to be parallel to the coordinate axes. If the convex hull of the object is degenerate (a line or a point), that degenerate geometry is returned.

Examples

>>> from shapely.geometry import Polygon, LineString, Point, MultiPoint
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         MultiPoint([(0, 0), (1, 1)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2         MULTIPOINT ((0 0), (1 1))
3                       POINT (0 0)
dtype: geometry
>>> s.minimum_rotated_rectangle()
0    POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))
1    POLYGON ((1 1, 1 0, 0 0, 0 1, 1 1))
2                  LINESTRING (0 0, 1 1)
3                            POINT (0 0)
dtype: geometry

See also

GeoSeries.envelope

bounding rectangle

mod(other, axis='columns', level=None, fill_value=None)

Get Modulo of dataframe and other, element-wise (binary operator mod).

Equivalent to dataframe % other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmod.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar; the operator version returns the same result.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
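
The examples above are shared by all flexible arithmetic wrappers and do not call mod itself. A minimal sketch of mod, its operator form, the reverse version rmod, and fill_value follows; the frame is hypothetical illustration data chosen without zeros so that every modulo is well defined.

```python
import pandas as pd

# Hypothetical data without zeros in the divisor position.
df = pd.DataFrame({"angles": [3, 4, 5], "degrees": [180, 360, 540]},
                  index=["triangle", "rectangle", "pentagon"])

by_method = df.mod(100)        # element-wise df % 100
by_operator = df % 100         # operator version of the same computation
reversed_mod = df.rmod(1000)   # element-wise 1000 % df

# "other" lacks the "degrees" column; fill_value=7 stands in for it.
other = pd.DataFrame({"angles": [2, 2, 2]}, index=df.index)
filled = df.mod(other, fill_value=7)
```

In filled, the angles column holds the remainders modulo 2, while the degrees column holds the remainders modulo the fill value 7.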
mode(axis=0, numeric_only=False, dropna=True)

Get the mode(s) of each element along the selected axis.

The mode of a set of values is the value that appears most often. It can be multiple values.

Parameters:
  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

    The axis to iterate over while searching for the mode:

    • 0 or ‘index’ : get mode of each column

    • 1 or ‘columns’ : get mode of each row.

  • numeric_only (bool, default False) – If True, only apply to numeric columns.

  • dropna (bool, default True) – Don’t consider counts of NaN/NaT.

Returns:

The modes of each column or row.

Return type:

DataFrame

See also

Series.mode

Return the highest frequency value in a Series.

Series.value_counts

Return the counts of values in a Series.

Examples

>>> df = pd.DataFrame([('bird', 2, 2),
...                    ('mammal', 4, np.nan),
...                    ('arthropod', 8, 0),
...                    ('bird', 2, np.nan)],
...                   index=('falcon', 'horse', 'spider', 'ostrich'),
...                   columns=('species', 'legs', 'wings'))
>>> df
           species  legs  wings
falcon        bird     2    2.0
horse       mammal     4    NaN
spider   arthropod     8    0.0
ostrich       bird     2    NaN

By default, missing values are not considered, and the modes of wings are both 0 and 2. Because the resulting DataFrame has two rows, the second row of species and legs contains NaN.

>>> df.mode()
  species  legs  wings
0    bird   2.0    0.0
1     NaN   NaN    2.0

With dropna=False, NaN values are considered and can be the mode (as for wings).

>>> df.mode(dropna=False)
  species  legs  wings
0    bird     2    NaN

Setting numeric_only=True, only the mode of numeric columns is computed, and columns of other types are ignored.

>>> df.mode(numeric_only=True)
   legs  wings
0   2.0    0.0
1   NaN    2.0

To compute the mode over columns and not rows, use the axis parameter:

>>> df.mode(axis='columns', numeric_only=True)
           0    1
falcon   2.0  NaN
horse    4.0  NaN
spider   0.0  8.0
ostrich  2.0  NaN
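
Because columns with fewer modes are padded with NaN, the result often needs a little post-processing. The sketch below reuses the frame from the examples above and shows one possible way to count the modes per column and to pick a single representative, assuming the first row is an acceptable choice when several values tie.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([("bird", 2, 2),
                   ("mammal", 4, np.nan),
                   ("arthropod", 8, 0),
                   ("bird", 2, np.nan)],
                  index=("falcon", "horse", "spider", "ostrich"),
                  columns=("species", "legs", "wings"))

modes = df.mode()
# Number of modes per column: non-missing entries in each result column.
n_modes = modes.notna().sum()
# First row of the result: one representative mode per column.
first_mode = modes.iloc[0]
```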
mul(other, axis='columns', level=None, fill_value=None)

Get Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar; the operator version returns the same result.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
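
The fill_value rule stated in the parameters (a value missing on one side is substituted, a value missing on both sides stays missing) can be seen in a small sketch. The two single-column frames are hypothetical illustration data.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"x": [1.0, np.nan, np.nan]})
right = pd.DataFrame({"x": [10.0, 20.0, np.nan]})

plain = left * right                    # NaN wherever either side is missing
filled = left.mul(right, fill_value=1)  # a one-sided gap is treated as 1
```

In filled, the second row becomes 20.0 because only the left value was missing, while the third row remains NaN because both inputs were missing.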
multiply(other, axis='columns', level=None, fill_value=None)

Get Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar; the operator version returns the same result.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
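
The signature and description of multiply match those of mul, so the two are expected to behave identically. The sketch below checks this on the frame used in the examples; the weights Series is hypothetical illustration data.

```python
import pandas as pd

df = pd.DataFrame({"angles": [0, 3, 4], "degrees": [360, 180, 360]},
                  index=["circle", "triangle", "rectangle"])
# Hypothetical per-row factors, matched against the row labels.
weights = pd.Series({"circle": 0, "triangle": 2, "rectangle": 3})

via_multiply = df.multiply(weights, axis="index")
via_mul = df.mul(weights, axis="index")
```

Both calls scale each row by its weight and return equal frames.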
property ndim: int

Return an int representing the number of axes / array dimensions.

Return 1 if Series. Otherwise return 2 if DataFrame.

See also

ndarray.ndim

Number of array dimensions.

Examples

>>> s = pd.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.ndim
1
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.ndim
2
ne(other, axis='columns', level=None)

Get Not equal to of dataframe and other, element-wise (binary operator ne).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters:
  • other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns:

Result of the comparison.

Return type:

DataFrame of bool

See also

DataFrame.eq

Compare DataFrames for equality elementwise.

DataFrame.ne

Compare DataFrames for inequality elementwise.

DataFrame.le

Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt

Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge

Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt

Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
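
The remark in the Notes that NaN values are considered different has a practical consequence: a frame containing NaN is reported as unequal to itself at those positions. A minimal sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"cost": [250.0, np.nan]}, index=["A", "B"])

differs_from_itself = df.ne(df)  # True only where the value is NaN
same_frame = df.equals(df)       # equals() treats NaNs at the same location as equal
```

When a whole-frame comparison that tolerates missing values is needed, DataFrame.equals is one option, since it treats NaNs in the same location as equal.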
nlargest(n, columns, keep='first')

Return the first n rows ordered by columns in descending order.

Return the first n rows with the largest values in columns, in descending order. The columns that are not specified are returned as well, but not used for ordering.

This method is equivalent to df.sort_values(columns, ascending=False).head(n), but more performant.

Parameters:
  • n (int) – Number of rows to return.

  • columns (label or list of labels) – Column label(s) to order by.

  • keep ({'first', 'last', 'all'}, default 'first') –

    Where there are duplicate values:

    • first : prioritize the first occurrence(s)

    • last : prioritize the last occurrence(s)

    • all : keep all the ties of the smallest item even if it means selecting more than n items.

Returns:

The first n rows ordered by the given columns in descending order.

Return type:

DataFrame

See also

DataFrame.nsmallest

Return the first n rows ordered by columns in ascending order.

DataFrame.sort_values

Sort DataFrame by the values.

DataFrame.head

Return the first n rows without re-ordering.

Notes

This function cannot be used with all column types. For example, when specifying columns with object or category dtypes, TypeError is raised.

Examples

>>> df = pd.DataFrame({'population': [59000000, 65000000, 434000,
...                                   434000, 434000, 337000, 11300,
...                                   11300, 11300],
...                    'GDP': [1937894, 2583560 , 12011, 4520, 12128,
...                            17036, 182, 38, 311],
...                    'alpha-2': ["IT", "FR", "MT", "MV", "BN",
...                                "IS", "NR", "TV", "AI"]},
...                   index=["Italy", "France", "Malta",
...                          "Maldives", "Brunei", "Iceland",
...                          "Nauru", "Tuvalu", "Anguilla"])
>>> df
          population      GDP alpha-2
Italy       59000000  1937894      IT
France      65000000  2583560      FR
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN
Iceland       337000    17036      IS
Nauru          11300      182      NR
Tuvalu         11300       38      TV
Anguilla       11300      311      AI

In the following example, we will use nlargest to select the three rows having the largest values in column “population”.

>>> df.nlargest(3, 'population')
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Malta       434000    12011      MT

When using keep='last', ties are resolved in reverse order:

>>> df.nlargest(3, 'population', keep='last')
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Brunei      434000    12128      BN

When using keep='all', the number of elements kept can exceed n if there are duplicate values for the smallest element; all the ties are kept:

>>> df.nlargest(3, 'population', keep='all')
          population      GDP alpha-2
France      65000000  2583560      FR
Italy       59000000  1937894      IT
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN

However, nlargest does not keep n distinct largest elements:

>>> df.nlargest(5, 'population', keep='all')
          population      GDP alpha-2
France      65000000  2583560      FR
Italy       59000000  1937894      IT
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN

To order by the largest values in column “population” and then “GDP”, we can specify multiple columns like in the next example.

>>> df.nlargest(3, ['population', 'GDP'])
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Brunei      434000    12128      BN
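
The stated equivalence with sort_values(columns, ascending=False).head(n) can be checked on a small frame. The data below is a hypothetical subset without ties in the ordering column; with ties, the order of the tied rows may depend on the sort algorithm used.

```python
import pandas as pd

# Hypothetical data with no ties in "population".
df = pd.DataFrame({"population": [59000000, 65000000, 434000, 337000],
                   "GDP": [1937894, 2583560, 12011, 17036]},
                  index=["Italy", "France", "Malta", "Iceland"])

top_two = df.nlargest(2, "population")
via_sort = df.sort_values("population", ascending=False).head(2)
```

Both expressions select France and Italy, in that order.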
normalize()

Return a GeoSeries of geometries converted to normal form (or canonical form).

This method orders the coordinates, rings of a polygon and parts of multi geometries consistently. Typically useful for testing purposes (for example in combination with equals_exact).

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ],
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
>>> s.normalize()
0    POLYGON ((0 0, 0 1, 1 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
notna()

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.nan, get mapped to False values.

Returns:

Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.

Return type:

DataFrame

See also

DataFrame.notnull

Alias of notna.

DataFrame.isna

Boolean inverse of notna.

DataFrame.dropna

Omit axes labels with missing values.

notna

Top-level notna.

Examples

Show which entries in a DataFrame are not NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = pd.Series([5, 6, np.nan])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
dtype: bool
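
A common use of the boolean mask is filtering or counting non-missing entries. The sketch below uses made-up data and assumes the default pandas options (infinity is not treated as NA).

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, np.inf, np.nan, None])

# inf counts as a present value; NaN and None do not.
mask = ser.notna()
present = ser[mask]

df = pd.DataFrame({"age": [5, 6, np.nan], "name": ["Alfred", "Batman", ""]})

# Summing the mask counts non-missing cells per column;
# the empty string in "name" is not NA.
non_missing_per_column = df.notna().sum()
```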
notnull()

DataFrame.notnull is an alias for DataFrame.notna.

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns:

Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.

Return type:

DataFrame

See also

DataFrame.notnull

Alias of notna.

DataFrame.isna

Boolean inverse of notna.

DataFrame.dropna

Omit axes labels with missing values.

notna

Top-level notna.

Examples

Show which entries in a DataFrame are not NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = pd.Series([5, 6, np.nan])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
dtype: bool
nsmallest(n, columns, keep='first')

Return the first n rows ordered by columns in ascending order.

Return the first n rows with the smallest values in columns, in ascending order. The columns that are not specified are returned as well, but not used for ordering.

This method is equivalent to df.sort_values(columns, ascending=True).head(n), but more performant.

Parameters:
  • n (int) – Number of items to retrieve.

  • columns (list or str) – Column name or names to order by.

  • keep ({'first', 'last', 'all'}, default 'first') –

    Where there are duplicate values:

    • first : take the first occurrence.

    • last : take the last occurrence.

    • all : keep all rows tied with the last selected value, even if it means selecting more than n items.

Return type:

DataFrame

See also

DataFrame.nlargest

Return the first n rows ordered by columns in descending order.

DataFrame.sort_values

Sort DataFrame by the values.

DataFrame.head

Return the first n rows without re-ordering.

Examples

>>> df = pd.DataFrame({'population': [59000000, 65000000, 434000,
...                                   434000, 434000, 337000, 337000,
...                                   11300, 11300],
...                    'GDP': [1937894, 2583560 , 12011, 4520, 12128,
...                            17036, 182, 38, 311],
...                    'alpha-2': ["IT", "FR", "MT", "MV", "BN",
...                                "IS", "NR", "TV", "AI"]},
...                   index=["Italy", "France", "Malta",
...                          "Maldives", "Brunei", "Iceland",
...                          "Nauru", "Tuvalu", "Anguilla"])
>>> df
          population      GDP alpha-2
Italy       59000000  1937894      IT
France      65000000  2583560      FR
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN
Iceland       337000    17036      IS
Nauru         337000      182      NR
Tuvalu         11300       38      TV
Anguilla       11300      311      AI

In the following example, we will use nsmallest to select the three rows having the smallest values in column “population”.

>>> df.nsmallest(3, 'population')
          population    GDP alpha-2
Tuvalu         11300     38      TV
Anguilla       11300    311      AI
Iceland       337000  17036      IS

When using keep='last', ties are resolved in reverse order:

>>> df.nsmallest(3, 'population', keep='last')
          population  GDP alpha-2
Anguilla       11300  311      AI
Tuvalu         11300   38      TV
Nauru         337000  182      NR

When using keep='all', the number of elements kept can exceed n: if the last selected value has duplicates, all the ties are kept.

>>> df.nsmallest(3, 'population', keep='all')
          population    GDP alpha-2
Tuvalu         11300     38      TV
Anguilla       11300    311      AI
Iceland       337000  17036      IS
Nauru         337000    182      NR

However, nsmallest does not keep n distinct smallest elements:

>>> df.nsmallest(4, 'population', keep='all')
          population    GDP alpha-2
Tuvalu         11300     38      TV
Anguilla       11300    311      AI
Iceland       337000  17036      IS
Nauru         337000    182      NR

To order by the smallest values in column “population” and then “GDP”, we can specify multiple columns like in the next example.

>>> df.nsmallest(3, ['population', 'GDP'])
          population  GDP alpha-2
Tuvalu         11300   38      TV
Anguilla       11300  311      AI
Nauru         337000  182      NR
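
The equivalence with sort_values(...).head(n) stated above can be checked with a short sketch. It reuses the population column from the examples and assumes a stable sort (kind="stable") so that ties are broken in the original row order, as keep='first' does.

```python
import pandas as pd

df = pd.DataFrame(
    {"population": [59000000, 65000000, 434000, 434000, 434000,
                    337000, 337000, 11300, 11300]},
    index=["Italy", "France", "Malta", "Maldives", "Brunei",
           "Iceland", "Nauru", "Tuvalu", "Anguilla"],
)

# Direct selection of the three smallest rows.
fast = df.nsmallest(3, "population")

# Full sort followed by head(); the stable sort keeps tied rows in input order.
slow = df.sort_values("population", ascending=True, kind="stable").head(3)

# keep='all' may return more than n rows when the last value is tied.
with_ties = df.nsmallest(3, "population", keep="all")
```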
nunique(axis=0, dropna=True)

Count number of distinct elements in specified axis.

Return Series with number of distinct elements. Can ignore NaN values.

Parameters:
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

  • dropna (bool, default True) – Don’t include NaN in the counts.

Return type:

Series

See also

Series.nunique

Method nunique for Series.

DataFrame.count

Count non-NA cells for each column or row.

Examples

>>> df = pd.DataFrame({'A': [4, 5, 6], 'B': [4, 1, 1]})
>>> df.nunique()
A    3
B    2
dtype: int64
>>> df.nunique(axis=1)
0    1
1    2
2    2
dtype: int64
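
The effect of dropna can be seen in a small sketch with made-up data: with dropna=False, NaN is counted as one additional distinct value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 1.0, np.nan], "B": [1, 2, 3]})

# Default: NaN is ignored, so column A has a single distinct value.
without_nan = df.nunique()

# dropna=False: NaN is counted as its own distinct value.
with_nan = df.nunique(dropna=False)
```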
offset_curve(distance, quad_segs=8, join_style='round', mitre_limit=5.0)

Return a LineString or MultiLineString geometry at a distance from the object on its right or its left side.

Parameters:
  • distance (float | array-like) – Specifies the offset distance from the input geometry. Negative for right side offset, positive for left side offset.

  • quad_segs (int (optional, default 8)) – Specifies the number of linear segments in a quarter circle in the approximation of circular arcs.

  • join_style ({'round', 'bevel', 'mitre'}, (optional, default 'round')) – Specifies the shape of outside corners. ‘round’ results in rounded shapes. ‘bevel’ results in a beveled edge that touches the original vertex. ‘mitre’ results in a single vertex that is beveled depending on the mitre_limit parameter.

  • mitre_limit (float (optional, default 5.0)) – Crops off ‘mitre’-style joins if the point is displaced from the buffered vertex by more than this limit.

Examples

>>> from shapely.geometry import LineString
>>> s = geopandas.GeoSeries(
...     [
...         LineString([(0, 0), (0, 1), (1, 1)]),
...     ],
...     crs=3857
... )
>>> s
0    LINESTRING (0 0, 0 1, 1 1)
dtype: geometry
>>> s.offset_curve(1)
0    LINESTRING (-1 0, -1 1, -0.981 1.195, -0.924 1...
dtype: geometry
orient_polygons(*, exterior_cw=False)

Return a GeoSeries of geometries with enforced ring orientation.

Enforce a ring orientation on all polygonal elements in the GeoSeries.

Forces (Multi)Polygons to use a counter-clockwise orientation for their exterior ring, and a clockwise orientation for their interior rings (or the opposite if exterior_cw=True).

Also processes geometries inside a GeometryCollection in the same way. Other geometries are returned unchanged.

Requires Shapely >= 2.1.

Added in version 1.1.0.

Parameters:

exterior_cw (bool) – If True, exterior rings will be clockwise and interior rings will be counter-clockwise.

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon(
...             [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)],
...             holes=[[(2, 2), (2, 4), (4, 4), (4, 2), (2, 2)]],
...         ),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ],
... )
>>> s
0    POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0), (2 2, ...
1                           LINESTRING (0 0, 1 1, 1 0)
2                                          POINT (0 0)
dtype: geometry
>>> s.orient_polygons()
0    POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0), (2 2, ...
1                           LINESTRING (0 0, 1 1, 1 0)
2                                          POINT (0 0)
dtype: geometry
>>> s.orient_polygons(exterior_cw=True)
0    POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0), (2 2, ...
1                           LINESTRING (0 0, 1 1, 1 0)
2                                          POINT (0 0)
dtype: geometry
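
Because the truncated output above hides most of the coordinates, ring orientation is easier to inspect programmatically. The sketch below does not call orient_polygons; it swaps in Shapely's per-geometry helper shapely.geometry.polygon.orient and the is_ccw ring property to show the same convention on a single polygon.

```python
from shapely.geometry import Polygon
from shapely.geometry.polygon import orient

# Exterior and hole are both given in clockwise order.
poly = Polygon(
    [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)],
    holes=[[(2, 2), (2, 4), (4, 4), (4, 2), (2, 2)]],
)
was_ccw = poly.exterior.is_ccw

# sign=1.0: counter-clockwise exterior, clockwise interiors.
ccw = orient(poly, sign=1.0)

# sign=-1.0: clockwise exterior, counter-clockwise interiors.
cw = orient(poly, sign=-1.0)
```

Here sign=1.0 corresponds to the default exterior_cw=False, and sign=-1.0 to exterior_cw=True.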
overlaps(other, align=None)

Return True for all aligned geometries that overlap other, else False.

Geometries overlap if they have more than one but not all points in common, have the same dimension, and the intersection of the interiors of the geometries has the same dimension as the geometries themselves.

The operation works in a 1-to-1, row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if overlaps.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, MultiPoint, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         MultiPoint([(0, 0), (0, 1)]),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (0, 2)]),
...         LineString([(0, 1), (1, 1)]),
...         LineString([(1, 1), (3, 3)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3         MULTIPOINT ((0 0), (0 1))
dtype: geometry
>>> s2
1    POLYGON ((0 0, 2 0, 0 2, 0 0))
2             LINESTRING (0 1, 1 1)
3             LINESTRING (1 1, 3 3)
4                       POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries overlaps a single geometry:

>>> polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
>>> s.overlaps(polygon)
0     True
1     True
2    False
3    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.overlaps(s2)
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> s.overlaps(s2, align=False)
0     True
1    False
2     True
3    False
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries overlaps any element of the other one.
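
If an any-to-any check is needed instead, one option is to test each geometry of the first GeoSeries against the whole second GeoSeries. The sketch below reuses the geometries from the examples; the explicit loop is an illustrative choice and not something this method provides.

```python
import geopandas
from shapely.geometry import LineString, MultiPoint, Point, Polygon

s = geopandas.GeoSeries(
    [
        Polygon([(0, 0), (2, 2), (0, 2)]),
        Polygon([(0, 0), (2, 2), (0, 2)]),
        LineString([(0, 0), (2, 2)]),
        MultiPoint([(0, 0), (0, 1)]),
    ],
)
s2 = geopandas.GeoSeries(
    [
        Polygon([(0, 0), (2, 0), (0, 2)]),
        LineString([(0, 1), (1, 1)]),
        LineString([(1, 1), (3, 3)]),
        Point(0, 1),
    ],
    index=range(1, 5),
)

# For every geometry in s, ask whether it overlaps at least one geometry in s2.
# Passing a single geometry to overlaps() tests it against each row of s2.
overlaps_any = [bool(s2.overlaps(geom).any()) for geom in s]
```

For large inputs this loop is quadratic; a spatial join (see GeoDataFrame.sjoin) is likely the better tool there.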

See also

GeoSeries.crosses, GeoSeries.intersects

overlay(right, how='intersection', keep_geom_type=None, make_valid=True)[source]

Perform spatial overlay between GeoDataFrames.

Currently only supports GeoDataFrames with uniform geometry types, i.e. containing only (Multi)Polygons, or only (Multi)Points, or a combination of (Multi)LineString and LinearRing shapes. Implements several methods that are all effectively subsets of the union.

See the User Guide page on set operations for details.

Parameters:
  • right (GeoDataFrame)

  • how (string) – Method of spatial overlay: ‘intersection’, ‘union’, ‘identity’, ‘symmetric_difference’ or ‘difference’.

  • keep_geom_type (bool) – If True, return only geometries of the same geometry type as the GeoDataFrame; if False, return all resulting geometries. Default is None, which sets keep_geom_type to True but warns when geometries are dropped.

  • make_valid (bool, default True) – If True, any invalid input geometries are corrected with a call to make_valid(); if False, a ValueError is raised if any input geometries are invalid.

Returns:

df – GeoDataFrame with new set of polygons and attributes resulting from the overlay

Return type:

GeoDataFrame

Examples

>>> from shapely.geometry import Polygon
>>> polys1 = geopandas.GeoSeries([Polygon([(0,0), (2,0), (2,2), (0,2)]),
...                               Polygon([(2,2), (4,2), (4,4), (2,4)])])
>>> polys2 = geopandas.GeoSeries([Polygon([(1,1), (3,1), (3,3), (1,3)]),
...                               Polygon([(3,3), (5,3), (5,5), (3,5)])])
>>> df1 = geopandas.GeoDataFrame({'geometry': polys1, 'df1_data':[1,2]})
>>> df2 = geopandas.GeoDataFrame({'geometry': polys2, 'df2_data':[1,2]})
>>> df1.overlay(df2, how='union')
   df1_data  df2_data                                           geometry
0       1.0       1.0                POLYGON ((2 2, 2 1, 1 1, 1 2, 2 2))
1       2.0       1.0                POLYGON ((2 2, 2 3, 3 3, 3 2, 2 2))
2       2.0       2.0                POLYGON ((4 4, 4 3, 3 3, 3 4, 4 4))
3       1.0       NaN      POLYGON ((2 0, 0 0, 0 2, 1 2, 1 1, 2 1, 2 0))
4       2.0       NaN  MULTIPOLYGON (((3 4, 3 3, 2 3, 2 4, 3 4)), ((4...
5       NaN       1.0  MULTIPOLYGON (((2 3, 2 2, 1 2, 1 3, 2 3)), ((3...
6       NaN       2.0      POLYGON ((3 5, 5 5, 5 3, 4 3, 4 4, 3 4, 3 5))
>>> df1.overlay(df2, how='intersection')
   df1_data  df2_data                             geometry
0         1         1  POLYGON ((2 2, 2 1, 1 1, 1 2, 2 2))
1         2         1  POLYGON ((2 2, 2 3, 3 3, 3 2, 2 2))
2         2         2  POLYGON ((4 4, 4 3, 3 3, 3 4, 4 4))
>>> df1.overlay(df2, how='symmetric_difference')
   df1_data  df2_data                                           geometry
0       1.0       NaN      POLYGON ((2 0, 0 0, 0 2, 1 2, 1 1, 2 1, 2 0))
1       2.0       NaN  MULTIPOLYGON (((3 4, 3 3, 2 3, 2 4, 3 4)), ((4...
2       NaN       1.0  MULTIPOLYGON (((2 3, 2 2, 1 2, 1 3, 2 3)), ((3...
3       NaN       2.0      POLYGON ((3 5, 5 5, 5 3, 4 3, 4 4, 3 4, 3 5))
>>> df1.overlay(df2, how='difference')
                                            geometry  df1_data
0      POLYGON ((2 0, 0 0, 0 2, 1 2, 1 1, 2 1, 2 0))         1
1  MULTIPOLYGON (((3 4, 3 3, 2 3, 2 4, 3 4)), ((4...         2
>>> df1.overlay(df2, how='identity')
   df1_data  df2_data                                           geometry
0         1       1.0                POLYGON ((2 2, 2 1, 1 1, 1 2, 2 2))
1         2       1.0                POLYGON ((2 2, 2 3, 3 3, 3 2, 2 2))
2         2       2.0                POLYGON ((4 4, 4 3, 3 3, 3 4, 4 4))
3         1       NaN      POLYGON ((2 0, 0 0, 0 2, 1 2, 1 1, 2 1, 2 0))
4         2       NaN  MULTIPOLYGON (((3 4, 3 3, 2 3, 2 4, 3 4)), ((4...
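
One way to sanity-check the five modes is to compare total areas. The sketch below reuses the squares from the examples; the expected totals in the comments follow from the geometry (each input covers 8 square units, and the inputs share three unit squares).

```python
import geopandas
from shapely.geometry import Polygon

polys1 = geopandas.GeoSeries([Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
                              Polygon([(2, 2), (4, 2), (4, 4), (2, 4)])])
polys2 = geopandas.GeoSeries([Polygon([(1, 1), (3, 1), (3, 3), (1, 3)]),
                              Polygon([(3, 3), (5, 3), (5, 5), (3, 5)])])
df1 = geopandas.GeoDataFrame({'geometry': polys1, 'df1_data': [1, 2]})
df2 = geopandas.GeoDataFrame({'geometry': polys2, 'df2_data': [1, 2]})

modes = ['intersection', 'union', 'identity',
         'symmetric_difference', 'difference']

# Total area of the result for each overlay mode.
areas = {how: df1.overlay(df2, how=how).area.sum() for how in modes}

# intersection: 3 (shared part), union: 13 (8 + 8 - 3),
# identity: 8 (all of df1), symmetric_difference: 10 (13 - 3),
# difference: 5 (8 - 3)
```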

See also

GeoDataFrame.sjoin

spatial join

overlay

equivalent top-level function

Notes

Every operation in GeoPandas is planar, i.e. the potential third dimension is not taken into account.

pad(*, axis=None, inplace=False, limit=None, downcast=<no_default>)

Fill NA/NaN values by propagating the last valid observation to next valid.

Deprecated since version 2.0: Series/DataFrame.pad is deprecated. Use Series/DataFrame.ffill instead.

Returns:

Object with missing values filled or None if inplace=True.

Return type:

Series/DataFrame or None

Parameters:
  • axis (None | Axis)

  • inplace (bool_t)

  • limit (None | int)

  • downcast (dict | None | lib.NoDefault)

Examples

Please see examples for DataFrame.ffill() or Series.ffill().
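
As a short sketch of the recommended replacement, the following uses ffill on made-up data, with and without limit.

```python
import pandas as pd

s = pd.Series([1.0, None, None, 4.0])

# Propagate the last valid observation forward over every gap.
filled = s.ffill()

# limit=1 fills at most one consecutive missing value per gap.
filled_once = s.ffill(limit=1)
```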

pct_change(periods=1, fill_method=<no_default>, limit=<no_default>, freq=None, **kwargs)

Fractional change between the current and a prior element.

Computes the fractional change from the immediately previous row by default. This is useful in comparing the fraction of change in a time series of elements.

Note

Despite the name of this method, it calculates fractional change (also known as per unit change or relative change) and not percentage change. If you need the percentage change, multiply these values by 100.
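
The distinction in this note can be made concrete with a small sketch: the result equals the current value divided by the previous one, minus 1, and multiplying by 100 gives percentages. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series([90, 91, 85])

# Fractional change relative to the previous row.
fractional = s.pct_change()

# The same quantity written out by hand: x[t] / x[t-1] - 1.
manual = s / s.shift(1) - 1

# Multiply by 100 to obtain percentage change.
percent = fractional * 100
```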

Parameters:
  • periods (int, default 1) – Periods to shift for forming percent change.

  • fill_method ({'backfill', 'bfill', 'pad', 'ffill', None}, default 'pad') –

    How to handle NAs before computing percent changes.

    Deprecated since version 2.1: All options of fill_method are deprecated except fill_method=None.

  • limit (int, default None) –

    The number of consecutive NAs to fill before stopping.

    Deprecated since version 2.1.

  • freq (DateOffset, timedelta, or str, optional) – Increment to use from time series API (e.g. ‘ME’ or BDay()).

  • **kwargs – Additional keyword arguments are passed into DataFrame.shift or Series.shift.

Returns:

The same type as the calling object.

Return type:

Series or DataFrame

See also

Series.diff

Compute the difference of two elements in a Series.

DataFrame.diff

Compute the difference of two elements in a DataFrame.

Series.shift

Shift the index by some number of periods.

DataFrame.shift

Shift the index by some number of periods.

Examples

Series

>>> s = pd.Series([90, 91, 85])
>>> s
0    90
1    91
2    85
dtype: int64
>>> s.pct_change()
0         NaN
1    0.011111
2   -0.065934
dtype: float64
>>> s.pct_change(periods=2)
0         NaN
1         NaN
2   -0.055556
dtype: float64

See the percentage change in a Series after forward-filling NAs with the last valid observation.

>>> s = pd.Series([90, 91, None, 85])
>>> s
0    90.0
1    91.0
2     NaN
3    85.0
dtype: float64
>>> s.ffill().pct_change()
0         NaN
1    0.011111
2    0.000000
3   -0.065934
dtype: float64

DataFrame

Percentage change in French franc, Deutsche Mark, and Italian lira from 1980-01-01 to 1980-03-01.

>>> df = pd.DataFrame({
...     'FR': [4.0405, 4.0963, 4.3149],
...     'GR': [1.7246, 1.7482, 1.8519],
...     'IT': [804.74, 810.01, 860.13]},
...     index=['1980-01-01', '1980-02-01', '1980-03-01'])
>>> df
                FR      GR      IT
1980-01-01  4.0405  1.7246  804.74
1980-02-01  4.0963  1.7482  810.01
1980-03-01  4.3149  1.8519  860.13
>>> df.pct_change()
                  FR        GR        IT
1980-01-01       NaN       NaN       NaN
1980-02-01  0.013810  0.013684  0.006549
1980-03-01  0.053365  0.059318  0.061876

Percentage of change in GOOG and APPL stock volume. Shows computing the percentage change between columns.

>>> df = pd.DataFrame({
...     '2016': [1769950, 30586265],
...     '2015': [1500923, 40912316],
...     '2014': [1371819, 41403351]},
...     index=['GOOG', 'APPL'])
>>> df
          2016      2015      2014
GOOG   1769950   1500923   1371819
APPL  30586265  40912316  41403351
>>> df.pct_change(axis='columns', periods=-1)
          2016      2015  2014
GOOG  0.179241  0.094112   NaN
APPL -0.252395 -0.011860   NaN
pipe(func, *args, **kwargs)

Apply chainable functions that expect Series or DataFrames.

Parameters:
  • func (function) – Function to apply to the Series/DataFrame. args and kwargs are passed into func. Alternatively, a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame.

  • *args (iterable, optional) – Positional arguments passed into func.

  • **kwargs (mapping, optional) – A dictionary of keyword arguments passed into func.

Return type:

the return type of func.

See also

DataFrame.apply

Apply a function along input axis of DataFrame.

DataFrame.map

Apply a function elementwise on a whole DataFrame.

Series.map

Apply a mapping correspondence on a Series.

Notes

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects.
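
As a minimal sketch of the GroupBy case, the hypothetical helper value_range below receives the GroupBy object as its first argument; the data are made up.

```python
import pandas as pd


def value_range(grouped, column):
    """Return max minus min of `column` for each group (illustrative helper)."""
    return grouped[column].max() - grouped[column].min()


df = pd.DataFrame({"key": ["a", "a", "b", "b"], "value": [1, 4, 10, 12]})

# The GroupBy object is passed to value_range as its first argument.
spread = df.groupby("key").pipe(value_range, column="value")
```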

Examples

Constructing an income DataFrame from a dictionary.

>>> data = [[8000, 1000], [9500, np.nan], [5000, 2000]]
>>> df = pd.DataFrame(data, columns=['Salary', 'Others'])
>>> df
   Salary  Others
0    8000  1000.0
1    9500     NaN
2    5000  2000.0

Functions that perform tax reductions on an income DataFrame.

>>> def subtract_federal_tax(df):
...     return df * 0.9
>>> def subtract_state_tax(df, rate):
...     return df * (1 - rate)
>>> def subtract_national_insurance(df, rate, rate_increase):
...     new_rate = rate + rate_increase
...     return df * (1 - new_rate)

Instead of writing

>>> subtract_national_insurance(
...     subtract_state_tax(subtract_federal_tax(df), rate=0.12),
...     rate=0.05,
...     rate_increase=0.02)

You can write

>>> (
...     df.pipe(subtract_federal_tax)
...     .pipe(subtract_state_tax, rate=0.12)
...     .pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02)
... )
    Salary   Others
0  5892.48   736.56
1  6997.32      NaN
2  3682.80  1473.12

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose subtract_national_insurance takes its data as df in the second argument:

>>> def subtract_national_insurance(rate, df, rate_increase):
...     new_rate = rate + rate_increase
...     return df * (1 - new_rate)
>>> (
...     df.pipe(subtract_federal_tax)
...     .pipe(subtract_state_tax, rate=0.12)
...     .pipe(
...         (subtract_national_insurance, 'df'),
...         rate=0.05,
...         rate_increase=0.02
...     )
... )
    Salary   Others
0  5892.48   736.56
1  6997.32      NaN
2  3682.80  1473.12
pivot(*, columns, index=<no_default>, values=<no_default>)

Return reshaped DataFrame organized by given index / column values.

Reshape data (produce a “pivot” table) based on column values. Uses unique values from specified index / columns to form axes of the resulting DataFrame. This function does not support data aggregation; multiple values will result in a MultiIndex in the columns. See the User Guide for more on reshaping.

Parameters:
  • columns (str or object or a list of str) – Column to use to make new frame’s columns.

  • index (str or object or a list of str, optional) – Column to use to make new frame’s index. If not given, uses existing index.

  • values (str, object or a list of the previous, optional) – Column(s) to use for populating new frame’s values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.

Returns:

Returns reshaped DataFrame.

Return type:

DataFrame

Raises:

ValueError: – When there are any index, columns combinations with multiple values. Use DataFrame.pivot_table when you need to aggregate.

See also

DataFrame.pivot_table

Generalization of pivot that can handle duplicate values for one index/column pair.

DataFrame.unstack

Pivot based on the index values instead of a column.

wide_to_long

Wide panel to long format. Less flexible but more user-friendly than melt.

Notes

For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods.

Reference the user guide for more examples.

Examples

>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
...                            'two'],
...                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
...                    'baz': [1, 2, 3, 4, 5, 6],
...                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
    foo   bar  baz  zoo
0   one   A    1    x
1   one   B    2    y
2   one   C    3    z
3   two   A    4    q
4   two   B    5    w
5   two   C    6    t
>>> df.pivot(index='foo', columns='bar', values='baz')
bar  A   B   C
foo
one  1   2   3
two  4   5   6
>>> df.pivot(index='foo', columns='bar')['baz']
bar  A   B   C
foo
one  1   2   3
two  4   5   6
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
      baz       zoo
bar   A  B  C   A  B  C
foo
one   1  2  3   x  y  z
two   4  5  6   q  w  t

You could also assign a list of column names or a list of index names.

>>> df = pd.DataFrame({
...        "lev1": [1, 1, 1, 2, 2, 2],
...        "lev2": [1, 1, 2, 1, 1, 2],
...        "lev3": [1, 2, 1, 2, 1, 2],
...        "lev4": [1, 2, 3, 4, 5, 6],
...        "values": [0, 1, 2, 3, 4, 5]})
>>> df
    lev1 lev2 lev3 lev4 values
0   1    1    1    1    0
1   1    1    2    2    1
2   1    2    1    3    2
3   2    1    2    4    3
4   2    1    1    5    4
5   2    2    2    6    5
>>> df.pivot(index="lev1", columns=["lev2", "lev3"], values="values")
lev2    1         2
lev3    1    2    1    2
lev1
1     0.0  1.0  2.0  NaN
2     4.0  3.0  NaN  5.0
>>> df.pivot(index=["lev1", "lev2"], columns=["lev3"], values="values")
      lev3    1    2
lev1  lev2
   1     1  0.0  1.0
         2  2.0  NaN
   2     1  4.0  3.0
         2  NaN  5.0

A ValueError is raised if there are any duplicates.

>>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
...                    "bar": ['A', 'A', 'B', 'C'],
...                    "baz": [1, 2, 3, 4]})
>>> df
   foo bar  baz
0  one   A    1
1  one   A    2
2  two   B    3
3  two   C    4

Notice that the first two rows are the same for our index and columns arguments.

>>> df.pivot(index='foo', columns='bar', values='baz')
Traceback (most recent call last):
   ...
ValueError: Index contains duplicate entries, cannot reshape
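
When duplicates are expected, one option is to fall back to pivot_table with an explicit aggregation. The sketch below reuses the frame from the failing example; choosing "sum" as the aggregation is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
                   "bar": ['A', 'A', 'B', 'C'],
                   "baz": [1, 2, 3, 4]})

try:
    table = df.pivot(index='foo', columns='bar', values='baz')
except ValueError:
    # ('one', 'A') occurs twice, so the duplicates must be aggregated.
    table = df.pivot_table(index='foo', columns='bar', values='baz',
                           aggfunc='sum')
```

Combinations that never occur in the data, such as ('one', 'B'), are left as NaN.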
pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=<no_default>, sort=True)

Create a spreadsheet-style pivot table as a DataFrame.

The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

Parameters:
  • values (list-like or scalar, optional) – Column or columns to aggregate.

  • index (column, Grouper, array, or list of the previous) – Keys to group by on the pivot table index. If a list is passed, it can contain any of the other types (except list). If an array is passed, it must be the same length as the data and will be used in the same manner as column values.

  • columns (column, Grouper, array, or list of the previous) – Keys to group by on the pivot table column. If a list is passed, it can contain any of the other types (except list). If an array is passed, it must be the same length as the data and will be used in the same manner as column values.

  • aggfunc (function, list of functions, dict, default "mean") – If a list of functions is passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves). If a dict is passed, the key is column to aggregate and the value is function or list of functions. If margins=True, aggfunc will be used to calculate the partial aggregates.

  • fill_value (scalar, default None) – Value to replace missing values with (in the resulting pivot table, after aggregation).

  • margins (bool, default False) – If margins=True, special All columns and rows will be added with partial group aggregates across the categories on the rows and columns.

  • dropna (bool, default True) – Do not include columns whose entries are all NaN. If True, rows with a NaN value in any column will be omitted before computing margins.

  • margins_name (str, default 'All') – Name of the row / column that will contain the totals when margins is True.

  • observed (bool, default False) –

    This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

    Deprecated since version 2.2.0: The default value of False is deprecated and will change to True in a future version of pandas.

  • sort (bool, default True) –

    Specifies if the result should be sorted.

    Added in version 1.3.0.

Returns:

An Excel style pivot table.

Return type:

DataFrame

See also

DataFrame.pivot

Pivot without aggregation that can handle non-numeric data.

DataFrame.melt

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

wide_to_long

Wide panel to long format. Less flexible but more user-friendly than melt.

Notes

Reference the user guide for more examples.

Examples

>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
...                          "bar", "bar", "bar", "bar"],
...                    "B": ["one", "one", "one", "two", "two",
...                          "one", "one", "two", "two"],
...                    "C": ["small", "large", "large", "small",
...                          "small", "large", "small", "small",
...                          "large"],
...                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
...                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> df
     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9

This first example aggregates values by taking the sum.

>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                        columns=['C'], aggfunc="sum")
>>> table
C        large  small
A   B
bar one    4.0    5.0
    two    7.0    6.0
foo one    4.0    1.0
    two    NaN    6.0

We can also fill missing values using the fill_value parameter.

>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                        columns=['C'], aggfunc="sum", fill_value=0)
>>> table
C        large  small
A   B
bar one      4      5
    two      7      6
foo one      4      1
    two      0      6

The next example aggregates by taking the mean across multiple columns.

>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                        aggfunc={'D': "mean", 'E': "mean"})
>>> table
                D         E
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333

We can also calculate multiple types of aggregations for any given value column.

>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                        aggfunc={'D': "mean",
...                                 'E': ["min", "max", "mean"]})
>>> table
                  D   E
               mean max      mean  min
A   C
bar large  5.500000   9  7.500000    6
    small  5.500000   9  8.500000    8
foo large  2.000000   5  4.500000    4
    small  2.333333   6  4.333333    2
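
The margins parameter is not covered by the examples above. A short sketch, reusing columns A, C and D of the same frame, shows the extra All row and column it adds.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7]})

# margins=True appends an 'All' row and column holding the
# partial aggregates, computed with the same aggfunc.
table = pd.pivot_table(df, values='D', index='A', columns='C',
                       aggfunc="sum", margins=True)

grand_total = table.loc['All', 'All']
```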
plot

alias of GeoplotAccessor

polygonize(node=True, full=False)

Create polygons formed from the linework of a GeoSeries.

Polygonizes a GeoSeries containing linework that represents the edges of a planar graph. Any geometry type may be provided as input; only the constituent lines and rings will be used to create the output polygons.

Lines or rings that when combined do not completely close a polygon will be ignored. Duplicate segments are ignored.

Unless you know that the input GeoSeries represents a planar graph with a clean topology (e.g. there is a node on both lines where they intersect), it is recommended to use node=True which performs noding prior to polygonization. Using node=False will provide performance benefits but may result in incorrect polygons if the input is not of the proper topology.

When full=True, the return value is a 4-tuple containing the output polygons, along with lines which could not be converted to polygons. The return value consists of 4 elements of varying lengths:

  • GeoSeries of the valid polygons (same as with full=False)

  • GeoSeries of cut edges: edges connected on both ends but not part of polygonal output

  • GeoSeries of dangles: edges connected on one end but not part of polygonal output

  • GeoSeries of invalid rings: polygons that are formed but are not valid (bowties, etc)

Parameters:
  • node (bool, default True) – Perform noding prior to polygonization, by default True.

  • full (bool, default False) – Return the full output composed of a tuple of GeoSeries, by default False.

Returns:

GeoSeries with the polygons or a tuple of four GeoSeries as (polygons, cuts, dangles, invalid)

Return type:

GeoSeries | tuple(GeoSeries, GeoSeries, GeoSeries, GeoSeries)

Examples

>>> from shapely.geometry import LineString
>>> s = geopandas.GeoSeries([
...     LineString([(0, 0), (1, 1)]),
...     LineString([(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]),
...     LineString([(0.5, 0.2), (0.5, 0.8)]),
... ])
>>> s.polygonize()
0        POLYGON ((0 0, 0.5 0.5, 1 1, 1 0, 0 0))
1    POLYGON ((0.5 0.5, 0 0, 0 1, 1 1, 0.5 0.5))
Name: polygons, dtype: geometry
>>> polygons, cuts, dangles, invalid = s.polygonize(full=True)
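The effect of noding and the four-part result can also be sketched directly with shapely, which provides `unary_union`, `polygonize` and `polygonize_full` in `shapely.ops`. This is only an illustration of the concepts described above, under the assumption that noding via `unary_union` is comparable to `node=True`; it is not a statement about how GeoSeries.polygonize is implemented.

```python
from shapely.geometry import LineString
from shapely.ops import polygonize, polygonize_full, unary_union

lines = [
    LineString([(0, 0), (1, 1)]),                          # diagonal of the square
    LineString([(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]),  # closed square ring
    LineString([(0.5, 0.2), (0.5, 0.8)]),                  # short line crossing the diagonal
]

# Noding: split every line at each intersection so that the linework
# forms a clean planar graph before polygonization.
noded = unary_union(lines)

# Polygons only (comparable to full=False).
polygons = list(polygonize(noded))

# Polygons plus the leftovers (comparable to full=True).
# Each of the four results is a geometry collection.
polys, cuts, dangles, invalid = polygonize_full(noded)

total_area = sum(p.area for p in polygons)
```

The short vertical line does not bound any polygon, so it ends up among the leftovers rather than in the polygonal output.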
pop(item)

Return item and drop from frame. Raise KeyError if not found.

Parameters:

item (label) – Label of column to be popped.

Return type:

Series

Examples

>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
...                    ('parrot', 'bird', 24.0),
...                    ('lion', 'mammal', 80.5),
...                    ('monkey', 'mammal', np.nan)],
...                   columns=('name', 'class', 'max_speed'))
>>> df
     name   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN
>>> df.pop('class')
0      bird
1      bird
2    mammal
3    mammal
Name: class, dtype: object
>>> df
     name  max_speed
0  falcon      389.0
1  parrot       24.0
2    lion       80.5
3  monkey        NaN
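The KeyError case mentioned above is not covered by the example. A minimal sketch, using a small hypothetical frame, of what happens when the same label is popped twice:

```python
import pandas as pd

df = pd.DataFrame({"name": ["falcon", "lion"], "max_speed": [389.0, 80.5]})

speeds = df.pop("max_speed")    # returns the column as a Series
remaining = list(df.columns)    # the frame no longer contains that column

try:
    df.pop("max_speed")         # the label is gone, so this lookup fails
    missing_raises = False
except KeyError:
    missing_raises = True
```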
pow(other, axis='columns', level=None, fill_value=None)

Get Exponential power of dataframe and other, element-wise (binary operator pow).

Equivalent to dataframe ** other, but with support to substitute a fill_value for missing data in one of the inputs. The reverse version is rpow.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar using the operator version, which returns the same result.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
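The examples above are shared by all flexible arithmetic wrappers and do not exercise pow itself. The following sketch, with two small hypothetical frames, compares the ** operator, pow with a fill_value, and the reverse version rpow:

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({"x": [2.0, 3.0, np.nan]})
exponent = pd.DataFrame({"x": [3.0, np.nan, 2.0]})

# Operator version: the result is NaN wherever either input is NaN.
plain = base ** exponent

# Flexible version: a value missing on one side is replaced by 1
# before the power is computed.
filled = base.pow(exponent, fill_value=1)

# Reverse version: computes 2 ** base instead of base ** 2.
reverse = base.rpow(2)
```

Here `filled` holds 8.0, 3.0 and 1.0, whereas `plain` keeps NaN in the last two rows.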
prod(axis=0, skipna=True, numeric_only=False, min_count=0, **kwargs)

Return the product of the values over the requested axis.

Parameters:
  • axis ({index (0), columns (1)}) –

    Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

    Warning

    The behavior of DataFrame.prod with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

    Added in version 2.0.0.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

  • min_count (int, default 0) – The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

  • **kwargs – Additional keyword arguments to be passed to the function.

Return type:

Series or scalar

See also

Series.sum

Return the sum.

Series.min

Return the minimum.

Series.max

Return the maximum.

Series.idxmin

Return the index of the minimum.

Series.idxmax

Return the index of the maximum.

DataFrame.sum

Return the sum over the requested axis.

DataFrame.min

Return the minimum over the requested axis.

DataFrame.max

Return the maximum over the requested axis.

DataFrame.idxmin

Return the index of the minimum over the requested axis.

DataFrame.idxmax

Return the index of the maximum over the requested axis.

Examples

By default, the product of an empty or all-NA Series is 1

>>> pd.Series([], dtype="float64").prod()
1.0

This can be controlled with the min_count parameter

>>> pd.Series([], dtype="float64").prod(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).prod()
1.0
>>> pd.Series([np.nan]).prod(min_count=1)
nan
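The examples above use a Series. As a sketch of how axis, skipna and min_count interact on a DataFrame (the frame below is hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, np.nan, 6.0]})

by_column = df.prod()                # one product per column, NaN skipped
by_row = df.prod(axis=1)             # one product per row, NaN skipped
strict = df.prod(skipna=False)       # a single NaN makes the column product NaN
needs_three = df.prod(min_count=3)   # column "b" has only two valid values
```

With skipna=True, the missing entry in column "b" is ignored, so its product is 4.0 * 6.0. With min_count=3, that column has too few valid values and the result is NaN.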
product(axis=0, skipna=True, numeric_only=False, min_count=0, **kwargs)

Return the product of the values over the requested axis.

Parameters:
  • axis ({index (0), columns (1)}) –

    Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

    Warning

    The behavior of DataFrame.prod with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

    Added in version 2.0.0.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

  • min_count (int, default 0) – The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

  • **kwargs – Additional keyword arguments to be passed to the function.

Return type:

Series or scalar

See also

Series.sum

Return the sum.

Series.min

Return the minimum.

Series.max

Return the maximum.

Series.idxmin

Return the index of the minimum.

Series.idxmax

Return the index of the maximum.

DataFrame.sum

Return the sum over the requested axis.

DataFrame.min

Return the minimum over the requested axis.

DataFrame.max

Return the maximum over the requested axis.

DataFrame.idxmin

Return the index of the minimum over the requested axis.

DataFrame.idxmax

Return the index of the maximum over the requested axis.

Examples

By default, the product of an empty or all-NA Series is 1

>>> pd.Series([], dtype="float64").prod()
1.0

This can be controlled with the min_count parameter

>>> pd.Series([], dtype="float64").prod(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).prod()
1.0
>>> pd.Series([np.nan]).prod(min_count=1)
nan
project(other, normalized=False, align=None)

Return the distance along each geometry nearest to other.

The operation works in a 1-to-1 row-wise manner.

The project method is the inverse of interpolate.

In shapely, this is equal to line_locate_point.

Parameters:
  • other (BaseGeometry or GeoSeries) – The other geometry to compute the projected point from.

  • normalized (boolean) – If normalized is True, return the distance normalized to the length of the object.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series

Examples

>>> from shapely.geometry import LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         LineString([(0, 0), (2, 0), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Point(1, 0),
...         Point(1, 0),
...         Point(2, 1),
...     ],
...     index=range(1, 4),
... )
>>> s
0    LINESTRING (0 0, 2 0, 0 2)
1         LINESTRING (0 0, 2 2)
2         LINESTRING (2 0, 0 2)
dtype: geometry
>>> s2
1    POINT (1 0)
2    POINT (1 0)
3    POINT (2 1)
dtype: geometry

We can project each geometry on a single shapely geometry:

>>> s.project(Point(1, 0))
0    1.000000
1    0.707107
2    0.707107
dtype: float64

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and project elements with the same index using align=True or ignore index and project elements based on their matching order using align=False:

>>> s.project(s2, align=True)
0         NaN
1    0.707107
2    0.707107
3         NaN
dtype: float64
>>> s.project(s2, align=False)
0    1.000000
1    0.707107
2    0.707107
dtype: float64

See also

GeoSeries.interpolate
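The statement that project is the inverse of interpolate can be checked on a single geometry with shapely, whose `project` and `interpolate` methods underlie the element-wise behaviour described above. A minimal sketch:

```python
from shapely.geometry import LineString, Point

line = LineString([(0, 0), (2, 0), (0, 2)])
on_line = Point(1, 0)
off_line = Point(1, -1)

# Distance along the line to the location nearest to the point.
distance = line.project(on_line)

# Same location, expressed as a fraction of the total line length.
fraction = line.project(on_line, normalized=True)

# interpolate maps the distance back to a point on the line.
recovered = line.interpolate(distance)

# A point that is not on the line is first snapped to its nearest
# location on the line, so the round trip returns that location.
snapped = line.interpolate(line.project(off_line))
```

For a point lying on the line, the round trip returns the point itself; for any other point it returns the nearest location on the line.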

quantile(q=0.5, axis=0, numeric_only=False, interpolation='linear', method='single')

Return values at the given quantile over requested axis.

Parameters:
  • q (float or array-like, default 0.5 (50% quantile)) – Value between 0 <= q <= 1, the quantile(s) to compute.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Equals 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

  • numeric_only (bool, default False) –

    Include only float, int or boolean data.

    Changed in version 2.0.0: The default value of numeric_only is now False.

  • interpolation ({'linear', 'lower', 'higher', 'midpoint', 'nearest'}) –

    This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j:

    • linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

    • lower: i.

    • higher: j.

    • nearest: i or j whichever is nearest.

    • midpoint: (i + j) / 2.

  • method ({'single', 'table'}, default 'single') – Whether to compute quantiles per-column (‘single’) or over all columns (‘table’). When ‘table’, the only allowed interpolation methods are ‘nearest’, ‘lower’, and ‘higher’.

Returns:

If q is an array, a DataFrame will be returned where the index is q, the columns are the columns of self, and the values are the quantiles.

If q is a float, a Series will be returned where the index is the columns of self and the values are the quantiles.

Return type:

Series or DataFrame

See also

core.window.rolling.Rolling.quantile

Rolling quantile.

numpy.percentile

Numpy function to compute the percentile.

Examples

>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
...                   columns=['a', 'b'])
>>> df.quantile(.1)
a    1.3
b    3.7
Name: 0.1, dtype: float64
>>> df.quantile([.1, .5])
       a     b
0.1  1.3   3.7
0.5  2.5  55.0

Specifying method=’table’ will compute the quantile over all columns.

>>> df.quantile(.1, method="table", interpolation="nearest")
a    1
b    1
Name: 0.1, dtype: int64
>>> df.quantile([.1, .5], method="table", interpolation="nearest")
     a    b
0.1  1    1
0.5  3  100

Specifying numeric_only=False will also compute the quantile of datetime and timedelta data.

>>> df = pd.DataFrame({'A': [1, 2],
...                    'B': [pd.Timestamp('2010'),
...                          pd.Timestamp('2011')],
...                    'C': [pd.Timedelta('1 days'),
...                          pd.Timedelta('2 days')]})
>>> df.quantile(0.5, numeric_only=False)
A                    1.5
B    2010-07-02 12:00:00
C        1 days 12:00:00
Name: 0.5, dtype: object
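To make the interpolation options concrete, the sketch below evaluates the 0.4 quantile of a hypothetical four-value column with each method. The requested position is 0.4 * (4 - 1) = 1.2, which lies between the sorted values i = 2 and j = 3 with a fractional part of 0.2.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})
methods = ["linear", "lower", "higher", "midpoint", "nearest"]

# One quantile per interpolation method, collected as plain floats.
results = {
    m: float(df.quantile(0.4, interpolation=m)["a"])
    for m in methods
}

# linear   -> 2 + (3 - 2) * 0.2, approximately 2.2
# lower    -> 2.0
# higher   -> 3.0
# midpoint -> (2 + 3) / 2 = 2.5
# nearest  -> 2.0, because the fractional part 0.2 is closer to i
```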
query(expr, *, inplace=False, **kwargs)

Query the columns of a DataFrame with a boolean expression.

Parameters:
  • expr (str) –

    The query string to evaluate.

    You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b.

    You can refer to column names that are not valid Python variable names by surrounding them in backticks. Thus, column names containing spaces or punctuation (besides underscores) or starting with digits must be surrounded by backticks. (For example, a column named “Area (cm^2)” would be referenced as `Area (cm^2)`). Column names which are Python keywords (like “list”, “for”, “import”, etc.) cannot be used.

    For example, if one of your columns is called a a and you want to sum it with b, your query should be `a a` + b.

  • inplace (bool) – Whether to modify the DataFrame rather than creating a new one.

  • **kwargs – See the documentation for eval() for complete details on the keyword arguments accepted by DataFrame.query().

Returns:

DataFrame resulting from the provided query expression or None if inplace=True.

Return type:

DataFrame or None

See also

eval

Evaluate a string describing operations on DataFrame columns.

DataFrame.eval

Evaluate a string describing operations on DataFrame columns.

Notes

The result of the evaluation of this expression is first passed to DataFrame.loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to DataFrame.__getitem__().

This method uses the top-level eval() function to evaluate the passed query.

The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or. This is syntactically valid Python, however the semantics are different.

You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.

The DataFrame.index and DataFrame.columns attributes of the DataFrame instance are placed in the query namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for the frame index; you can also use the name of the index to identify it in a query. Please note that Python keywords may not be used as identifiers.

For further details and examples see the query documentation in indexing.

Backtick quoted variables

Backtick quoted variables are parsed as literal Python code and are converted internally to a Python valid identifier. This can lead to the following problems.

During parsing a number of disallowed characters inside the backtick quoted string are replaced by strings that are allowed as a Python identifier. These characters include all operators in Python, the space character, the question mark, the exclamation mark, the dollar sign, and the euro sign. For other characters that fall outside the ASCII range (U+0001..U+007F) and those that are not further specified in PEP 3131, the query parser will raise an error. This excludes whitespace different than the space character, but also the hashtag (as it is used for comments) and the backtick itself (backtick can also not be escaped).

In a special case, quotes that make a pair around a backtick can confuse the parser. For example, `it's` > `that's` will raise an error, as it forms a quoted string ('s > `that') with a backtick inside.

See also the Python documentation about lexical analysis (https://docs.python.org/3/reference/lexical_analysis.html) in combination with the source code in pandas.core.computation.parsing.

Examples

>>> df = pd.DataFrame({'A': range(1, 6),
...                    'B': range(10, 0, -2),
...                    'C C': range(10, 5, -1)})
>>> df
   A   B  C C
0  1  10   10
1  2   8    9
2  3   6    8
3  4   4    7
4  5   2    6
>>> df.query('A > B')
   A  B  C C
4  5  2    6

The previous expression is equivalent to

>>> df[df.A > df.B]
   A  B  C C
4  5  2    6

For columns with spaces in their name, you can use backtick quoting.

>>> df.query('B == `C C`')
   A   B  C C
0  1  10   10

The previous expression is equivalent to

>>> df[df.B == df['C C']]
   A   B  C C
0  1  10   10
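Two points from the parameter description and the notes are not shown in the examples above: referring to an environment variable with ‘@’, and the adjusted precedence of & inside a query string. A brief sketch on the same frame, where the variable name `threshold` is just an illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 6),
                   "B": range(10, 0, -2),
                   "C C": range(10, 5, -1)})

# '@' pulls a variable from the surrounding Python scope into the query.
threshold = 3
above = df.query("A > @threshold")

# Inside a query string, & binds like `and`, so no parentheses are needed.
both = df.query("A > 2 & B < 6")

# The plain-Python equivalent needs parentheses around each comparison.
same = df[(df["A"] > 2) & (df["B"] < 6)]

# Backtick quoting and '@' can be combined in one expression.
mixed = df.query("`C C` - A > @threshold")
```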
radd(other, axis='columns', level=None, fill_value=None)

Get Addition of dataframe and other, element-wise (binary operator radd).

Equivalent to other + dataframe, but with support to substitute a fill_value for missing data in one of the inputs. The reverse version is add.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar using the operator version, which returns the same result.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
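The shared examples above do not call radd itself. The sketch below, using two small hypothetical frames with partly overlapping indices, shows the index union mentioned in the notes and the effect of fill_value:

```python
import pandas as pd

left = pd.DataFrame({"x": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"x": [10, 20]}, index=[1, 2])

# Operator version: rows present in only one frame become NaN.
plain = right + left

# Reverse flexible version: computes right + left as well, but a row
# missing on one side is treated as 0 before the addition.
filled = left.radd(right, fill_value=0)
```

Both results are indexed by the union 0, 1, 2. Only index 1 exists in both frames, so `plain` is NaN at 0 and 2, while `filled` holds 1.0, 12.0 and 20.0.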
rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis.

By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters:
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Index to direct ranking. For Series this parameter is unused and defaults to 0.

  • method ({'average', 'min', 'max', 'first', 'dense'}, default 'average') –

    How to rank the group of records that have the same value (i.e. ties):

    • average: average rank of the group

    • min: lowest rank in the group

    • max: highest rank in the group

    • first: ranks assigned in order they appear in the array

    • dense: like ‘min’, but rank always increases by 1 between groups.

  • numeric_only (bool, default False) –

    For DataFrame objects, rank only numeric columns if set to True.

    Changed in version 2.0.0: The default value of numeric_only is now False.

  • na_option ({'keep', 'top', 'bottom'}, default 'keep') –

    How to rank NaN values:

    • keep: assign NaN rank to NaN values

    • top: assign lowest rank to NaN values

    • bottom: assign highest rank to NaN values

  • ascending (bool, default True) – Whether or not the elements should be ranked in ascending order.

  • pct (bool, default False) – Whether or not to display the returned rankings in percentile form.

Returns:

Return a Series or DataFrame with data ranks as values.

Return type:

same type as caller

See also

core.groupby.DataFrameGroupBy.rank

Rank of values within each group.

core.groupby.SeriesGroupBy.rank

Rank of values within each group.

Examples

>>> df = pd.DataFrame(data={'Animal': ['cat', 'penguin', 'dog',
...                                    'spider', 'snake'],
...                         'Number_legs': [4, 2, 4, 8, np.nan]})
>>> df
    Animal  Number_legs
0      cat          4.0
1  penguin          2.0
2      dog          4.0
3   spider          8.0
4    snake          NaN

Ties are assigned the mean of the ranks (by default) for the group.

>>> s = pd.Series(range(5), index=list("abcde"))
>>> s["d"] = s["b"]
>>> s.rank()
a    1.0
b    2.5
c    4.0
d    2.5
e    5.0
dtype: float64

The following example shows how the method behaves with the above parameters:

  • default_rank: this is the default behaviour obtained without using any parameter.

  • max_rank: with method='max', records that have the same value are ranked using the highest rank (e.g., since ‘cat’ and ‘dog’ are both in the 2nd and 3rd position, rank 3 is assigned).

  • NA_bottom: choosing na_option = 'bottom', if there are records with NaN values they are placed at the bottom of the ranking.

  • pct_rank: when setting pct = True, the ranking is expressed as percentile rank.

>>> df['default_rank'] = df['Number_legs'].rank()
>>> df['max_rank'] = df['Number_legs'].rank(method='max')
>>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
>>> df['pct_rank'] = df['Number_legs'].rank(pct=True)
>>> df
    Animal  Number_legs  default_rank  max_rank  NA_bottom  pct_rank
0      cat          4.0           2.5       3.0        2.5     0.625
1  penguin          2.0           1.0       1.0        1.0     0.250
2      dog          4.0           2.5       3.0        2.5     0.625
3   spider          8.0           4.0       4.0        4.0     1.000
4    snake          NaN           NaN       NaN        5.0       NaN
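The remaining tie-breaking methods ('min', 'first', 'dense') and descending order are not covered above. A short sketch on the leg counts without the missing value:

```python
import pandas as pd

legs = pd.Series([4, 2, 4, 8], index=["cat", "penguin", "dog", "spider"])

min_rank = legs.rank(method="min")      # ties share the lowest rank of the group
first_rank = legs.rank(method="first")  # ties are ranked in order of appearance
dense_rank = legs.rank(method="dense")  # like 'min', but no gaps between groups
descending = legs.rank(ascending=False) # the largest value receives rank 1
```

With 'min', the rank after the tied pair jumps from 2 to 4; with 'dense', it continues at 3.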
rdiv(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. The reverse version is truediv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar using the operator version, which returns the same result.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
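
The effect of fill_value in the flexible arithmetic methods can be sketched as follows. The frames `left` and `right` are made up for illustration and are not part of the examples above. A value missing on only one side is replaced by fill_value before the operation; a cell missing on both sides stays missing.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: r2 is missing on both sides, r3 only on the right.
left = pd.DataFrame({"x": [1.0, np.nan, 3.0]}, index=["r1", "r2", "r3"])
right = pd.DataFrame({"x": [10.0, np.nan, np.nan]}, index=["r1", "r2", "r3"])

# Operator version: any missing operand yields NaN.
plain = left + right

# Method version: a one-sided gap is treated as 0 before adding.
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

Here only `r3` differs between the two results, because `r2` has no value on either side.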

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
reindex(labels=None, *, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=nan, limit=None, tolerance=None)

Conform DataFrame to new index with optional filling logic.

Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters:
  • labels (array-like, optional) – New labels / index to conform the axis specified by ‘axis’ to.

  • index (array-like, optional) – New labels for the index. Preferably an Index object to avoid duplicating data.

  • columns (array-like, optional) – New labels for the columns. Preferably an Index object to avoid duplicating data.

  • axis (int or str, optional) – Axis to target. Can be either the axis name (‘index’, ‘columns’) or number (0, 1).

  • method ({None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}) –

    Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

    • None (default): don’t fill gaps

    • pad / ffill: Propagate last valid observation forward to next valid.

    • backfill / bfill: Use next valid observation to fill gap.

    • nearest: Use nearest valid observations to fill gap.

  • copy (bool, default True) –

    Return a new object, even if the passed indexes are the same.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements by enabling Copy-on-Write: pd.options.mode.copy_on_write = True

  • level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (scalar, default np.nan) – Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

  • limit (int, default None) – Maximum number of consecutive elements to forward or backward fill.

  • tolerance (optional) –

    Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

    Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

Return type:

DataFrame with changed index.

See also

DataFrame.set_index

Set row labels.

DataFrame.reset_index

Remove row labels or move them to new columns.

DataFrame.reindex_like

Change to same indices as other DataFrame.

Examples

DataFrame.reindex supports two calling conventions

  • (index=index_labels, columns=column_labels, ...)

  • (labels, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.

Create a dataframe with some fictional data.

>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
...                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
...                   index=index)
>>> df
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00

Create a new index and reindex the dataframe. By default, values in the new index that do not have corresponding records in the dataframe are assigned NaN.

>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...              'Chrome']
>>> df.reindex(new_index)
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02

We can fill in the missing values by passing a value to the keyword fill_value. Because the index is not monotonically increasing or decreasing, we cannot use arguments to the keyword method to fill the NaN values.

>>> df.reindex(new_index, fill_value=0)
               http_status  response_time
Safari                 404           0.07
Iceweasel                0           0.00
Comodo Dragon            0           0.00
IE10                   404           0.08
Chrome                 200           0.02
>>> df.reindex(new_index, fill_value='missing')
              http_status response_time
Safari                404          0.07
Iceweasel         missing       missing
Comodo Dragon     missing       missing
IE10                  404          0.08
Chrome                200          0.02

We can also reindex the columns.

>>> df.reindex(columns=['http_status', 'user_agent'])
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

Or we can use “axis-style” keyword arguments

>>> df.reindex(['http_status', 'user_agent'], axis="columns")
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

To further illustrate the filling functionality in reindex, we will create a dataframe with a monotonically increasing index (for example, a sequence of dates).

>>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
...                    index=date_index)
>>> df2
            prices
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0

Suppose we decide to expand the dataframe to cover a wider date range.

>>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2)
            prices
2009-12-29     NaN
2009-12-30     NaN
2009-12-31     NaN
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

The index entries that did not have a value in the original data frame (for example, ‘2009-12-29’) are by default filled with NaN. If desired, we can fill in the missing values using one of several options.

For example, to fill each new label with the next valid value from the original index, pass bfill as an argument to the method keyword.

>>> df2.reindex(date_index2, method='bfill')
            prices
2009-12-29   100.0
2009-12-30   100.0
2009-12-31   100.0
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

Please note that the NaN value present in the original dataframe (at index value 2010-01-03) will not be filled by any of the value propagation schemes. This is because filling while reindexing does not look at dataframe values, but only compares the original and desired indexes. If you do want to fill in the NaN values present in the original dataframe, use the fillna() method.
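
A minimal sketch of that distinction, reusing the data from the example above. The choice of `DataFrame.bfill()` as the value-based follow-up step is an assumption for illustration; `fillna()` with a constant would serve the same purpose.

```python
import numpy as np
import pandas as pd

date_index = pd.date_range("2010-01-01", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
                   index=date_index)
date_index2 = pd.date_range("2009-12-29", periods=10, freq="D")

# Index-based filling: only labels that are new in date_index2 are filled.
reindexed = df2.reindex(date_index2, method="bfill")

# Value-based filling: also fills the NaN that was already in the data.
filled = reindexed.bfill()

gap = pd.Timestamp("2010-01-03")
print(reindexed.loc[gap, "prices"])  # still NaN after reindexing
print(filled.loc[gap, "prices"])     # taken from 2010-01-04
```

The trailing label 2010-01-07 stays NaN in both results, because no later observation exists to fill it from.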

See the user guide for more.
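
The method and tolerance parameters can be combined for inexact matching. The following is a small sketch on a made-up integer index (the frame and the target labels are assumptions, not taken from the examples above): each target label takes the value of the nearest original label, unless that label is further away than tolerance.

```python
import pandas as pd

# Hypothetical data on a monotonically increasing integer index.
df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=[0, 10, 20])

# 1 is nearest to 0, 14 is nearest to 10, 40 is 20 away from its nearest label.
target = [1, 14, 40]
result = df.reindex(target, method="nearest", tolerance=5)

print(result)
```

The label 40 ends up as NaN, because abs(20 - 40) exceeds the tolerance of 5.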

reindex_like(other, method=None, copy=None, limit=None, tolerance=None)

Return an object with matching indices as other object.

Conform the object to the same index on all axes. Optional filling logic, placing NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters:
  • other (Object of the same data type) – Its row and column indices are used to define the new indices of this object.

  • method ({None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}) –

    Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

    • None (default): don’t fill gaps

    • pad / ffill: propagate last valid observation forward to next valid

    • backfill / bfill: use next valid observation to fill gap

    • nearest: use nearest valid observations to fill gap.

  • copy (bool, default True) –

    Return a new object, even if the passed indexes are the same.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements by enabling Copy-on-Write: pd.options.mode.copy_on_write = True

  • limit (int, default None) – Maximum number of consecutive labels to fill for inexact matches.

  • tolerance (optional) –

    Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

    Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

Returns:

Same type as caller, but with changed indices on each axis.

Return type:

Series or DataFrame

See also

DataFrame.set_index

Set row labels.

DataFrame.reset_index

Remove row labels or move them to new columns.

DataFrame.reindex

Change to new indices or expand indices.

Notes

Same as calling .reindex(index=other.index, columns=other.columns,...).

Examples

>>> df1 = pd.DataFrame([[24.3, 75.7, 'high'],
...                     [31, 87.8, 'high'],
...                     [22, 71.6, 'medium'],
...                     [35, 95, 'medium']],
...                    columns=['temp_celsius', 'temp_fahrenheit',
...                             'windspeed'],
...                    index=pd.date_range(start='2014-02-12',
...                                        end='2014-02-15', freq='D'))
>>> df1
            temp_celsius  temp_fahrenheit windspeed
2014-02-12          24.3             75.7      high
2014-02-13          31.0             87.8      high
2014-02-14          22.0             71.6    medium
2014-02-15          35.0             95.0    medium
>>> df2 = pd.DataFrame([[28, 'low'],
...                     [30, 'low'],
...                     [35.1, 'medium']],
...                    columns=['temp_celsius', 'windspeed'],
...                    index=pd.DatetimeIndex(['2014-02-12', '2014-02-13',
...                                            '2014-02-15']))
>>> df2
            temp_celsius windspeed
2014-02-12          28.0       low
2014-02-13          30.0       low
2014-02-15          35.1    medium
>>> df2.reindex_like(df1)
            temp_celsius  temp_fahrenheit windspeed
2014-02-12          28.0              NaN       low
2014-02-13          30.0              NaN       low
2014-02-14           NaN              NaN       NaN
2014-02-15          35.1              NaN    medium
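
The equivalence stated in the Notes can be checked with a short sketch. The two frames below are invented for illustration.

```python
import pandas as pd

# Hypothetical frames: `template` defines the target rows and columns.
template = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]},
                        index=["x", "y", "z"])
data = pd.DataFrame({"a": [10.0, 30.0]}, index=["x", "z"])

via_like = data.reindex_like(template)
via_reindex = data.reindex(index=template.index, columns=template.columns)

# Both calls produce the same labels, with NaN where `data` had no value.
print(via_like)
print(via_like.equals(via_reindex))
```
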
relate(other, align=None)

Return the DE-9IM intersection matrices for the geometries.

The operation works in a 1-to-1, row-wise manner.

Parameters:
  • other (BaseGeometry or GeoSeries) – The other geometry to compute the DE-9IM intersection matrices from.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Returns:

spatial_relations – The DE-9IM intersection matrices which describe the spatial relations of the other geometry.

Return type:

Series of strings

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5                       POINT (0 1)
dtype: geometry

We can relate each geometry and a single shapely geometry:

>>> s.relate(Polygon([(0, 0), (1, 1), (0, 1)]))
0    212F11FF2
1    212F11FF2
2    F11F00212
3    F01FF0212
4    F0FFFF212
dtype: object

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.relate(s2, align=True)
0         None
1    212F11FF2
2    0F1FF0102
3    1FFF0FFF2
4    FF0FFF0F2
5         None
dtype: object
>>> s.relate(s2, align=False)
0    212F11FF2
1    1F20F1102
2    0F1FF0102
3    0F1FF0FF2
4    0FFFFFFF2
dtype: object
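
Each returned string is a flattened 3x3 matrix: rows are the interior, boundary and exterior of the calling geometry, columns are the same three parts of the other geometry, and each character is the dimension of their intersection (F for none). A plain-Python sketch of that layout follows; the helper `de9im_matrix` is hypothetical and is not part of geopandas or shapely.

```python
PARTS = ("interior", "boundary", "exterior")


def de9im_matrix(code):
    """Map a 9-character DE-9IM string to {(part_of_a, part_of_b): entry}."""
    if len(code) != 9:
        raise ValueError("a DE-9IM string has exactly nine characters")
    return {
        (PARTS[row], PARTS[col]): code[3 * row + col]
        for row in range(3)
        for col in range(3)
    }


# First row of the s.relate(Polygon(...)) result shown above.
matrix = de9im_matrix("212F11FF2")
print(matrix[("interior", "interior")])  # interiors overlap in an area: '2'
print(matrix[("boundary", "interior")])  # no intersection: 'F'
```
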
relate_pattern(other, pattern, align=None)

Return True if the DE-9IM string code for the relationship between the geometries satisfies the pattern, else False.

This function compares the DE-9IM code string for two geometries against a specified pattern. If the string matches the pattern then True is returned, otherwise False. Each character of the pattern can be an exact match (0, 1 or 2), a boolean match (uppercase T or F), or a wildcard (*). For example, the pattern for the within predicate is 'T*F**F***'.

The operation works in a 1-to-1, row-wise manner.

Parameters:
  • other (BaseGeometry or GeoSeries) – The other geometry to be tested against the pattern.

  • pattern (str) – The DE-9IM pattern to test against.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5                       POINT (0 1)
dtype: geometry

We can check the relate pattern of each geometry and a single shapely geometry:

>>> s.relate_pattern(Polygon([(0, 0), (1, 1), (0, 1)]), "2*T***F**")
0     True
1     True
2    False
3    False
4    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.relate_pattern(s2, "TF******T", align=True)
0    False
1    False
2     True
3     True
4    False
5    False
dtype: bool
>>> s.relate_pattern(s2, "TF******T", align=False)
0    False
1     True
2     True
3     True
4     True
dtype: bool
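
The matching rules described for the pattern can be sketched in plain Python. The helper `matches_de9im` is hypothetical and only mirrors the rules stated in the text; the actual matching is done by the underlying geometry library.

```python
def matches_de9im(code, pattern):
    """Return True if a DE-9IM code satisfies a DE-9IM pattern."""
    if len(code) != 9 or len(pattern) != 9:
        raise ValueError("codes and patterns have exactly nine characters")
    for actual, wanted in zip(code, pattern):
        if wanted == "*":
            continue                 # wildcard: anything matches
        if wanted == "T":
            if actual not in "012":  # T: any non-empty intersection
                return False
        elif actual != wanted:       # F, 0, 1, 2: exact match
            return False
    return True


# Codes taken from the s.relate(Polygon(...)) output shown for relate().
codes = ["212F11FF2", "212F11FF2", "F11F00212", "F01FF0212", "F0FFFF212"]
print([matches_de9im(code, "2*T***F**") for code in codes])
```

Applied to those codes, the sketch gives True for the first two rows and False for the rest, in line with the relate_pattern output above.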
remove_repeated_points(tolerance=0.0)

Return a GeoSeries containing a copy of the input geometry with repeated points removed.

From the start of the coordinate sequence, each next point within the tolerance is removed.

Removing repeated points with a non-zero tolerance may result in an invalid geometry being returned.

Parameters:

tolerance (float, default 0.0) – Remove all points within this distance of each other. Use 0.0 to remove only exactly repeated points (the default).

Examples

>>> from shapely import LineString, Polygon
>>> s = geopandas.GeoSeries(
...     [
...        LineString([(0, 0), (0, 0), (1, 0)]),
...        Polygon([(0, 0), (0, 0.5), (0, 1), (0.5, 1), (0,0)]),
...     ],
... )
>>> s
0                 LINESTRING (0 0, 0 0, 1 0)
1    POLYGON ((0 0, 0 0.5, 0 1, 0.5 1, 0 0))
dtype: geometry
>>> s.remove_repeated_points(tolerance=0.0)
0                      LINESTRING (0 0, 1 0)
1    POLYGON ((0 0, 0 0.5, 0 1, 0.5 1, 0 0))
dtype: geometry
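
The idea behind the tolerance can be illustrated on a bare coordinate list. This is a simplified sketch under the assumption that each point is compared with the last point that was kept; it is not the implementation used by the geometry library, and the helper `drop_repeated` is hypothetical.

```python
import math


def drop_repeated(coords, tolerance=0.0):
    """Keep a point only if it is further than `tolerance` from the last kept one."""
    kept = []
    for point in coords:
        if kept and math.dist(kept[-1], point) <= tolerance:
            continue
        kept.append(point)
    return kept


line = [(0, 0), (0, 0), (1, 0)]
ring = [(0, 0), (0, 0.5), (0, 1), (0.5, 1), (0, 0)]

print(drop_repeated(line))                 # exact duplicate removed
print(drop_repeated(ring))                 # nothing to remove
print(drop_repeated(ring, tolerance=0.6))  # ring collapses to three coordinates
```

In this sketch the last call leaves too few coordinates for a valid polygon ring, which illustrates why a non-zero tolerance can produce an invalid geometry.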
rename(mapper=None, *, index=None, columns=None, axis=None, copy=None, inplace=False, level=None, errors='ignore')

Rename columns or index labels.

Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.

See the user guide for more.

Parameters:
  • mapper (dict-like or function) – Dict-like or function transformations to apply to that axis’ values. Use either mapper and axis to specify the axis to target with mapper, or index and columns.

  • index (dict-like or function) – Alternative to specifying axis (mapper, axis=0 is equivalent to index=mapper).

  • columns (dict-like or function) – Alternative to specifying axis (mapper, axis=1 is equivalent to columns=mapper).

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Axis to target with mapper. Can be either the axis name (‘index’, ‘columns’) or number (0, 1). The default is ‘index’.

  • copy (bool, default True) –

    Also copy underlying data.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements by enabling Copy-on-Write: pd.options.mode.copy_on_write = True

  • inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one. If True then value of copy is ignored.

  • level (int or level name, default None) – In case of a MultiIndex, only rename labels in the specified level.

  • errors ({'ignore', 'raise'}, default 'ignore') – If ‘raise’, raise a KeyError when a dict-like mapper, index, or columns contains labels that are not present in the Index being transformed. If ‘ignore’, existing keys will be renamed and extra keys will be ignored.

Returns:

DataFrame with the renamed axis labels or None if inplace=True.

Return type:

DataFrame or None

Raises:

KeyError – If any of the labels is not found in the selected axis and “errors=’raise’”.

See also

DataFrame.rename_axis

Set the name of the axis.

Examples

DataFrame.rename supports two calling conventions

  • (index=index_mapper, columns=columns_mapper, ...)

  • (mapper, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.

Rename columns using a mapping:

>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> df.rename(columns={"A": "a", "B": "c"})
   a  c
0  1  4
1  2  5
2  3  6

Rename index using a mapping:

>>> df.rename(index={0: "x", 1: "y", 2: "z"})
   A  B
x  1  4
y  2  5
z  3  6

Cast index labels to a different type:

>>> df.index
RangeIndex(start=0, stop=3, step=1)
>>> df.rename(index=str).index
Index(['0', '1', '2'], dtype='object')
>>> df.rename(columns={"A": "a", "B": "b", "C": "c"}, errors="raise")
Traceback (most recent call last):
KeyError: ['C'] not found in axis

Using axis-style parameters:

>>> df.rename(str.lower, axis='columns')
   a  b
0  1  4
1  2  5
2  3  6
>>> df.rename({1: 2, 2: 4}, axis='index')
   A  B
0  1  4
2  2  5
4  3  6
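
The level parameter is not covered by the examples above. A minimal sketch with a made-up MultiIndex: the mapping is applied only to the named level, and keys that do not occur in that level are ignored under the default errors='ignore'.

```python
import pandas as pd

# Hypothetical two-level index for illustration.
index = pd.MultiIndex.from_tuples(
    [("x", "a"), ("x", "b"), ("y", "a")], names=["outer", "inner"]
)
df = pd.DataFrame({"v": [1, 2, 3]}, index=index)

# Only labels in the "inner" level are renamed.
renamed = df.rename(index={"a": "alpha"}, level="inner")

# "x" exists only in the "outer" level, so nothing changes here.
unchanged = df.rename(index={"x": "X"}, level="inner")

print(list(renamed.index))
print(list(unchanged.index))
```
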
rename_axis(mapper=<no_default>, *, index=<no_default>, columns=<no_default>, axis=0, copy=None, inplace=False)

Set the name of the axis for the index or columns.

Parameters:
  • mapper (scalar, list-like, optional) – Value to set the axis name attribute.

  • index (scalar, list-like, dict-like or function, optional) –

    A scalar, list-like, dict-like or function transformation to apply to that axis’ values. Note that the columns parameter is not allowed if the object is a Series; it applies only to DataFrame objects.

    Use either mapper and axis to specify the axis to target with mapper, or index and/or columns.

  • columns (scalar, list-like, dict-like or function, optional) –

    A scalar, list-like, dict-like or function transformation to apply to that axis’ values. Note that the columns parameter is not allowed if the object is a Series; it applies only to DataFrame objects.

    Use either mapper and axis to specify the axis to target with mapper, or index and/or columns.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to rename. For Series this parameter is unused and defaults to 0.

  • copy (bool, default None) –

    Also copy underlying data.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements by enabling Copy-on-Write: pd.options.mode.copy_on_write = True

  • inplace (bool, default False) – Modifies the object directly, instead of creating a new Series or DataFrame.

Returns:

The same type as the caller or None if inplace=True.

Return type:

Series, DataFrame, or None

See also

Series.rename

Alter Series index labels or name.

DataFrame.rename

Alter DataFrame index labels or name.

Index.rename

Set new names on index.

Notes

DataFrame.rename_axis supports two calling conventions

  • (index=index_mapper, columns=columns_mapper, ...)

  • (mapper, axis={'index', 'columns'}, ...)

The first calling convention will only modify the names of the index and/or the names of the Index object that is the columns. In this case, the parameter copy is ignored.

The second calling convention will modify the names of the corresponding index if mapper is a list or a scalar. However, if mapper is dict-like or a function, it will use the deprecated behavior of modifying the axis labels.

We highly recommend using keyword arguments to clarify your intent.

Examples

Series

>>> s = pd.Series(["dog", "cat", "monkey"])
>>> s
0       dog
1       cat
2    monkey
dtype: object
>>> s.rename_axis("animal")
animal
0       dog
1       cat
2    monkey
dtype: object

DataFrame

>>> df = pd.DataFrame({"num_legs": [4, 4, 2],
...                    "num_arms": [0, 0, 2]},
...                   ["dog", "cat", "monkey"])
>>> df
        num_legs  num_arms
dog            4         0
cat            4         0
monkey         2         2
>>> df = df.rename_axis("animal")
>>> df
        num_legs  num_arms
animal
dog            4         0
cat            4         0
monkey         2         2
>>> df = df.rename_axis("limbs", axis="columns")
>>> df
limbs   num_legs  num_arms
animal
dog            4         0
cat            4         0
monkey         2         2

MultiIndex

>>> df.index = pd.MultiIndex.from_product([['mammal'],
...                                        ['dog', 'cat', 'monkey']],
...                                       names=['type', 'name'])
>>> df
limbs          num_legs  num_arms
type   name
mammal dog            4         0
       cat            4         0
       monkey         2         2
>>> df.rename_axis(index={'type': 'class'})
limbs          num_legs  num_arms
class  name
mammal dog            4         0
       cat            4         0
       monkey         2         2
>>> df.rename_axis(columns=str.upper)
LIMBS          num_legs  num_arms
type   name
mammal dog            4         0
       cat            4         0
       monkey         2         2
rename_geometry(col, inplace=False)[source]

Rename the GeoDataFrame geometry column to the specified name.

By default yields a new object.

The original geometry column is replaced with the input.

Parameters:
  • col (new geometry column label)

  • inplace (boolean, default False) – Modify the GeoDataFrame in place (do not create a new object)

Return type:

GeoDataFrame | None

Examples

>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> df = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> df1 = df.rename_geometry('geom1')
>>> df1.geometry.name
'geom1'
>>> df.rename_geometry('geom1', inplace=True)
>>> df.geometry.name
'geom1'

See also

GeoDataFrame.set_geometry

Set the active geometry.

reorder_levels(order, axis=0)

Rearrange index levels using input order. May not drop or duplicate levels.

Parameters:
  • order (list of int or list of str) – List representing new level order. Reference level by number (position) or by key (label).

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Where to reorder levels.

Return type:

DataFrame

Examples

>>> data = {
...     "class": ["Mammals", "Mammals", "Reptiles"],
...     "diet": ["Omnivore", "Carnivore", "Carnivore"],
...     "species": ["Humans", "Dogs", "Snakes"],
... }
>>> df = pd.DataFrame(data, columns=["class", "diet", "species"])
>>> df = df.set_index(["class", "diet"])
>>> df
                                  species
class      diet
Mammals    Omnivore                Humans
           Carnivore                 Dogs
Reptiles   Carnivore               Snakes

Let’s reorder the levels of the index:

>>> df.reorder_levels(["diet", "class"])
                                  species
diet      class
Omnivore  Mammals                  Humans
Carnivore Mammals                    Dogs
          Reptiles                 Snakes
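
Levels can be referenced by position as well as by label. The sketch below rebuilds the frame from the example above and checks that both forms give the same result; it also shows that only the level order changes, not the row order.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Mammals", "Omnivore"), ("Mammals", "Carnivore"), ("Reptiles", "Carnivore")],
    names=["class", "diet"],
)
df = pd.DataFrame({"species": ["Humans", "Dogs", "Snakes"]}, index=index)

by_label = df.reorder_levels(["diet", "class"])
by_position = df.reorder_levels([1, 0])

print(by_label.equals(by_position))
print(list(by_label.index.names))
print(by_label["species"].tolist())  # rows keep their original order
```
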
replace(to_replace=None, value=<no_default>, *, inplace=False, limit=None, regex=False, method=<no_default>)

Replace values given in to_replace with value.

Values of the Series/DataFrame are replaced with other values dynamically. This differs from updating with .loc or .iloc, which require you to specify a location to update with some value.

Parameters:
  • to_replace (str, regex, list, dict, Series, int, float, or None) –

    How to find the values that will be replaced.

    • numeric, str or regex:

      • numeric: numeric values equal to to_replace will be replaced with value

      • str: string exactly matching to_replace will be replaced with value

      • regex: regexs matching to_replace will be replaced with value

    • list of str, regex, or numeric:

      • First, if to_replace and value are both lists, they must be the same length.

      • Second, if regex=True then all of the strings in both lists will be interpreted as regexes; otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.

      • str, regex and numeric rules apply as above.

    • dict:

      • Dicts can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value ‘a’ with ‘b’ and ‘y’ with ‘z’. To use a dict in this way, the optional value parameter should not be given.

      • For a DataFrame a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column ‘a’ and the value ‘z’ in column ‘b’ and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in.

      • For a DataFrame nested dictionaries, e.g., {'a': {'b': np.nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with NaN. The optional value parameter should not be specified to use a nested dict in this way. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.

    • None:

      • This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.

    See the examples section for examples of each of these.

  • value (scalar, dict, list, str, regex, default None) – Value to replace any values matching to_replace with. For a DataFrame a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.

  • inplace (bool, default False) – If True, performs operation inplace and returns None.

  • limit (int, default None) –

    Maximum size gap to forward or backward fill.

    Deprecated since version 2.1.0.

  • regex (bool or same types as `to_replace`, default False) – Whether to interpret to_replace and/or value as regular expressions. Alternatively, this could be a regular expression or a list, dict, or array of regular expressions in which case to_replace must be None.

  • method ({'pad', 'ffill', 'bfill'}) –

    The method to use for replacement when to_replace is a scalar, list or tuple and value is None.

    Deprecated since version 2.1.0.

Returns:

Object after replacement.

Return type:

Series/DataFrame

Raises:
  • AssertionError

    • If regex is not a bool and to_replace is not None.

  • TypeError

    • If to_replace is not a scalar, array-like, dict, or None

    • If to_replace is a dict and value is not a list, dict, ndarray, or Series

    • If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.

    • When replacing multiple bool or datetime64 objects and the arguments to to_replace does not match the type of the value being replaced

  • ValueError

    • If a list or an ndarray is passed to to_replace and value but they are not the same length.

See also

Series.fillna

Fill NA values.

DataFrame.fillna

Fill NA values.

Series.where

Replace values based on boolean condition.

DataFrame.where

Replace values based on boolean condition.

DataFrame.map

Apply a function to a DataFrame elementwise.

Series.map

Map values of Series according to an input mapping or function.

Series.str.replace

Simple string replacement.

Notes

  • Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.

  • Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.

  • This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.

  • When a dict is used as the to_replace value, the key(s) in the dict act as the to_replace part and the value(s) in the dict act as the value parameter.
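
The note above about regular expressions and numeric dtypes can be checked with a small experiment. This is a minimal sketch, assuming a hypothetical two-column frame (the column names num and txt are invented for illustration): the same values are stored once as floats and once as strings, and only the string column is affected by the pattern.

```python
import pandas as pd

# Hypothetical frame: identical values stored as floats and as strings.
df = pd.DataFrame({"num": [1.5, 2.5], "txt": ["1.5", "2.5"]})

# The regular expression is only matched against string values.
out = df.replace(r"^1\.5$", "X", regex=True)

print(out["num"].tolist())  # float column is left as it was
print(out["txt"].tolist())  # matching string is replaced
```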

Examples

Scalar `to_replace` and `value`

>>> s = pd.Series([1, 2, 3, 4, 5])
>>> s.replace(1, 5)
0    5
1    2
2    3
3    4
4    5
dtype: int64
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': [5, 6, 7, 8, 9],
...                    'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace(0, 5)
   A  B  C
0  5  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e

List-like `to_replace`

>>> df.replace([0, 1, 2, 3], 4)
   A  B  C
0  4  5  a
1  4  6  b
2  4  7  c
3  4  8  d
4  4  9  e
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
   A  B  C
0  4  5  a
1  3  6  b
2  2  7  c
3  1  8  d
4  4  9  e
>>> s.replace([1, 2], method='bfill')
0    3
1    3
2    3
3    4
4    5
dtype: int64

dict-like `to_replace`

>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e
>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e
>>> df.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e

Regular expression `to_replace`

>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz
>>> df.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz

Compare the behavior of s.replace({'a': None}) and s.replace('a', None) to understand the peculiarities of the to_replace parameter:

>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])

When a dict is used as the to_replace value, the value(s) in the dict play the role of the value parameter. s.replace({'a': None}) is equivalent to s.replace(to_replace={'a': None}, value=None, method=None):

>>> s.replace({'a': None})
0      10
1    None
2    None
3       b
4    None
dtype: object

When value is not explicitly passed and to_replace is a scalar, list or tuple, replace uses the method parameter (default ‘pad’) to do the replacement. So this is why the ‘a’ values are being replaced by 10 in rows 1 and 2 and ‘b’ in row 4 in this case.

>>> s.replace('a')
0    10
1    10
2    10
3     b
4     b
dtype: object

Deprecated since version 2.1.0: The ‘method’ parameter and padding behavior are deprecated.

On the other hand, if None is explicitly passed for value, it will be respected:

>>> s.replace('a', None)
0      10
1    None
2    None
3       b
4    None
dtype: object

Changed in version 1.4.0: Previously the explicit None was silently ignored.

When regex=True, value is not None and to_replace is a string, the replacement will be applied in all columns of the DataFrame.

>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': ['a', 'b', 'c', 'd', 'e'],
...                    'C': ['f', 'g', 'h', 'i', 'j']})
>>> df.replace(to_replace='^[a-g]', value='e', regex=True)
   A  B  C
0  0  e  e
1  1  e  e
2  2  e  h
3  3  e  i
4  4  e  j

If value is not None and to_replace is a dictionary, the dictionary keys will be the DataFrame columns that the replacement will be applied to.

>>> df.replace(to_replace={'B': '^[a-c]', 'C': '^[h-j]'}, value='e', regex=True)
   A  B  C
0  0  e  f
1  1  e  g
2  2  e  e
3  3  d  e
4  4  e  e
representative_point()

Return a GeoSeries of (cheaply computed) points that are guaranteed to be within each geometry.

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
>>> s.representative_point()
0    POINT (0.25 0.5)
1         POINT (1 1)
2         POINT (0 0)
dtype: geometry

See also

GeoSeries.centroid

geometric centroid

resample(rule, axis=<no_default>, closed=None, label=None, convention=<no_default>, kind=<no_default>, on=None, level=None, origin='start_day', offset=None, group_keys=False)

Resample time-series data.

Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or the caller must pass the label of a datetime-like series/index to the on/level keyword parameter.

Parameters:
  • rule (DateOffset, Timedelta or str) – The offset string or object representing target conversion.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

    Which axis to use for up- or down-sampling. For Series this parameter is unused and defaults to 0. Must be DatetimeIndex, TimedeltaIndex or PeriodIndex.

    Deprecated since version 2.0.0: Use frame.T.resample(…) instead.

  • closed ({'right', 'left'}, default None) – Which side of bin interval is closed. The default is ‘left’ for all frequency offsets except for ‘ME’, ‘YE’, ‘QE’, ‘BME’, ‘BA’, ‘BQE’, and ‘W’ which all have a default of ‘right’.

  • label ({'right', 'left'}, default None) – Which bin edge label to label bucket with. The default is ‘left’ for all frequency offsets except for ‘ME’, ‘YE’, ‘QE’, ‘BME’, ‘BA’, ‘BQE’, and ‘W’ which all have a default of ‘right’.

  • convention ({'start', 'end', 's', 'e'}, default 'start') –

    For PeriodIndex only, controls whether to use the start or end of rule.

    Deprecated since version 2.2.0: Convert PeriodIndex to DatetimeIndex before resampling instead.

  • kind ({'timestamp', 'period'}, optional, default None) –

    Pass ‘timestamp’ to convert the resulting index to a DatetimeIndex or ‘period’ to convert it to a PeriodIndex. By default the input representation is retained.

    Deprecated since version 2.2.0: Convert index to desired type explicitly instead.

  • on (str, optional) – For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.

  • level (str or int, optional) – For a MultiIndex, level (name or number) to use for resampling. level must be datetime-like.

  • origin (Timestamp or str, default 'start_day') –

    The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index. If string, must be one of the following:

    • ’epoch’: origin is 1970-01-01

    • ’start’: origin is the first value of the timeseries

    • ’start_day’: origin is the first day at midnight of the timeseries

    • ’end’: origin is the last value of the timeseries

    • ’end_day’: origin is the ceiling midnight of the last day

    Added in version 1.3.0.

    Note

    Only takes effect for Tick-frequencies (i.e. fixed frequencies like days, hours, and minutes, rather than months or quarters).

  • offset (Timedelta or str, default is None) – An offset timedelta added to the origin.

  • group_keys (bool, default False) –

    Whether to include the group keys in the result index when using .apply() on the resampled object.

    Added in version 1.5.0: Not specifying group_keys will retain values-dependent behavior from pandas 1.4 and earlier (see pandas 1.5.0 Release notes for examples).

    Changed in version 2.0.0: group_keys now defaults to False.

Returns:

Resampler object.

Return type:

pandas.api.typing.Resampler

See also

Series.resample

Resample a Series.

DataFrame.resample

Resample a DataFrame.

groupby

Group Series/DataFrame by mapping, function, label, or list of labels.

asfreq

Reindex a Series/DataFrame with the given frequency without grouping.

Notes

See the resampling section of the pandas time series user guide for more.

To learn more about the offset strings, see the DateOffset objects section of the same user guide.
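
The deprecation notes for convention and kind recommend converting a PeriodIndex to a DatetimeIndex before resampling. A minimal sketch of that workflow follows, assuming a hypothetical monthly series; the variable names and the choice of a two-month bin anchored at month starts ('2MS') are illustrative only.

```python
import pandas as pd

# Hypothetical monthly series indexed by a PeriodIndex.
periods = pd.period_range("2000-01", periods=4, freq="M")
s = pd.Series([1, 2, 3, 4], index=periods)

# Convert to a DatetimeIndex first (timestamps at the start of each period),
# then resample into two-month bins anchored at month starts.
ts = s.to_timestamp()
out = ts.resample("2MS").sum()

print(out)
```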

Examples

Start by creating a series with 9 one minute timestamps.

>>> index = pd.date_range('1/1/2000', periods=9, freq='min')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: min, dtype: int64

Downsample the series into 3 minute bins and sum the values of the timestamps falling into a bin.

>>> series.resample('3min').sum()
2000-01-01 00:00:00     3
2000-01-01 00:03:00    12
2000-01-01 00:06:00    21
Freq: 3min, dtype: int64

Downsample the series into 3 minute bins as above, but label each bin using the right edge instead of the left. Note that the value in the bucket used as the label is not included in the bucket that it labels. For example, in the original series the bucket 2000-01-01 00:03:00 contains the value 3, but the summed value in the resampled bucket with the label 2000-01-01 00:03:00 does not include 3 (if it did, the summed value would be 6, not 3).

>>> series.resample('3min', label='right').sum()
2000-01-01 00:03:00     3
2000-01-01 00:06:00    12
2000-01-01 00:09:00    21
Freq: 3min, dtype: int64

To include this value, close the right side of the bin interval, as shown below.

>>> series.resample('3min', label='right', closed='right').sum()
2000-01-01 00:00:00     0
2000-01-01 00:03:00     6
2000-01-01 00:06:00    15
2000-01-01 00:09:00    15
Freq: 3min, dtype: int64

Upsample the series into 30 second bins.

>>> series.resample('30s').asfreq()[0:5]   # Select first 5 rows
2000-01-01 00:00:00   0.0
2000-01-01 00:00:30   NaN
2000-01-01 00:01:00   1.0
2000-01-01 00:01:30   NaN
2000-01-01 00:02:00   2.0
Freq: 30s, dtype: float64

Upsample the series into 30 second bins and fill the NaN values using the ffill method.

>>> series.resample('30s').ffill()[0:5]
2000-01-01 00:00:00    0
2000-01-01 00:00:30    0
2000-01-01 00:01:00    1
2000-01-01 00:01:30    1
2000-01-01 00:02:00    2
Freq: 30s, dtype: int64

Upsample the series into 30 second bins and fill the NaN values using the bfill method.

>>> series.resample('30s').bfill()[0:5]
2000-01-01 00:00:00    0
2000-01-01 00:00:30    1
2000-01-01 00:01:00    1
2000-01-01 00:01:30    2
2000-01-01 00:02:00    2
Freq: 30s, dtype: int64

Pass a custom function via apply.

>>> def custom_resampler(arraylike):
...     return np.sum(arraylike) + 5
...
>>> series.resample('3min').apply(custom_resampler)
2000-01-01 00:00:00     8
2000-01-01 00:03:00    17
2000-01-01 00:06:00    26
Freq: 3min, dtype: int64

For DataFrame objects, the keyword on can be used to specify the column instead of the index for resampling.

>>> d = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
...      'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
>>> df = pd.DataFrame(d)
>>> df['week_starting'] = pd.date_range('01/01/2018',
...                                     periods=8,
...                                     freq='W')
>>> df
   price  volume week_starting
0     10      50    2018-01-07
1     11      60    2018-01-14
2      9      40    2018-01-21
3     13     100    2018-01-28
4     14      50    2018-02-04
5     18     100    2018-02-11
6     17      40    2018-02-18
7     19      50    2018-02-25
>>> df.resample('ME', on='week_starting').mean()
               price  volume
week_starting
2018-01-31     10.75    62.5
2018-02-28     17.00    60.0

For a DataFrame with MultiIndex, the keyword level can be used to specify on which level the resampling needs to take place.

>>> days = pd.date_range('1/1/2000', periods=4, freq='D')
>>> d2 = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
...       'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
>>> df2 = pd.DataFrame(
...     d2,
...     index=pd.MultiIndex.from_product(
...         [days, ['morning', 'afternoon']]
...     )
... )
>>> df2
                      price  volume
2000-01-01 morning       10      50
           afternoon     11      60
2000-01-02 morning        9      40
           afternoon     13     100
2000-01-03 morning       14      50
           afternoon     18     100
2000-01-04 morning       17      40
           afternoon     19      50
>>> df2.resample('D', level=0).sum()
            price  volume
2000-01-01     21     110
2000-01-02     22     140
2000-01-03     32     150
2000-01-04     36      90

If you want to adjust the start of the bins based on a fixed timestamp:

>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
>>> rng = pd.date_range(start, end, freq='7min')
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7min, dtype: int64
>>> ts.resample('17min').sum()
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17min, dtype: int64
>>> ts.resample('17min', origin='epoch').sum()
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17min, dtype: int64
>>> ts.resample('17min', origin='2000-01-01').sum()
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17min, dtype: int64

If you want to adjust the start of the bins with an offset Timedelta, the two following lines are equivalent:

>>> ts.resample('17min', origin='start').sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17min, dtype: int64
>>> ts.resample('17min', offset='23h30min').sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17min, dtype: int64

If you want to take the largest Timestamp as the end of the bins:

>>> ts.resample('17min', origin='end').sum()
2000-10-01 23:35:00     0
2000-10-01 23:52:00    18
2000-10-02 00:09:00    27
2000-10-02 00:26:00    63
Freq: 17min, dtype: int64

In contrast with the start_day, you can use end_day to take the ceiling midnight of the largest Timestamp as the end of the bins and drop the bins not containing data:

>>> ts.resample('17min', origin='end_day').sum()
2000-10-01 23:38:00     3
2000-10-01 23:55:00    15
2000-10-02 00:12:00    45
2000-10-02 00:29:00    45
Freq: 17min, dtype: int64
reset_index(level=None, *, drop=False, inplace=False, col_level=0, col_fill='', allow_duplicates=<no_default>, names=None)

Reset the index, or a level of it.

Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.

Parameters:
  • level (int, str, tuple, or list, default None) – Only remove the given levels from the index. Removes all levels by default.

  • drop (bool, default False) – Do not try to insert index into dataframe columns. This resets the index to the default integer index.

  • inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.

  • col_level (int or str, default 0) – If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.

  • col_fill (object, default '') – If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.

  • allow_duplicates (bool, optional, default lib.no_default) –

    Allow duplicate column labels to be created.

    Added in version 1.5.0.

  • names (int, str or 1-dimensional list, default None) –

    Using the given string, rename the DataFrame column which contains the index data. If the DataFrame has a MultiIndex, this has to be a list or tuple with length equal to the number of levels.

    Added in version 1.5.0.

Returns:

DataFrame with the new index or None if inplace=True.

Return type:

DataFrame or None

See also

DataFrame.set_index

Opposite of reset_index.

DataFrame.reindex

Change to new indices or expand indices.

DataFrame.reindex_like

Change to same indices as other DataFrame.

Examples

>>> df = pd.DataFrame([('bird', 389.0),
...                    ('bird', 24.0),
...                    ('mammal', 80.5),
...                    ('mammal', np.nan)],
...                   index=['falcon', 'parrot', 'lion', 'monkey'],
...                   columns=('class', 'max_speed'))
>>> df
         class  max_speed
falcon    bird      389.0
parrot    bird       24.0
lion    mammal       80.5
monkey  mammal        NaN

When we reset the index, the old index is added as a column, and a new sequential index is used:

>>> df.reset_index()
    index   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN

We can use the drop parameter to avoid the old index being added as a column:

>>> df.reset_index(drop=True)
    class  max_speed
0    bird      389.0
1    bird       24.0
2  mammal       80.5
3  mammal        NaN
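
The allow_duplicates parameter described above matters when the index name collides with an existing column label. The following is a minimal sketch, assuming a hypothetical frame built only for this illustration: by default the collision raises a ValueError, while allow_duplicates=True keeps both columns under the same label.

```python
import pandas as pd

# Hypothetical frame whose index name collides with an existing column label.
df = pd.DataFrame({"name": [1, 2]}, index=pd.Index(["a", "b"], name="name"))

try:
    df.reset_index()  # inserting a second 'name' column is refused by default
    raised = False
except ValueError:
    raised = True

# Opting in keeps both columns under the same label.
out = df.reset_index(allow_duplicates=True)

print(raised, list(out.columns))
```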

You can also use reset_index with MultiIndex.

>>> index = pd.MultiIndex.from_tuples([('bird', 'falcon'),
...                                    ('bird', 'parrot'),
...                                    ('mammal', 'lion'),
...                                    ('mammal', 'monkey')],
...                                   names=['class', 'name'])
>>> columns = pd.MultiIndex.from_tuples([('speed', 'max'),
...                                      ('species', 'type')])
>>> df = pd.DataFrame([(389.0, 'fly'),
...                    (24.0, 'fly'),
...                    (80.5, 'run'),
...                    (np.nan, 'jump')],
...                   index=index,
...                   columns=columns)
>>> df
               speed species
                 max    type
class  name
bird   falcon  389.0     fly
       parrot   24.0     fly
mammal lion     80.5     run
       monkey    NaN    jump

Using the names parameter, choose a name for the index column:

>>> df.reset_index(names=['classes', 'names'])
  classes   names  speed species
                     max    type
0    bird  falcon  389.0     fly
1    bird  parrot   24.0     fly
2  mammal    lion   80.5     run
3  mammal  monkey    NaN    jump

If the index has multiple levels, we can reset a subset of them:

>>> df.reset_index(level='class')
         class  speed species
                  max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump

If we are not dropping the index, by default, it is placed in the top level. We can place it in another level:

>>> df.reset_index(level='class', col_level=1)
                speed species
         class    max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump

When the index is inserted under another level, we can specify under which one with the parameter col_fill:

>>> df.reset_index(level='class', col_level=1, col_fill='species')
              species  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump

If we specify a nonexistent level for col_fill, it is created:

>>> df.reset_index(level='class', col_level=1, col_fill='genus')
                genus  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump
reverse()

Return a GeoSeries with the order of coordinates reversed.

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
>>> s.reverse()
0    POLYGON ((0 0, 0 1, 1 1, 0 0))
1        LINESTRING (1 0, 1 1, 0 0)
2                       POINT (0 0)
dtype: geometry

See also

GeoSeries.normalize

normalize order of coordinates

rfloordiv(other, axis='columns', level=None, fill_value=None)

Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).

Equivalent to other // dataframe, but with support to substitute a fill_value for missing data in one of the inputs. Its reverse version is floordiv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same result as the method version.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
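
The shared examples here do not call rfloordiv itself. As a minimal sketch, assuming a hypothetical frame without zeros (so that every quotient is a plain integer), the reversed and non-reversed methods can be compared directly:

```python
import pandas as pd

# Hypothetical frame without zeros, so every quotient is a plain integer.
df = pd.DataFrame({"a": [1, 2, 4], "b": [3, 5, 10]})

reversed_result = df.rfloordiv(10)  # computes 10 // df
forward_result = df.floordiv(10)    # computes df // 10

print(reversed_result)
print(forward_result)
```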

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rmod(other, axis='columns', level=None, fill_value=None)

Get Modulo of dataframe and other, element-wise (binary operator rmod).

Equivalent to other % dataframe, but with support to substitute a fill_value for missing data in one of the inputs. Its reverse version is mod.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same result as the method version.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
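
The shared examples here do not call rmod itself. As a minimal sketch, assuming a hypothetical single-column frame with small positive integers, the reversed method takes the scalar as the dividend and the frame as the divisor:

```python
import pandas as pd

# Hypothetical frame of small positive integers.
df = pd.DataFrame({"a": [3, 4, 7]})

reversed_result = df.rmod(10)  # computes 10 % df
forward_result = df.mod(10)    # computes df % 10

print(reversed_result)
print(forward_result)
```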

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
rmul(other, axis='columns', level=None, fill_value=None)

Get Multiplication of dataframe and other, element-wise (binary operator rmul).

Equivalent to other * dataframe, but with support to substitute a fill_value for missing data in one of the inputs. Its reverse version is mul.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar; the operator version returns the same result.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
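
The examples above are shared by all the arithmetic wrappers and never call rmul itself. The following is a minimal sketch, reusing the df and other frames from those examples, of how rmul behaves with a scalar and with fill_value. The names doubled and filled are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])

# For a scalar, rmul matches the operator form with the operands swapped.
doubled = df.rmul(2)

# `other` has no 'degrees' column; fill_value=1 stands in for the missing
# values, so 'degrees' is multiplied by 1 instead of becoming NaN.
filled = df.rmul(other, fill_value=1)
print(filled)
```

Because multiplication is commutative, rmul and mul give the same values here; the reversed form matters mainly for consistency with the other reflected operators.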
rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=<no_default>, closed=None, step=None, method='single')

Provide rolling window calculations.

Parameters:
  • window (int, timedelta, str, offset, or BaseIndexer subclass) –

    Size of the moving window.

    If an integer, the fixed number of observations used for each window.

    If a timedelta, str, or offset, the time period of each window. Each window will be variable-sized, based on the observations included in the time period. This is only valid for datetimelike indexes. To learn more about offsets and frequency strings, see the offset aliases section of the pandas time series documentation.

    If a BaseIndexer subclass, the window boundaries based on the defined get_window_bounds method. Additional rolling keyword arguments, namely min_periods, center, closed and step will be passed to get_window_bounds.

  • min_periods (int, default None) –

    Minimum number of observations in window required to have a value; otherwise, result is np.nan.

    For a window that is specified by an offset, min_periods will default to 1.

    For a window that is specified by an integer, min_periods will default to the size of the window.

  • center (bool, default False) –

    If False, set the window labels as the right edge of the window index.

    If True, set the window labels as the center of the window index.

  • win_type (str, default None) –

    If None, all points are evenly weighted.

    If a string, it must be a valid scipy.signal window function.

    Certain Scipy window types require additional parameters to be passed in the aggregation function. The additional parameters must match the keywords specified in the Scipy window type method signature.

  • on (str, optional) –

    For a DataFrame, a column label or Index level on which to calculate the rolling window, rather than the DataFrame’s index.

    Provided integer column is ignored and excluded from result since an integer index is not used to calculate the rolling window.

  • axis (int or str, default 0) –

    If 0 or 'index', roll across the rows.

    If 1 or 'columns', roll across the columns.

    For Series this parameter is unused and defaults to 0.

    Deprecated since version 2.1.0: The axis keyword is deprecated. For axis=1, transpose the DataFrame first instead.

  • closed (str, default None) –

    If 'right', the first point in the window is excluded from calculations.

    If 'left', the last point in the window is excluded from calculations.

    If 'both', no points in the window are excluded from calculations.

    If 'neither', the first and last points in the window are excluded from calculations.

    Default None ('right').

  • step (int, default None) –

    Added in version 1.5.0.

    Evaluate the window at every step result, equivalent to slicing as [::step]. window must be an integer. Using a step argument other than None or 1 will produce a result with a different shape than the input.

  • method (str {'single', 'table'}, default 'single') –

    Added in version 1.3.0.

    Execute the rolling operation per single column or row ('single') or over the entire object ('table').

    This argument is only implemented when specifying engine='numba' in the method call.

Returns:

An instance of Window is returned if win_type is passed. Otherwise, an instance of Rolling is returned.

Return type:

pandas.api.typing.Window or pandas.api.typing.Rolling

See also

expanding

Provides expanding transformations.

ewm

Provides exponential weighted functions.

Notes

See Windowing Operations for further usage details and examples.

Examples

>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0

window

Rolling sum with a window length of 2 observations.

>>> df.rolling(2).sum()
     B
0  NaN
1  1.0
2  3.0
3  NaN
4  NaN

Rolling sum with a window span of 2 seconds.

>>> df_time = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
...                        index=[pd.Timestamp('20130101 09:00:00'),
...                               pd.Timestamp('20130101 09:00:02'),
...                               pd.Timestamp('20130101 09:00:03'),
...                               pd.Timestamp('20130101 09:00:05'),
...                               pd.Timestamp('20130101 09:00:06')])
>>> df_time
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  2.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0
>>> df_time.rolling('2s').sum()
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  3.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0

Rolling sum with forward looking windows with 2 observations.

>>> indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
>>> df.rolling(window=indexer, min_periods=1).sum()
     B
0  1.0
1  3.0
2  2.0
3  4.0
4  4.0

min_periods

Rolling sum with a window length of 2 observations, but only needs a minimum of 1 observation to calculate a value.

>>> df.rolling(2, min_periods=1).sum()
     B
0  0.0
1  1.0
2  3.0
3  2.0
4  4.0

center

Rolling sum with the result assigned to the center of the window index.

>>> df.rolling(3, min_periods=1, center=True).sum()
     B
0  1.0
1  3.0
2  3.0
3  6.0
4  4.0
>>> df.rolling(3, min_periods=1, center=False).sum()
     B
0  0.0
1  1.0
2  3.0
3  3.0
4  6.0

step

Rolling sum with a window length of 2 observations, minimum of 1 observation to calculate a value, and a step of 2.

>>> df.rolling(2, min_periods=1, step=2).sum()
     B
0  0.0
2  3.0
4  4.0

win_type

Rolling sum with a window length of 2, using the Scipy 'gaussian' window type. std is required in the aggregation function.

>>> df.rolling(2, win_type='gaussian').sum(std=3)
          B
0       NaN
1  0.986207
2  2.958621
3       NaN
4       NaN

on

Rolling sum with a window length of 2 days.

>>> df = pd.DataFrame({
...     'A': [pd.to_datetime('2020-01-01'),
...           pd.to_datetime('2020-01-01'),
...           pd.to_datetime('2020-01-02'),],
...     'B': [1, 2, 3], },
...     index=pd.date_range('2020', periods=3))
>>> df
                    A  B
2020-01-01 2020-01-01  1
2020-01-02 2020-01-01  2
2020-01-03 2020-01-02  3
>>> df.rolling('2D', on='A').sum()
                    A    B
2020-01-01 2020-01-01  1.0
2020-01-02 2020-01-01  3.0
2020-01-03 2020-01-02  6.0
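
closed

The closed parameter is described above but not illustrated. The sketch below, which assumes a small Series of ones on an irregular time index (the names index, s and counts are illustrative), sums a 2-second window under each setting so that the result shows how many observations fall inside the window.

```python
import pandas as pd

index = pd.to_datetime(['2013-01-01 09:00:01', '2013-01-01 09:00:02',
                        '2013-01-01 09:00:03', '2013-01-01 09:00:04',
                        '2013-01-01 09:00:06'])
s = pd.Series(1, index=index)

# Each window spans 2 seconds ending at the row's timestamp; `closed`
# decides which of the two window endpoints are included.
counts = pd.DataFrame({
    closed: s.rolling('2s', closed=closed).sum()
    for closed in ['right', 'both', 'left', 'neither']
})
print(counts)
```

With 'left' and 'neither' the current row is excluded, so the first row has an empty window and its result is NaN.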
rotate(angle, origin='center', use_radians=False)

Return a GeoSeries with rotated geometries.

See http://shapely.readthedocs.io/en/latest/manual.html#shapely.affinity.rotate for details.

Parameters:
  • angle (float) – The angle of rotation can be specified in either degrees (default) or radians by setting use_radians=True. Positive angles are counter-clockwise and negative are clockwise rotations.

  • origin (string, Point, or tuple (x, y)) – The point of origin can be a keyword ‘center’ for the bounding box center (default), ‘centroid’ for the geometry’s centroid, a Point object or a coordinate tuple (x, y).

  • use_radians (boolean) – Whether to interpret the angle of rotation as radians (True) or degrees (False, the default).

Examples

>>> from shapely.geometry import Point, LineString, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Point(1, 1),
...         LineString([(1, -1), (1, 0)]),
...         Polygon([(3, -1), (4, 0), (3, 1)]),
...     ]
... )
>>> s
0                         POINT (1 1)
1              LINESTRING (1 -1, 1 0)
2    POLYGON ((3 -1, 4 0, 3 1, 3 -1))
dtype: geometry
>>> s.rotate(90)
0                                          POINT (1 1)
1                      LINESTRING (1.5 -0.5, 0.5 -0.5)
2    POLYGON ((4.5 -0.5, 3.5 0.5, 2.5 -0.5, 4.5 -0.5))
dtype: geometry
>>> s.rotate(90, origin=(0, 0))
0                       POINT (-1 1)
1              LINESTRING (1 1, 0 1)
2    POLYGON ((1 3, 0 4, -1 3, 1 3))
dtype: geometry
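
To see where the coordinates in the outputs above come from, the rotation of a single coordinate about an origin can be written out by hand. This is only a sketch of the underlying arithmetic in plain Python, not the geopandas or shapely implementation, and rotate_point is a hypothetical helper introduced for illustration.

```python
import math

def rotate_point(x, y, angle, origin=(0.0, 0.0), use_radians=False):
    """Rotate (x, y) counter-clockwise by `angle` about `origin`."""
    theta = angle if use_radians else math.radians(angle)
    ox, oy = origin
    dx, dy = x - ox, y - oy
    new_x = ox + dx * math.cos(theta) - dy * math.sin(theta)
    new_y = oy + dx * math.sin(theta) + dy * math.cos(theta)
    # Round away floating-point noise such as cos(90 degrees) being ~6e-17.
    return round(new_x, 9), round(new_y, 9)

# Point (1, 1) rotated 90 degrees about (0, 0), as in s.rotate(90, origin=(0, 0)).
print(rotate_point(1, 1, 90))
# LineString vertex (1, -1) rotated about its bounding-box center (1, -0.5),
# which is what the default origin='center' uses for that geometry.
print(rotate_point(1, -1, 90, origin=(1, -0.5)))
```

A Point is its own bounding-box center, which is why s.rotate(90) leaves POINT (1 1) unchanged.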
round(decimals=0, *args, **kwargs)

Round a DataFrame to a variable number of decimal places.

Parameters:
  • decimals (int, dict, Series) – Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.

  • *args – Additional positional arguments have no effect but might be accepted for compatibility with numpy.

  • **kwargs – Additional keywords have no effect but might be accepted for compatibility with numpy.

Returns:

A DataFrame with the affected columns rounded to the specified number of decimal places.

Return type:

DataFrame

See also

numpy.around

Round a numpy array to the given number of decimals.

Series.round

Round a Series to the given number of decimals.

Examples

>>> df = pd.DataFrame([(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...                   columns=['dogs', 'cats'])
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer, each column is rounded to the same number of decimal places:

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as keys and the number of decimal places as values:

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as the index and the number of decimal places as values:

>>> decimals = pd.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0
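
The parameter description states that columns missing from decimals are left as is and that entries which are not columns are ignored. A minimal sketch of both rules, reusing the frame above; the 'birds' key and the name partial are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
                  columns=['dogs', 'cats'])

# 'dogs' is rounded to one decimal place; 'cats' is not listed, so it is
# left untouched; 'birds' is not a column, so it is silently ignored.
partial = df.round({'dogs': 1, 'birds': 2})
print(partial)
```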
rpow(other, axis='columns', level=None, fill_value=None)

Return the element-wise exponential power of other and the DataFrame (binary operator rpow).

Equivalent to other ** dataframe, but with support for substituting a fill_value for missing data in one of the inputs. The non-reversed counterpart is pow.

One of the flexible wrappers (add, sub, mul, div, floordiv, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or by the columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar; the operator version returns the same result.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
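
The shared examples above do not call rpow. Since exponentiation is not commutative, the difference from pow is easy to see on a small frame. This is a sketch with a made-up single-column frame; the names df, squares and powers_of_two are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'n': [0, 1, 2, 3]})

# pow raises each element to the given power: n ** 2.
squares = df.pow(2)

# rpow swaps the operands: 2 ** n.
powers_of_two = df.rpow(2)

print(squares['n'].tolist())
print(powers_of_two['n'].tolist())
```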
rsub(other, axis='columns', level=None, fill_value=None)

Return the element-wise subtraction of the DataFrame from other (binary operator rsub).

Equivalent to other - dataframe, but with support for substituting a fill_value for missing data in one of the inputs. The non-reversed counterpart is sub.

One of the flexible wrappers (add, sub, mul, div, floordiv, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or by the columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar; the operator version returns the same result.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
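
The shared examples above do not call rsub. The sketch below reuses the df frame from those examples to show the reversed operand order, first with a scalar and then with a Series matched against the column labels. The names from_ten, limits and remaining are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Scalar: computes 10 - df rather than df - 10.
from_ten = df.rsub(10)

# Series aligned on the column labels: each row of df is subtracted
# from `limits`.
limits = pd.Series({'angles': 10, 'degrees': 400})
remaining = df.rsub(limits, axis='columns')
print(remaining)
```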
rtruediv(other, axis='columns', level=None, fill_value=None)

Return the element-wise floating-point division of other by the DataFrame (binary operator rtruediv).

Equivalent to other / dataframe, but with support for substituting a fill_value for missing data in one of the inputs. The non-reversed counterpart is truediv.

One of the flexible wrappers (add, sub, mul, div, floordiv, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or by the columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar; the operator version returns the same result.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
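
The shared examples above use rdiv rather than rtruediv. As a minimal sketch, reusing the df frame from those examples, the following checks that rtruediv gives the same result as rdiv and as the operator form with the operands swapped. The name result is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Computes 10 / df; dividing by the zero in 'angles' yields inf.
result = df.rtruediv(10)

print(result.equals(df.rdiv(10)))
print(result.equals(10 / df))
```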
sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters:
  • n (int, optional) – Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

  • frac (float, optional) – Fraction of axis items to return. Cannot be used with n.

  • replace (bool, default False) – Allow or disallow sampling of the same row more than once.

  • weights (str or ndarray-like, optional) – Default ‘None’ results in equal probability weighting. If passed a Series, will align with target object on index. Index values in weights not found in sampled object will be ignored and index values in sampled object not in weights will be assigned weights of zero. If called on a DataFrame, will accept the name of a column when axis = 0. Unless weights are a Series, weights must be same length as axis being sampled. If weights do not sum to 1, they will be normalized to sum to 1. Missing values in the weights column will be treated as zero. Infinite values are not allowed.

  • random_state (int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional) –

    If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.

    Changed in version 1.4.0: np.random.Generator objects now accepted

  • axis ({0 or 'index', 1 or 'columns', None}, default None) – Axis to sample. Accepts axis number or name. Default is stat axis for given data type. For Series this parameter is unused and defaults to None.

  • ignore_index (bool, default False) –

    If True, the resulting index will be labeled 0, 1, …, n - 1.

    Added in version 1.3.0.

Returns:

A new object of same type as caller containing n items randomly sampled from the caller object.

Return type:

Series or DataFrame

See also

DataFrameGroupBy.sample

Generates random samples from each group of a DataFrame object.

SeriesGroupBy.sample

Generates random samples from each group of a Series object.

numpy.random.choice

Generates a random sample from a given 1-D numpy array.

Notes

If frac > 1, replace should be set to True.

Examples

>>> df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
...                    'num_wings': [2, 0, 0, 0],
...                    'num_specimen_seen': [10, 2, 1, 8]},
...                   index=['falcon', 'dog', 'spider', 'fish'])
>>> df
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
dog            4          0                  2
spider         8          0                  1
fish           0          0                  8

Extract 3 random elements from the Series df['num_legs']. Note that random_state is used to make the examples reproducible.

>>> df['num_legs'].sample(n=3, random_state=1)
fish      0
spider    8
falcon    2
Name: num_legs, dtype: int64

A random 50% sample of the DataFrame with replacement:

>>> df.sample(frac=0.5, replace=True, random_state=1)
      num_legs  num_wings  num_specimen_seen
dog          4          0                  2
fish         0          0                  8

An upsampled DataFrame, drawn with replacement. Note that replace must be True when frac > 1.

>>> df.sample(frac=2, replace=True, random_state=1)
        num_legs  num_wings  num_specimen_seen
dog            4          0                  2
fish           0          0                  8
falcon         2          2                 10
falcon         2          2                 10
fish           0          0                  8
dog            4          0                  2
fish           0          0                  8
dog            4          0                  2

Using a DataFrame column as weights. Rows with larger value in the num_specimen_seen column are more likely to be sampled.

>>> df.sample(n=2, weights='num_specimen_seen', random_state=1)
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
fish           0          0                  8
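
The rule from the Notes (frac > 1 requires replace=True) can be illustrated with a small standard-library sketch. The helper sample_rows below is hypothetical and is not how pandas implements sample; it only mirrors the n / frac / replace / weights semantics described above on a plain list.

```python
import random


def sample_rows(rows, n=None, frac=None, replace=False, weights=None, seed=None):
    """Hypothetical stand-in for DataFrame.sample on a plain list of rows."""
    if n is not None and frac is not None:
        raise ValueError("pass only one of n or frac")
    if n is None:
        # Default to a single row, otherwise derive the count from frac.
        n = 1 if frac is None else round(frac * len(rows))
    if n > len(rows) and not replace:
        # Mirrors the note above: upsampling needs replacement.
        raise ValueError("sample larger than population requires replace=True")

    rng = random.Random(seed)
    if replace:
        return rng.choices(rows, weights=weights, k=n)

    # Without replacement: draw one row at a time and remove it from the pool.
    pool = list(rows)
    pool_weights = list(weights) if weights is not None else [1.0] * len(pool)
    picked = []
    for _ in range(n):
        i = rng.choices(range(len(pool)), weights=pool_weights, k=1)[0]
        picked.append(pool.pop(i))
        pool_weights.pop(i)
    return picked


animals = ["falcon", "dog", "spider", "fish"]
upsampled = sample_rows(animals, frac=2, replace=True, seed=1)
```

Calling sample_rows(animals, frac=2) without replace=True raises ValueError in this sketch, which is the situation the note warns about.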
sample_points(size, method='uniform', seed=None, rng=None, **kwargs)

Sample points from each geometry.

Generate one MultiPoint per geometry, containing points sampled from that geometry. You can either sample randomly from a uniform distribution or use an advanced sampling algorithm from the pointpats package.

For polygons, this samples within the area of the polygon. For lines, this samples along the length of the linestring. For multi-part geometries, the weights of each part are selected according to their relevant attribute (area for Polygons, length for LineStrings), and then points are sampled from each part.

Any other geometry type (e.g. Point, GeometryCollection) is ignored, and an empty MultiPoint geometry is returned.

Parameters:
  • size (int | array-like) – The size of the sample requested. Indicates the number of samples to draw from each geometry. If an array of the same length as a GeoSeries is passed, it denotes the size of a sample per geometry.

  • method (str, default "uniform") – The sampling method. uniform samples uniformly at random from a geometry using numpy.random.uniform. Other allowed strings (e.g. "cluster_poisson") denote sampling function name from the pointpats.random module (see http://pysal.org/pointpats/api.html#random-distributions). Pointpats methods are implemented for (Multi)Polygons only and will return an empty MultiPoint for other geometry types.

  • rng ({None, int, array_like[ints], SeedSequence, BitGenerator, Generator}, optional) – A random generator or seed to initialize the numpy BitGenerator. If None, then fresh, unpredictable entropy will be pulled from the OS.

  • **kwargs (dict) – Options for the pointpats sampling algorithms.

Returns:

Points sampled within (or along) each geometry.

Return type:

GeoSeries

Examples

>>> from shapely.geometry import Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(1, -1), (1, 0), (0, 0)]),
...         Polygon([(3, -1), (4, 0), (3, 1)]),
...     ]
... )
>>> s.sample_points(size=10)
0    MULTIPOINT ((0.1045 -0.10294), (0.35249 -0.264...
1    MULTIPOINT ((3.03261 -0.43069), (3.10068 0.114...
Name: sampled_points, dtype: geometry
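
To make "samples within the area of the polygon" concrete, here is a minimal rejection-sampling sketch in plain Python. It is an assumption about one reasonable way to sample uniformly inside a polygon, not a description of the geopandas implementation; point_in_polygon and sample_in_polygon are hypothetical helper names.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices without a closing repeat."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_in_polygon(ring, size, seed=None):
    """Draw `size` points uniformly inside the polygon by rejection sampling."""
    rng = random.Random(seed)
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    points = []
    while len(points) < size:
        # Propose a point in the bounding box and keep it only if it is inside.
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


triangle = [(1, -1), (1, 0), (0, 0)]
points = sample_in_polygon(triangle, size=10, seed=42)
```

Rejection sampling is simple but wasteful for thin or sparse shapes, because most proposals in the bounding box are discarded.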
scale(xfact=1.0, yfact=1.0, zfact=1.0, origin='center')

Return a GeoSeries with scaled geometries.

The geometries can be scaled by different factors along each dimension. Negative scale factors will mirror or reflect coordinates.

See http://shapely.readthedocs.io/en/latest/manual.html#shapely.affinity.scale for details.

Parameters:
  • xfact (float) – Scaling factor for the x dimension.

  • yfact (float) – Scaling factor for the y dimension.

  • zfact (float) – Scaling factor for the z dimension.

  • origin (string, Point, or tuple) – The point of origin can be a keyword ‘center’ for the 2D bounding box center (default), ‘centroid’ for the geometry’s 2D centroid, a Point object or a coordinate tuple (x, y, z).

Examples

>>> from shapely.geometry import Point, LineString, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Point(1, 1),
...         LineString([(1, -1), (1, 0)]),
...         Polygon([(3, -1), (4, 0), (3, 1)]),
...     ]
... )
>>> s
0                         POINT (1 1)
1              LINESTRING (1 -1, 1 0)
2    POLYGON ((3 -1, 4 0, 3 1, 3 -1))
dtype: geometry
>>> s.scale(2, 3)
0                                 POINT (1 1)
1                      LINESTRING (1 -2, 1 1)
2    POLYGON ((2.5 -3, 4.5 0, 2.5 3, 2.5 -3))
dtype: geometry
>>> s.scale(2, 3, origin=(0, 0))
0                         POINT (2 3)
1              LINESTRING (2 -3, 2 0)
2    POLYGON ((6 -3, 8 0, 6 3, 6 -3))
dtype: geometry
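
The effect of the origin parameter can be reproduced by hand. The sketch below assumes 2D coordinates and supports only the 'center' keyword (bounding box center) or an explicit (x, y) tuple; scale_coords is a hypothetical helper, not part of geopandas or shapely.

```python
def scale_coords(coords, xfact=1.0, yfact=1.0, origin="center"):
    """Scale 2D coordinates about an origin: x' = x0 + xfact * (x - x0)."""
    if origin == "center":
        xs = [x for x, _ in coords]
        ys = [y for _, y in coords]
        # Center of the 2D bounding box.
        x0 = (min(xs) + max(xs)) / 2
        y0 = (min(ys) + max(ys)) / 2
    else:
        x0, y0 = origin
    return [(x0 + xfact * (x - x0), y0 + yfact * (y - y0)) for x, y in coords]


triangle = [(3, -1), (4, 0), (3, 1)]
about_center = scale_coords(triangle, 2, 3)
# → [(2.5, -3.0), (4.5, 0.0), (2.5, 3.0)]
about_zero = scale_coords(triangle, 2, 3, origin=(0, 0))
# → [(6, -3), (8, 0), (6, 3)]
```

Both results agree with the polygon rows in the examples above; a single point scaled about its own center is unchanged, which is why POINT (1 1) stays in place.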
segmentize(max_segment_length)

Return a GeoSeries with vertices added to line segments based on maximum segment length.

Additional vertices will be added to every line segment in an input geometry so that segments are no longer than the provided maximum segment length. New vertices will evenly subdivide each segment. Only linear components of input geometries are densified; other geometries are returned unmodified.

Parameters:

max_segment_length (float | array-like) – Additional vertices will be added so that all line segments are no longer than this value. Must be greater than 0.

Return type:

GeoSeries

Examples

>>> from shapely.geometry import Polygon, LineString
>>> s = geopandas.GeoSeries(
...     [
...         LineString([(0, 0), (0, 10)]),
...         Polygon([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]),
...     ],
... )
>>> s
0                     LINESTRING (0 0, 0 10)
1    POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))
dtype: geometry
>>> s.segmentize(max_segment_length=5)
0                          LINESTRING (0 0, 0 5, 0 10)
1    POLYGON ((0 0, 5 0, 10 0, 10 5, 10 10, 5 10, 0...
dtype: geometry
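
The even subdivision described above can be sketched in plain Python. This is an illustrative assumption about the densification rule (each segment is split into the smallest number of equal pieces that respects the limit), not the GEOS code path that geopandas calls; densify is a hypothetical name.

```python
import math


def densify(coords, max_segment_length):
    """Insert evenly spaced vertices so no segment exceeds max_segment_length."""
    if max_segment_length <= 0:
        raise ValueError("max_segment_length must be greater than 0")
    out = [coords[0]]
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        # Smallest number of equal pieces that keeps each piece within the limit.
        pieces = max(1, math.ceil(length / max_segment_length))
        for k in range(1, pieces + 1):
            t = k / pieces
            out.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return out


line = densify([(0, 0), (0, 10)], 5)
# → [(0, 0), (0.0, 5.0), (0.0, 10.0)]
ring = densify([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)], 5)
```

The square ring gains one midpoint per side, so it grows from 5 to 9 coordinates, matching the truncated polygon output above.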
select_dtypes(include=None, exclude=None)

Return a subset of the DataFrame’s columns based on the column dtypes.

Parameters:
  • include (scalar or list-like) – A selection of dtypes or strings to be included/excluded. At least one of these parameters must be supplied.

  • exclude (scalar or list-like) – A selection of dtypes or strings to be included/excluded. At least one of these parameters must be supplied.

Returns:

The subset of the frame including the dtypes in include and excluding the dtypes in exclude.

Return type:

DataFrame

Raises:

ValueError

  • If both of include and exclude are empty

  • If include and exclude have overlapping elements

  • If any kind of string dtype is passed in.

See also

DataFrame.dtypes

Return Series with the data type of each column.

Notes

  • To select all numeric types, use np.number or 'number'

  • To select strings you must use the object dtype, but note that this will return all object dtype columns. With pd.options.future.infer_string enabled, using "str" will work to select all string columns.

  • See the numpy dtype hierarchy

  • To select datetimes, use np.datetime64, 'datetime' or 'datetime64'

  • To select timedeltas, use np.timedelta64, 'timedelta' or 'timedelta64'

  • To select Pandas categorical dtypes, use 'category'

  • To select Pandas datetimetz dtypes, use 'datetimetz' or 'datetime64[ns, tz]'

Examples

>>> df = pd.DataFrame({'a': [1, 2] * 3,
...                    'b': [True, False] * 3,
...                    'c': [1.0, 2.0] * 3})
>>> df
        a      b  c
0       1   True  1.0
1       2  False  2.0
2       1   True  1.0
3       2  False  2.0
4       1   True  1.0
5       2  False  2.0
>>> df.select_dtypes(include='bool')
   b
0  True
1  False
2  True
3  False
4  True
5  False
>>> df.select_dtypes(include=['float64'])
   c
0  1.0
1  2.0
2  1.0
3  2.0
4  1.0
5  2.0
>>> df.select_dtypes(exclude=['int64'])
       b    c
0   True  1.0
1  False  2.0
2   True  1.0
3  False  2.0
4   True  1.0
5  False  2.0
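
The include/exclude rules listed under Raises can be sketched without pandas. The helper below is hypothetical and works on a plain mapping of column name to dtype name; it matches dtype names exactly and does not model the numpy dtype hierarchy (so 'number' would not work here).

```python
def select_columns(dtypes, include=(), exclude=()):
    """Return column names whose dtype name passes the include/exclude filters.

    dtypes: dict mapping column name -> dtype name (str).
    include / exclude: lists of dtype names (pass lists, not bare strings).
    """
    include, exclude = set(include), set(exclude)
    if not include and not exclude:
        raise ValueError("at least one of include or exclude must be nonempty")
    overlap = include & exclude
    if overlap:
        raise ValueError(f"include and exclude overlap on {sorted(overlap)}")
    return [
        col
        for col, dtype in dtypes.items()
        # An empty include set means "no restriction".
        if (not include or dtype in include) and dtype not in exclude
    ]


dtypes = {"a": "int64", "b": "bool", "c": "float64"}
only_bool = select_columns(dtypes, include=["bool"])   # → ['b']
no_ints = select_columns(dtypes, exclude=["int64"])    # → ['b', 'c']
```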
sem(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)

Return unbiased standard error of the mean over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument

Parameters:
  • axis ({index (0), columns (1)}) –

    For Series this parameter is unused and defaults to 0.

    Warning

    The behavior of DataFrame.sem with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

  • numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

Returns:

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.sem().round(6)
0.57735

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a   b
tiger  1   2
zebra  2   3
>>> df.sem()
a   0.5
b   0.5
dtype: float64

Using axis=1

>>> df.sem(axis=1)
tiger   0.5
zebra   0.5
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.sem(numeric_only=True)
a   0.5
dtype: float64

Return type:

Series or DataFrame (if level specified)
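
As a worked check of the definition, the standard error of the mean is the sample standard deviation (with divisor N - ddof) divided by the square root of N. The plain-Python sketch below uses a hypothetical helper name and ignores skipna handling.

```python
import math


def standard_error(values, ddof=1):
    """Standard error of the mean: sqrt(variance / N), variance divided by N - ddof."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - ddof)
    return math.sqrt(variance / n)


print(round(standard_error([1, 2, 3]), 6))  # 0.57735
print(standard_error([1, 2]))               # 0.5
```

These values agree with s.sem() and the per-column results in the examples above.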

set_axis(labels, *, axis=0, copy=None)

Assign desired index to given axis.

Indexes for column or row labels can be changed by assigning a list-like or Index.

Parameters:
  • labels (list-like, Index) – The values for the new index.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to update. The value 0 identifies the rows. For Series this parameter is unused and defaults to 0.

  • copy (bool, default True) –

    Whether to make a copy of the underlying data.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:

An object of type DataFrame.

Return type:

DataFrame

See also

DataFrame.rename_axis

Alter the name of the index or columns.

Examples

>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

Change the row labels.

>>> df.set_axis(['a', 'b', 'c'], axis='index')
   A  B
a  1  4
b  2  5
c  3  6

Change the column labels.

>>> df.set_axis(['I', 'II'], axis='columns')
   I  II
0  1   4
1  2   5
2  3   6

set_crs(crs=None, epsg=None, inplace=False, allow_override=False)[source]

Set the Coordinate Reference System (CRS) of the GeoDataFrame.

If there are multiple geometry columns within the GeoDataFrame, only the CRS of the active geometry column is set.

Pass None to remove CRS from the active geometry column.

Notes

The underlying geometries are not transformed to this CRS. To transform the geometries to a new CRS, use the to_crs method.

Parameters:
  • crs (pyproj.CRS | None, optional) – The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.

  • epsg (int, optional) – EPSG code specifying the projection.

  • inplace (bool, default False) – If True, the CRS of the GeoDataFrame will be changed in place (while still returning the result) instead of making a copy of the GeoDataFrame.

  • allow_override (bool, default False) – If the GeoDataFrame already has a CRS, allow replacing the existing CRS, even when the two are not equal.

Return type:

GeoDataFrame | None

Examples

>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(d)
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)

Setting CRS to a GeoDataFrame without one:

>>> gdf.crs is None
True
>>> gdf = gdf.set_crs('epsg:3857')
>>> gdf.crs
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World - 85°S to 85°N
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

Overriding existing CRS:

>>> gdf = gdf.set_crs(4326, allow_override=True)

Without allow_override=True, set_crs raises an error if you try to override an existing CRS.

See also

GeoDataFrame.to_crs

re-project to another CRS

set_flags(*, copy=False, allows_duplicate_labels=None)

Return a new object with updated flags.

Parameters:
  • copy (bool, default False) –

    Specify if a copy of the object should be made.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

  • allows_duplicate_labels (bool, optional) – Whether the returned object allows duplicate labels.

Returns:

The same type as the caller.

Return type:

Series or DataFrame

See also

DataFrame.attrs

Global metadata applying to this dataset.

DataFrame.flags

Global flags applying to this object.

Notes

This method returns a new object that’s a view on the same data as the input. Mutating the input or the output values will be reflected in the other.

This method is intended to be used in method chains.

“Flags” differ from “metadata”. Flags reflect properties of the pandas object (the Series or DataFrame). Metadata refer to properties of the dataset, and should be stored in DataFrame.attrs.

Examples

>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags.allows_duplicate_labels
True
>>> df2 = df.set_flags(allows_duplicate_labels=False)
>>> df2.flags.allows_duplicate_labels
False
set_geometry(col, drop=None, inplace=False, crs=None)[source]

Set the GeoDataFrame geometry using either an existing column or the specified input. By default yields a new object.

The original geometry column is replaced with the input.

Parameters:
  • col (column label or array-like) – An existing column name or values to set as the new geometry column. If values (array-like, (Geo)Series) are passed, then if they are named (Series) the new geometry column will have the corresponding name, otherwise the existing geometry column will be replaced. If there is no existing geometry column, the new geometry column will use the default name “geometry”.

  • drop (boolean, default False) –

    When specifying a named Series or an existing column name for col, controls if the previous geometry column should be dropped from the result. The default of False keeps both the old and new geometry column.

    Deprecated since version 1.0.0.

  • inplace (boolean, default False) – Modify the GeoDataFrame in place (do not create a new object)

  • crs (pyproj.CRS, optional) – Coordinate system to use. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string. If passed, overrides both DataFrame and col’s crs. Otherwise, tries to get crs from passed col values or DataFrame.

Return type:

GeoDataFrame | None

Examples

>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)

Passing an array:

>>> df1 = gdf.set_geometry([Point(0,0), Point(1,1)])
>>> df1
    col1     geometry
0  name1  POINT (0 0)
1  name2  POINT (1 1)

Using existing column:

>>> gdf["buffered"] = gdf.buffer(2)
>>> df2 = gdf.set_geometry("buffered")
>>> df2.geometry
0    POLYGON ((3 2, 2.99037 1.80397, 2.96157 1.6098...
1    POLYGON ((4 1, 3.99037 0.80397, 3.96157 0.6098...
Name: buffered, dtype: geometry

See also

GeoDataFrame.rename_geometry

rename an active geometry column

set_index(keys, *, drop=True, append=False, inplace=False, verify_integrity=False)

Set the DataFrame index using existing columns.

Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on it.

Parameters:
  • keys (label or array-like or list of labels/arrays) – This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, “array” encompasses Series, Index, np.ndarray, and instances of Iterator.

  • drop (bool, default True) – Delete columns to be used as the new index.

  • append (bool, default False) – Whether to append columns to existing index.

  • inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.

  • verify_integrity (bool, default False) – Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method.

Returns:

Changed row labels or None if inplace=True.

Return type:

DataFrame or None

See also

DataFrame.reset_index

Opposite of set_index.

DataFrame.reindex

Change to new indices or expand indices.

DataFrame.reindex_like

Change to same indices as other DataFrame.

Examples

>>> df = pd.DataFrame({'month': [1, 4, 7, 10],
...                    'year': [2012, 2014, 2013, 2014],
...                    'sale': [55, 40, 84, 31]})
>>> df
   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31

Set the index to become the ‘month’ column:

>>> df.set_index('month')
       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31

Create a MultiIndex using columns ‘year’ and ‘month’:

>>> df.set_index(['year', 'month'])
            sale
year  month
2012  1     55
2014  4     40
2013  7     84
2014  10    31

Create a MultiIndex using an Index and a column:

>>> df.set_index([pd.Index([1, 2, 3, 4]), 'year'])
         month  sale
   year
1  2012  1      55
2  2014  4      40
3  2013  7      84
4  2014  10     31

Create a MultiIndex using two Series:

>>> s = pd.Series([1, 2, 3, 4])
>>> df.set_index([s, s**2])
      month  year  sale
1 1       1  2012    55
2 4       4  2014    40
3 9       7  2013    84
4 16     10  2014    31
set_precision(grid_size, mode='valid_output')

Return a GeoSeries with the precision set to a precision grid size.

By default, geometries use double precision coordinates (grid_size=0).

Coordinates will be rounded if a precision grid is less precise than the input geometry. Duplicated vertices will be dropped from lines and polygons for grid sizes greater than 0. Line and polygon geometries may collapse to empty geometries if all vertices are closer together than grid_size. Spikes or sections in Polygons narrower than grid_size after rounding the vertices will be removed, which can lead to MultiPolygons or empty geometries. Z values, if present, will not be modified.

Parameters:
  • grid_size (float) – Precision grid size. If 0, will use double precision (will not modify geometry if precision grid size was not previously set). If this value is more precise than input geometry, the input geometry will not be modified.

  • mode ({'valid_output', 'pointwise', 'keep_collapsed'}, default 'valid_output') –

    This parameter determines the way a precision reduction is applied on the geometry. There are three modes:

    • 'valid_output' (default): The output is always valid. Collapsed geometry elements (including both polygons and lines) are removed. Duplicate vertices are removed.

    • 'pointwise': Precision reduction is performed pointwise. Output geometry may be invalid due to collapse or self-intersection. Duplicate vertices are not removed.

    • 'keep_collapsed': Like the default mode, except that collapsed linear geometry elements are preserved. Collapsed polygonal input elements are removed. Duplicate vertices are removed.

Examples

>>> from shapely import LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...        Point(0.9, 0.9),
...        Point(0.9, 0.9, 0.9),
...        LineString([(0, 0), (0, 0.1), (0, 1), (1, 1)]),
...        LineString([(0, 0), (0, 0.1), (0.1, 0.1)])
...     ],
... )
>>> s
0                      POINT (0.9 0.9)
1                POINT Z (0.9 0.9 0.9)
2    LINESTRING (0 0, 0 0.1, 0 1, 1 1)
3     LINESTRING (0 0, 0 0.1, 0.1 0.1)
dtype: geometry
>>> s.set_precision(1)
0                   POINT (1 1)
1             POINT Z (1 1 0.9)
2    LINESTRING (0 0, 0 1, 1 1)
3              LINESTRING EMPTY
dtype: geometry
>>> s.set_precision(1, mode="pointwise")
0                        POINT (1 1)
1                  POINT Z (1 1 0.9)
2    LINESTRING (0 0, 0 0, 0 1, 1 1)
3         LINESTRING (0 0, 0 0, 0 0)
dtype: geometry
>>> s.set_precision(1, mode="keep_collapsed")
0                   POINT (1 1)
1             POINT Z (1 1 0.9)
2    LINESTRING (0 0, 0 1, 1 1)
3         LINESTRING (0 0, 0 0)
dtype: geometry

Notes

Subsequent operations will always be performed in the precision of the geometry with higher precision (smaller grid_size). That same precision will be attached to the operation outputs.

Input geometries should be geometrically valid; unexpected results may occur if input geometries are not. You can check the validity with is_valid() and fix invalid geometries with make_valid() methods.
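
The difference between rounding pointwise and dropping duplicated vertices can be sketched on raw coordinates. This is a simplified illustration under stated assumptions, not the GEOS precision model: snap_to_grid is a hypothetical helper, it uses Python's round (ties go to the even multiple), and it does not remove collapsed geometries or repair validity.

```python
def snap_to_grid(coords, grid_size, drop_duplicates=True):
    """Round 2D coordinates to multiples of grid_size.

    drop_duplicates=False resembles the 'pointwise' mode; True additionally
    removes consecutive repeated vertices, as the other modes do.
    """
    if grid_size == 0:
        # Grid size 0 means full double precision: nothing to round.
        return list(coords)
    snapped = [
        (round(x / grid_size) * grid_size, round(y / grid_size) * grid_size)
        for x, y in coords
    ]
    if not drop_duplicates:
        return snapped
    deduped = []
    for point in snapped:
        if not deduped or point != deduped[-1]:
            deduped.append(point)
    return deduped


line = [(0, 0), (0, 0.1), (0, 1), (1, 1)]
pointwise = snap_to_grid(line, 1, drop_duplicates=False)
# → [(0, 0), (0, 0), (0, 1), (1, 1)]
cleaned = snap_to_grid(line, 1)
# → [(0, 0), (0, 1), (1, 1)]
collapsed = snap_to_grid([(0, 0), (0, 0.1), (0.1, 0.1)], 1)
# → [(0, 0)]
```

The last input collapses to a single vertex in this sketch; in the examples above the library reports it as LINESTRING EMPTY under 'valid_output' and as LINESTRING (0 0, 0 0) under 'keep_collapsed'.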

property shape: tuple[int, int]

Return a tuple representing the dimensionality of the DataFrame.

See also

ndarray.shape

Tuple of array dimensions.

Examples

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.shape
(2, 2)
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4],
...                    'col3': [5, 6]})
>>> df.shape
(2, 3)
shared_paths(other, align=None)

Return the shared paths between two geometries.

Geometries within the GeoSeries should be only (Multi)LineStrings or LinearRings. A GeoSeries of GeometryCollections is returned with two elements in each GeometryCollection. The first element is a MultiLineString containing shared paths with the same direction for both inputs. The second element is a MultiLineString containing shared paths with the opposite direction for the two inputs.

You can extract individual geometries of the resulting GeometryCollection using the GeoSeries.get_geometry() method.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (Geoseries or geometric object) – The Geoseries (elementwise) or geometric object to find the shared paths with. Has to contain only (Multi)LineString or LinearRing geometry types.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

GeoSeries

Examples

>>> from shapely.geometry import LineString, MultiLineString
>>> s = geopandas.GeoSeries(
...     [
...         LineString([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]),
...         LineString([(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]),
...         MultiLineString([[(1, 0), (2, 0)], [(2, 1), (1, 1), (1, 0)]]),
...     ],
... )
>>> s
0             LINESTRING (0 0, 1 0, 1 1, 0 1, 0 0)
1             LINESTRING (1 0, 2 0, 2 1, 1 1, 1 0)
2    MULTILINESTRING ((1 0, 2 0), (2 1, 1 1, 1 0))
dtype: geometry

We can find the shared paths between each geometry and a single shapely geometry:

>>> l = LineString([(1, 1), (0, 1)])
>>> s.shared_paths(l)
0    GEOMETRYCOLLECTION (MULTILINESTRING ((1 1, 0 1...
1    GEOMETRYCOLLECTION (MULTILINESTRING EMPTY, MUL...
2    GEOMETRYCOLLECTION (MULTILINESTRING EMPTY, MUL...
dtype: geometry

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices than the one below. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s2 = geopandas.GeoSeries(
...     [
...         LineString([(1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]),
...         LineString([(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]),
...     ],
...     index=[1, 2, 3]
... )
>>> s.shared_paths(s2, align=True)
0                                                 None
1    GEOMETRYCOLLECTION (MULTILINESTRING EMPTY, MUL...
2    GEOMETRYCOLLECTION (MULTILINESTRING EMPTY, MUL...
3                                                 None
dtype: geometry
>>>
>>> s.shared_paths(s2, align=False)
0    GEOMETRYCOLLECTION (MULTILINESTRING ((1 1, 0 1...
1    GEOMETRYCOLLECTION (MULTILINESTRING EMPTY, MUL...
2    GEOMETRYCOLLECTION (MULTILINESTRING ((1 0, 2 0...
dtype: geometry
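
The two alignment behaviours can be modelled with plain dictionaries keyed by index label. The sketch below is an assumption-laden illustration of align=True versus align=False for any row-wise binary operation; apply_pairwise is a hypothetical helper and string concatenation stands in for the geometric operation.

```python
def apply_pairwise(left, right, func, align=True):
    """Apply func row by row to two index -> value mappings.

    align=True pairs values that share an index label; labels present on only
    one side yield None. align=False pairs values by position and keeps the
    labels of the left operand.
    """
    if align:
        labels = sorted(set(left) | set(right))
        return {
            label: func(left[label], right[label])
            if label in left and label in right
            else None
            for label in labels
        }
    if len(left) != len(right):
        raise ValueError("positional pairing needs operands of equal length")
    return {
        label: func(a, b)
        for (label, a), b in zip(left.items(), right.values())
    }


s = {0: "a", 1: "b", 2: "c"}
s2 = {1: "x", 2: "y", 3: "z"}
aligned = apply_pairwise(s, s2, lambda a, b: a + b, align=True)
# → {0: None, 1: 'bx', 2: 'cy', 3: None}
positional = apply_pairwise(s, s2, lambda a, b: a + b, align=False)
# → {0: 'ax', 1: 'by', 2: 'cz'}
```

This reproduces the shape of the two outputs above: four rows with None at labels 0 and 3 when aligning, and three rows when pairing by position.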

See also

GeoSeries.get_geometry

shift(periods=1, freq=None, axis=0, fill_value=<no_default>, suffix=None)

Shift index by desired number of periods with an optional time freq.

When freq is not passed, shift the index without realigning the data. If freq is passed (in this case, the index must be date or datetime, or it will raise a NotImplementedError), the index will be increased using the periods and the freq. freq can be inferred when specified as “infer” as long as either freq or inferred_freq attribute is set in the index.

Parameters:
  • periods (int or Sequence) – Number of periods to shift. Can be positive or negative. If an iterable of ints, the data will be shifted once by each int. This is equivalent to shifting by one value at a time and concatenating all resulting frames. The resulting columns will have the shift suffixed to their column names. For multiple periods, axis must not be 1.

  • freq (DateOffset, tseries.offsets, timedelta, or str, optional) – Offset to use from the tseries module or time rule (e.g. ‘EOM’). If freq is specified then the index values are shifted but the data is not realigned. That is, use freq if you would like to extend the index when shifting and preserve the original data. If freq is specified as “infer” then it will be inferred from the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown.

  • axis ({0 or 'index', 1 or 'columns', None}, default None) – Shift direction. For Series this parameter is unused and defaults to 0.

  • fill_value (object, optional) – The scalar value to use for newly introduced missing values. The default depends on the dtype of self. For numeric data, np.nan is used. For datetime, timedelta, or period data, etc. NaT is used. For extension dtypes, self.dtype.na_value is used.

  • suffix (str, optional) – If str and periods is an iterable, this is added after the column name and before the shift value for each shifted column name.

Returns:

Copy of input object, shifted.

Return type:

DataFrame

See also

Index.shift

Shift values of Index.

DatetimeIndex.shift

Shift values of DatetimeIndex.

PeriodIndex.shift

Shift values of PeriodIndex.

Examples

>>> df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
...                    "Col2": [13, 23, 18, 33, 48],
...                    "Col3": [17, 27, 22, 37, 52]},
...                   index=pd.date_range("2020-01-01", "2020-01-05"))
>>> df
            Col1  Col2  Col3
2020-01-01    10    13    17
2020-01-02    20    23    27
2020-01-03    15    18    22
2020-01-04    30    33    37
2020-01-05    45    48    52
>>> df.shift(periods=3)
            Col1  Col2  Col3
2020-01-01   NaN   NaN   NaN
2020-01-02   NaN   NaN   NaN
2020-01-03   NaN   NaN   NaN
2020-01-04  10.0  13.0  17.0
2020-01-05  20.0  23.0  27.0
>>> df.shift(periods=1, axis="columns")
            Col1  Col2  Col3
2020-01-01   NaN    10    13
2020-01-02   NaN    20    23
2020-01-03   NaN    15    18
2020-01-04   NaN    30    33
2020-01-05   NaN    45    48
>>> df.shift(periods=3, fill_value=0)
            Col1  Col2  Col3
2020-01-01     0     0     0
2020-01-02     0     0     0
2020-01-03     0     0     0
2020-01-04    10    13    17
2020-01-05    20    23    27
>>> df.shift(periods=3, freq="D")
            Col1  Col2  Col3
2020-01-04    10    13    17
2020-01-05    20    23    27
2020-01-06    15    18    22
2020-01-07    30    33    37
2020-01-08    45    48    52
>>> df.shift(periods=3, freq="infer")
            Col1  Col2  Col3
2020-01-04    10    13    17
2020-01-05    20    23    27
2020-01-06    15    18    22
2020-01-07    30    33    37
2020-01-08    45    48    52
>>> df['Col1'].shift(periods=[0, 1, 2])
            Col1_0  Col1_1  Col1_2
2020-01-01      10     NaN     NaN
2020-01-02      20    10.0     NaN
2020-01-03      15    20.0    10.0
2020-01-04      30    15.0    20.0
2020-01-05      45    30.0    15.0
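
When freq is not given, shifting moves the values while the index stays in place. A minimal list-based sketch of that behaviour follows; shift_values is a hypothetical helper, and None stands in for NaN as the default fill value.

```python
def shift_values(values, periods=1, fill_value=None):
    """Shift a list by `periods` positions, padding the gap with fill_value."""
    n = len(values)
    if periods == 0:
        return list(values)
    if abs(periods) >= n:
        # Everything is shifted out of range.
        return [fill_value] * n
    if periods > 0:
        # Values move toward the end; the head is padded.
        return [fill_value] * periods + list(values[: n - periods])
    # Negative periods move values toward the start; the tail is padded.
    return list(values[-periods:]) + [fill_value] * (-periods)


col1 = [10, 20, 15, 30, 45]
print(shift_values(col1, 3))                # [None, None, None, 10, 20]
print(shift_values(col1, 3, fill_value=0))  # [0, 0, 0, 10, 20]
print(shift_values(col1, -1))               # [20, 15, 30, 45, None]
```

The first two results correspond to the Col1 column of df.shift(periods=3) and df.shift(periods=3, fill_value=0) above.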
shortest_line(other, align=None)

Return the shortest two-point line between two geometries.

The resulting line consists of two points, representing the nearest points between the geometry pair. The line always starts in the first geometry a and ends in the second geometry b. The endpoints of the line will not necessarily be existing vertices of the input geometries a and b, but can also be a point along a line segment.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (Geoseries or geometric object) – The Geoseries (elementwise) or geometric object to find the shortest line with.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

GeoSeries

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry

We can also find the shortest line between each geometry and a single shapely geometry:

>>> p = Point(3, 3)
>>> s.shortest_line(p)
0    LINESTRING (2 2, 3 3)
1    LINESTRING (2 2, 3 3)
2    LINESTRING (2 2, 3 3)
3    LINESTRING (1 1, 3 3)
4    LINESTRING (0 1, 3 3)
dtype: geometry

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices than the one below. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Point(3, 1),
...         LineString([(1, 0), (2, 0)]),
...         Point(10, 15),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s.shortest_line(s2, align=True)
0                             None
1    LINESTRING (0.5 0.5, 0.5 0.5)
2            LINESTRING (2 2, 3 1)
3            LINESTRING (2 0, 2 0)
4          LINESTRING (0 1, 10 15)
5                             None
dtype: geometry
>>>
>>> s.shortest_line(s2, align=False)
0    LINESTRING (0.5 0.5, 0.5 0.5)
1            LINESTRING (2 2, 3 1)
2        LINESTRING (0.5 0.5, 1 0)
3          LINESTRING (0 2, 10 15)
4            LINESTRING (0 1, 0 1)
dtype: geometry
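
The remark that an endpoint can lie along a line segment rather than on a vertex follows from projecting a point onto the segment. The sketch below shows that projection for a single segment in plain Python; nearest_point_on_segment is a hypothetical helper and covers only the point-to-segment case.

```python
def nearest_point_on_segment(p, a, b):
    """Return the point on segment a-b closest to p (all 2D tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: both ends coincide.
        return (ax, ay)
    # Parameter of the orthogonal projection, clamped to stay on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


print(nearest_point_on_segment((3, 3), (2, 0), (0, 2)))  # (1.0, 1.0)
print(nearest_point_on_segment((3, 3), (0, 0), (2, 2)))  # (2.0, 2.0)
```

The first result is the interior point (1, 1) from row 3 of s.shortest_line(p) above; the second is clamped to the vertex (2, 2), as in row 2.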
simplify(tolerance, preserve_topology=True)

Return a GeoSeries containing a simplified representation of each geometry.

The algorithm (Douglas-Peucker) recursively splits the original line into smaller parts and connects these parts’ endpoints by a straight line. Then, it removes all points whose distance to the straight line is smaller than tolerance. It does not move any points and it always preserves endpoints of the original line or polygon. See https://shapely.readthedocs.io/en/latest/manual.html#object.simplify for details.

Simplifies individual geometries independently, without considering the topology of a potential polygonal coverage. If you would like to treat the GeoSeries as a coverage and simplify its edges, while preserving the coverage topology, see simplify_coverage().

Parameters:
  • tolerance (float) – All parts of a simplified geometry will be no more than tolerance distance from the original. It has the same units as the coordinate reference system of the GeoSeries. For example, using tolerance=100 in a projected CRS with meters as units means a distance of 100 meters in reality.

  • preserve_topology (bool (default True)) – False uses a quicker algorithm, but may produce self-intersecting or otherwise invalid geometries.

Notes

Invalid geometric objects may result from simplification that does not preserve topology. Simplification may also be sensitive to the order of coordinates: two geometries differing only in order of coordinates may be simplified differently.

See also

simplify_coverage

simplify geometries using coverage simplification

Examples

>>> from shapely.geometry import Point, LineString
>>> s = geopandas.GeoSeries(
...     [Point(0, 0).buffer(1), LineString([(0, 0), (1, 10), (0, 20)])]
... )
>>> s
0    POLYGON ((1 0, 0.99518 -0.09802, 0.98079 -0.19...
1                         LINESTRING (0 0, 1 10, 0 20)
dtype: geometry
>>> s.simplify(1)
0    POLYGON ((0 1, 0 -1, -1 0, 0 1))
1              LINESTRING (0 0, 0 20)
dtype: geometry
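
The Douglas-Peucker description above can be illustrated with a short pure-Python sketch. This is a textbook version written for illustration only, not the implementation used by Shapely or GEOS; the helper names point_segment_distance and douglas_peucker are hypothetical, and the sketch assumes that points lying exactly at the tolerance are dropped.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all are (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Simplify a list of (x, y) tuples; endpoints are always kept."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    split, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = point_segment_distance(points[i], first, last)
        if dist > max_dist:
            split, max_dist = i, dist
    if max_dist <= tolerance:
        # Every interior point is close enough to the chord: drop them all.
        return [first, last]
    # Otherwise keep the farthest point and recurse on both halves.
    left = douglas_peucker(points[: split + 1], tolerance)
    right = douglas_peucker(points[split:], tolerance)
    return left[:-1] + right


line = [(0, 0), (1, 10), (0, 20)]
simplified = douglas_peucker(line, tolerance=1)
unchanged = douglas_peucker(line, tolerance=0.5)
```

With tolerance=1 the middle vertex lies exactly one unit from the chord and is removed, which agrees with the LINESTRING (0 0, 0 20) result shown in the example above; with tolerance=0.5 the line is returned unchanged.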
simplify_coverage(tolerance, *, simplify_boundary=True)

Return a GeoSeries containing a simplified representation of polygonal coverage.

Assumes that the GeoSeries forms a polygonal coverage. Under this assumption, the method simplifies the edges using the Visvalingam-Whyatt algorithm, while preserving a valid coverage. In the most simplified case, polygons are reduced to triangles.

A GeoSeries of valid polygons is considered a coverage if the polygons are:

  • Non-overlapping - polygons do not overlap (their interiors do not intersect)

  • Edge-Matched - vertices along shared edges are identical

The method allows simplification of all edges including the outer boundaries of the coverage or simplification of only the inner (shared) edges.

If geometry types other than Polygons or MultiPolygons are present, the method raises an error.

If the geometry is polygonal but does not form a valid coverage due to overlaps, it will be simplified but it may result in invalid coverage topology.

Requires Shapely >= 2.1.

Added in version 1.1.0.

Parameters:
  • tolerance (float) – The degree of simplification roughly equal to the square root of the area of triangles that will be removed. It has the same units as the coordinate reference system of the GeoSeries. For example, using tolerance=100 in a projected CRS with meters as units means a distance of 100 meters in reality.

  • simplify_boundary (bool (default True)) – By default (True), simplifies both internal edges of the coverage as well as its boundary. If set to False, only simplifies internal edges.

See also

simplify

simplification of individual geometries

Examples

>>> from shapely.geometry import Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1.1), (2, 0), (1.5, 1), (2, 2), (0, 2)]),
...         Polygon([(2, 0), (4, 0), (4, 2), (2, 2), (1.5, 1)]),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1.1, 2 0, 1.5 1, 2 2, 0 2, 0 0))
1           POLYGON ((2 0, 4 0, 4 2, 2 2, 1.5 1, 2 0))
dtype: geometry
>>> s.simplify_coverage(1)
0         POLYGON ((2 0, 2 2, 0 2, 2 0))
1    POLYGON ((2 0, 4 0, 4 2, 2 2, 2 0))
dtype: geometry
>>> s.simplify_coverage(1, simplify_boundary=False)
0    POLYGON ((2 0, 2 2, 0 2, 0 0, 1 1.1, 2 0))
1           POLYGON ((2 0, 4 0, 4 2, 2 2, 2 0))
dtype: geometry
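
Because overlapping input can lead to an invalid result, it may help to test the non-overlapping condition before simplifying. The following is a minimal sketch using plain Shapely; the helper name interiors_overlap is hypothetical, and it checks only the first coverage condition (no shared interior area), not edge matching.

```python
from itertools import combinations

from shapely.geometry import Polygon, box


def interiors_overlap(polygons):
    """Return True if any pair of polygons shares interior area."""
    return any(
        a.intersection(b).area > 0 for a, b in combinations(polygons, 2)
    )


# The two polygons from the example above share an edge but no area.
coverage = [
    Polygon([(0, 0), (1, 1.1), (2, 0), (1.5, 1), (2, 2), (0, 2)]),
    Polygon([(2, 0), (4, 0), (4, 2), (2, 2), (1.5, 1)]),
]
coverage_overlaps = interiors_overlap(coverage)

# Two boxes that overlap in the unit square between (1, 1) and (2, 2).
overlapping = [box(0, 0, 2, 2), box(1, 1, 3, 3)]
boxes_overlap = interiors_overlap(overlapping)
```

The pairwise loop is quadratic in the number of polygons, so for large inputs a spatial index would be a more practical way to find candidate pairs.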
property sindex

Generate the spatial index.

Creates an R-tree spatial index based on shapely.STRtree.

Note that the spatial index may not be fully initialized until the first use.

Examples

>>> from shapely.geometry import box
>>> s = geopandas.GeoSeries(geopandas.points_from_xy(range(5), range(5)))
>>> s
0    POINT (0 0)
1    POINT (1 1)
2    POINT (2 2)
3    POINT (3 3)
4    POINT (4 4)
dtype: geometry

Query the spatial index with a single geometry based on the bounding box:

>>> s.sindex.query(box(1, 1, 3, 3))
array([1, 2, 3])

Query the spatial index with a single geometry based on the predicate:

>>> s.sindex.query(box(1, 1, 3, 3), predicate="contains")
array([2])

Query the spatial index with an array of geometries based on the bounding box:

>>> s2 = geopandas.GeoSeries([box(1, 1, 3, 3), box(4, 4, 5, 5)])
>>> s2
0    POLYGON ((3 1, 3 3, 1 3, 1 1, 3 1))
1    POLYGON ((5 4, 5 5, 4 5, 4 4, 5 4))
dtype: geometry
>>> s.sindex.query(s2)
array([[0, 0, 0, 1],
       [1, 2, 3, 4]])

Query the spatial index with an array of geometries based on the predicate:

>>> s.sindex.query(s2, predicate="contains")
array([[0],
       [2]])
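
When an array of geometries is queried, the result has two rows: the first holds positions in the input array and the second holds positions in the indexed GeoSeries. A small sketch of how such a result might be unpacked, using the array printed for s.sindex.query(s2) above (the variable names are illustrative):

```python
import numpy as np

# Result copied from the s.sindex.query(s2) example above.
result = np.array([[0, 0, 0, 1],
                   [1, 2, 3, 4]])

# Row 0: position of the query geometry; row 1: position of the match.
input_idx, tree_idx = result

# Group matched positions by query geometry.
matches = {}
for i, j in zip(input_idx.tolist(), tree_idx.tolist()):
    matches.setdefault(i, []).append(j)
```

Here the first box matches the points at positions 1, 2 and 3, and the second box matches the point at position 4.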
property size: int

Return an int representing the number of elements in this object.

Return the number of rows if Series. Otherwise return the number of rows times number of columns if DataFrame.

See also

ndarray.size

Number of elements in the array.

Examples

>>> s = pd.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.size
3
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.size
4
sjoin(df, how='inner', predicate='intersects', lsuffix='left', rsuffix='right', **kwargs)[source]

Spatial join of two GeoDataFrames.

See the User Guide page https://geopandas.readthedocs.io/en/latest/docs/user_guide/mergingdata.html for details.

Parameters:
  • df (GeoDataFrame)

  • how (string, default 'inner') –

    The type of join:

    • 'left': use keys from left_df; retain only left_df geometry column

    • 'right': use keys from right_df; retain only right_df geometry column

    • 'inner': use intersection of keys from both dfs; retain only left_df geometry column

  • predicate (string, default 'intersects') – Binary predicate. Valid values are determined by the spatial index used. You can check the valid values in left_df or right_df as left_df.sindex.valid_query_predicates or right_df.sindex.valid_query_predicates

  • lsuffix (string, default 'left') – Suffix to apply to overlapping column names (left GeoDataFrame).

  • rsuffix (string, default 'right') – Suffix to apply to overlapping column names (right GeoDataFrame).

  • distance (number or array_like, optional) – Distance(s) around each input geometry within which to query the tree for the ‘dwithin’ predicate. If array_like, must be one-dimensional with length equal to length of left GeoDataFrame. Required if predicate='dwithin'.

  • on_attribute (string, list or tuple) – Column name(s) to join on as an additional join restriction on top of the spatial predicate. These must be found in both DataFrames. If set, observations are joined only if the predicate applies and values in specified columns match.

Return type:

GeoDataFrame

Examples

>>> import geodatasets
>>> chicago = geopandas.read_file(
...     geodatasets.get_path("geoda.chicago_commpop")
... )
>>> groceries = geopandas.read_file(
...     geodatasets.get_path("geoda.groceries")
... ).to_crs(chicago.crs)
>>> chicago.head()
         community  ...                                           geometry
0          DOUGLAS  ...  MULTIPOLYGON (((-87.60914 41.84469, -87.60915 ...
1          OAKLAND  ...  MULTIPOLYGON (((-87.59215 41.81693, -87.59231 ...
2      FULLER PARK  ...  MULTIPOLYGON (((-87.62880 41.80189, -87.62879 ...
3  GRAND BOULEVARD  ...  MULTIPOLYGON (((-87.60671 41.81681, -87.60670 ...
4          KENWOOD  ...  MULTIPOLYGON (((-87.59215 41.81693, -87.59215 ...

[5 rows x 9 columns]

>>> groceries.head()
   OBJECTID     Ycoord  ...  Category                           geometry
0        16  41.973266  ...       NaN  MULTIPOINT ((-87.65661 41.97321))
1        18  41.696367  ...       NaN  MULTIPOINT ((-87.68136 41.69713))
2        22  41.868634  ...       NaN  MULTIPOINT ((-87.63918 41.86847))
3        23  41.877590  ...       new  MULTIPOINT ((-87.65495 41.87783))
4        27  41.737696  ...       NaN  MULTIPOINT ((-87.62715 41.73623))
[5 rows x 8 columns]
>>> groceries_w_communities = groceries.sjoin(chicago)
>>> groceries_w_communities[["OBJECTID", "community", "geometry"]].head()
   OBJECTID       community                           geometry
0        16          UPTOWN  MULTIPOINT ((-87.65661 41.97321))
1        18     MORGAN PARK  MULTIPOINT ((-87.68136 41.69713))
2        22  NEAR WEST SIDE  MULTIPOINT ((-87.63918 41.86847))
3        23  NEAR WEST SIDE  MULTIPOINT ((-87.65495 41.87783))
4        27         CHATHAM  MULTIPOINT ((-87.62715 41.73623))

Notes

Every operation in GeoPandas is planar, i.e. the potential third dimension is not taken into account.

See also

GeoDataFrame.sjoin_nearest

nearest neighbor join

sjoin

equivalent top-level function
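
The effect of the how and predicate parameters is easier to see on a small synthetic dataset. The sketch below is illustrative only: the frames shops and zones and their column names are made up, and no CRS is set.

```python
import geopandas
from shapely.geometry import Point, box

# Three points; the last one lies outside every zone.
shops = geopandas.GeoDataFrame(
    {"shop": ["A", "B", "C"]},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(5, 5)],
)
zones = geopandas.GeoDataFrame(
    {"zone": ["west", "east"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1)],
)

# 'inner' keeps only shops that fall within some zone.
inner = shops.sjoin(zones, how="inner", predicate="within")

# 'left' keeps every shop; unmatched rows get missing zone values.
left = shops.sjoin(zones, how="left", predicate="within")
```

In both results the geometry column comes from shops, the left frame.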

sjoin_nearest(right, how='inner', max_distance=None, lsuffix='left', rsuffix='right', distance_col=None, exclusive=False)[source]

Spatial join of two GeoDataFrames based on the distance between their geometries.

Results will include multiple output records for a single input record where there are multiple equidistant nearest or intersected neighbors.

See the User Guide page https://geopandas.readthedocs.io/en/latest/docs/user_guide/mergingdata.html for more details.

Parameters:
  • right (GeoDataFrame)

  • how (string, default 'inner') –

    The type of join:

    • 'left': use keys from left_df; retain only left_df geometry column

    • 'right': use keys from right_df; retain only right_df geometry column

    • 'inner': use intersection of keys from both dfs; retain only left_df geometry column

  • max_distance (float, default None) – Maximum distance within which to query for nearest geometry. Must be greater than 0. The max_distance used to search for nearest items in the tree may have a significant impact on performance by reducing the number of input geometries that are evaluated for nearest items in the tree.

  • lsuffix (string, default 'left') – Suffix to apply to overlapping column names (left GeoDataFrame).

  • rsuffix (string, default 'right') – Suffix to apply to overlapping column names (right GeoDataFrame).

  • distance_col (string, default None) – If set, save the distances computed between matching geometries under a column of this name in the joined GeoDataFrame.

  • exclusive (bool, optional, default False) – If True, the nearest geometries that are equal to the input geometry will not be returned.

Return type:

GeoDataFrame

Examples

>>> import geodatasets
>>> groceries = geopandas.read_file(
...     geodatasets.get_path("geoda.groceries")
... )
>>> chicago = geopandas.read_file(
...     geodatasets.get_path("geoda.chicago_health")
... ).to_crs(groceries.crs)
>>> chicago.head()
   ComAreaID  ...                                           geometry
0         35  ...  POLYGON ((-87.60914 41.84469, -87.60915 41.844...
1         36  ...  POLYGON ((-87.59215 41.81693, -87.59231 41.816...
2         37  ...  POLYGON ((-87.62880 41.80189, -87.62879 41.801...
3         38  ...  POLYGON ((-87.60671 41.81681, -87.60670 41.816...
4         39  ...  POLYGON ((-87.59215 41.81693, -87.59215 41.816...
[5 rows x 87 columns]
>>> groceries.head()
   OBJECTID     Ycoord  ...  Category                           geometry
0        16  41.973266  ...       NaN  MULTIPOINT ((-87.65661 41.97321))
1        18  41.696367  ...       NaN  MULTIPOINT ((-87.68136 41.69713))
2        22  41.868634  ...       NaN  MULTIPOINT ((-87.63918 41.86847))
3        23  41.877590  ...       new  MULTIPOINT ((-87.65495 41.87783))
4        27  41.737696  ...       NaN  MULTIPOINT ((-87.62715 41.73623))
[5 rows x 8 columns]
>>> groceries_w_communities = groceries.sjoin_nearest(chicago)
>>> groceries_w_communities[["Chain", "community", "geometry"]].head(2)
               Chain    community                                geometry
0     VIET HOA PLAZA       UPTOWN   MULTIPOINT ((1168268.672 1933554.35))
1  COUNTY FAIR FOODS  MORGAN PARK  MULTIPOINT ((1162302.618 1832900.224))

To include the distances:

>>> groceries_w_communities = groceries.sjoin_nearest(chicago, distance_col="distances")
>>> groceries_w_communities[["Chain", "community", "distances"]].head(2)
               Chain    community  distances
0     VIET HOA PLAZA       UPTOWN        0.0
1  COUNTY FAIR FOODS  MORGAN PARK        0.0

In the following example, we get multiple groceries for Uptown because all results are equidistant (in this case zero because they intersect). In fact, we get 4 results in total:

>>> chicago_w_groceries = groceries.sjoin_nearest(chicago, distance_col="distances", how="right")
>>> uptown_results = chicago_w_groceries[chicago_w_groceries["community"] == "UPTOWN"]
>>> uptown_results[["Chain", "community"]]
            Chain community
30  VIET HOA PLAZA    UPTOWN
30      JEWEL OSCO    UPTOWN
30          TARGET    UPTOWN
30       Mariano's    UPTOWN

See also

GeoDataFrame.sjoin

binary predicate joins

sjoin_nearest

equivalent top-level function

Notes

Since this join relies on distances, results will be inaccurate if your geometries are in a geographic CRS.

Every operation in GeoPandas is planar, i.e. the potential third dimension is not taken into account.
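
A compact sketch of distance_col, max_distance and the handling of equidistant neighbors, using made-up planar coordinates without a CRS (the frames stops and depots and their columns are hypothetical):

```python
import geopandas
from shapely.geometry import Point

stops = geopandas.GeoDataFrame(
    {"stop": ["s1", "s2", "s3"]},
    geometry=[Point(0, 0), Point(10, 0), Point(50, 0)],
)
depots = geopandas.GeoDataFrame(
    {"depot": ["d1", "d2", "d3"]},
    geometry=[Point(1, 0), Point(9, 0), Point(100, 0)],
)

# Nearest depot for every stop, with the distance stored in "dist".
nearest = stops.sjoin_nearest(depots, distance_col="dist")

# With max_distance, stops without a depot in range drop out of the
# default inner join.
limited = stops.sjoin_nearest(depots, max_distance=2)

# A point halfway between d1 and d2 yields one row per equidistant depot.
midpoint = geopandas.GeoDataFrame({"stop": ["mid"]}, geometry=[Point(5, 0)])
ties = midpoint.sjoin_nearest(depots)
```

Setting max_distance also limits how much of the tree has to be searched, as described for that parameter above.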

skew(xs=0.0, ys=0.0, origin='center', use_radians=False)

Return a GeoSeries with skewed geometries.

The geometries are sheared by angles along the x and y dimensions.

See http://shapely.readthedocs.io/en/latest/manual.html#shapely.affinity.skew for details.

Parameters:
  • xs (float) – The shear angle along the x axis. It can be specified in either degrees (default) or radians by setting use_radians=True.

  • ys (float) – The shear angle along the y axis. It can be specified in either degrees (default) or radians by setting use_radians=True.

  • origin (string, Point, or tuple (x, y)) – The point of origin can be a keyword ‘center’ for the bounding box center (default), ‘centroid’ for the geometry’s centroid, a Point object or a coordinate tuple (x, y).

  • use_radians (boolean) – If True, interpret the shear angles as radians; otherwise they are interpreted as degrees (default).

Examples

>>> from shapely.geometry import Point, LineString, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Point(1, 1),
...         LineString([(1, -1), (1, 0)]),
...         Polygon([(3, -1), (4, 0), (3, 1)]),
...     ]
... )
>>> s
0                         POINT (1 1)
1              LINESTRING (1 -1, 1 0)
2    POLYGON ((3 -1, 4 0, 3 1, 3 -1))
dtype: geometry
>>> s.skew(45, 30)
0                                          POINT (1 1)
1                           LINESTRING (0.5 -1, 1.5 0)
2    POLYGON ((2 -1.28868, 4 0.28868, 4 0.71132, 2 ...
dtype: geometry
>>> s.skew(45, 30, origin=(0, 0))
0                                    POINT (2 1.57735)
1         LINESTRING (1.11022e-16 -0.42265, 1 0.57735)
2    POLYGON ((2 0.73205, 4 2.3094, 4 2.73205, 2 0....
dtype: geometry
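
The shear can be written out for a single coordinate. The sketch below is a pure-Python illustration of the transform, not the Shapely implementation; the function name skew_point is hypothetical and the origin is passed explicitly as a tuple.

```python
import math


def skew_point(x, y, xs=0.0, ys=0.0, origin=(0.0, 0.0), use_radians=False):
    """Shear one coordinate by angle xs along x and angle ys along y."""
    if not use_radians:
        xs, ys = math.radians(xs), math.radians(ys)
    x0, y0 = origin
    # x shifts in proportion to the y offset from the origin, and vice versa.
    return (x + math.tan(xs) * (y - y0), y + math.tan(ys) * (x - x0))


# Point (1, 1) skewed by 45 and 30 degrees about the origin (0, 0).
px, py = skew_point(1, 1, 45, 30, origin=(0, 0))

# First vertex of the LineString above, skewed about its bounding box
# center (1, -0.5).
lx, ly = skew_point(1, -1, 45, 30, origin=(1, -0.5))
```

Up to floating-point rounding these values correspond to POINT (2 1.57735) and the vertex (0.5 -1) in the examples above.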
snap(other, tolerance, align=None)

Snap the vertices and segments of the geometry to vertices of the reference.

Vertices and segments of the input geometry are snapped to vertices of the reference geometry, returning a new geometry; the input geometries are not modified. The result geometry is the input geometry with the vertices and segments snapped. If no snapping occurs then the input geometry is returned unchanged. The tolerance is used to control where snapping is performed.

Where possible, this operation tries to avoid creating invalid geometries; however, it does not guarantee that output geometries will be valid. It is the responsibility of the caller to check for and handle invalid geometries.

Because too much snapping can result in invalid geometries being created, heuristics are used to determine the number and location of snapped vertices that are likely safe to snap. These heuristics may omit some potential snaps that are otherwise within the tolerance.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to snap to.

  • tolerance (float or array like) – Maximum distance between vertices that shall be snapped.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

GeoSeries

Examples

>>> from shapely import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Point(0.5, 2.5),
...         LineString([(0.1, 0.1), (0.49, 0.51), (1.01, 0.89)]),
...         Polygon([(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]),
...     ],
... )
>>> s
0                               POINT (0.5 2.5)
1    LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89)
2       POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))
dtype: geometry
>>> s2 = geopandas.GeoSeries(
...     [
...         Point(0, 2),
...         LineString([(0, 0), (0.5, 0.5), (1.0, 1.0)]),
...         Point(8, 10),
...     ],
...     index=range(1, 4),
... )
>>> s2
1                       POINT (0 2)
2    LINESTRING (0 0, 0.5 0.5, 1 1)
3                      POINT (8 10)
dtype: geometry

We can snap each geometry to a single shapely geometry:

>>> s.snap(Point(0, 2), tolerance=1)
0                                     POINT (0 2)
1      LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89)
2    POLYGON ((0 0, 0 2, 0 10, 10 10, 10 0, 0 0))
dtype: geometry

We can also snap two GeoSeries to each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and snap elements with the same index using align=True or ignore index and snap elements based on their matching order using align=False:

>>> s.snap(s2, tolerance=1, align=True)
0                                                 None
1           LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89)
2    POLYGON ((0.5 0.5, 1 1, 0 10, 10 10, 10 0, 0.5...
3                                                 None
dtype: geometry
>>> s.snap(s2, tolerance=1, align=False)
0                                      POINT (0 2)
1                   LINESTRING (0 0, 0.5 0.5, 1 1)
2    POLYGON ((0 0, 0 10, 8 10, 10 10, 10 0, 0 0))
dtype: geometry
sort_index(*, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)

Sort object by labels (along an axis).

Returns a new DataFrame sorted by label if inplace argument is False, otherwise updates the original DataFrame and returns None.

Parameters:
  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis along which to sort. The value 0 identifies the rows, and 1 identifies the columns.

  • level (int or level name or list of ints or list of level names) – If not None, sort on values in specified index level(s).

  • ascending (bool or list-like of bools, default True) – Sort ascending vs. descending. When the index is a MultiIndex the sort direction can be controlled for each level individually.

  • inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.

  • kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort') – Choice of sorting algorithm. See also numpy.sort() for more information. mergesort and stable are the only stable algorithms. For DataFrames, this option is only applied when sorting on a single column or label.

  • na_position ({'first', 'last'}, default 'last') – Puts NaNs at the beginning if first; last puts NaNs at the end. Not implemented for MultiIndex.

  • sort_remaining (bool, default True) – If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.

  • ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.

  • key (callable, optional) – If not None, apply the key function to the index values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect an Index and return an Index of the same shape. For MultiIndex inputs, the key is applied per level.

Returns:

The original DataFrame sorted by the labels or None if inplace=True.

Return type:

DataFrame or None

See also

Series.sort_index

Sort Series by the index.

DataFrame.sort_values

Sort DataFrame by the value.

Series.sort_values

Sort Series by the value.

Examples

>>> df = pd.DataFrame([1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150],
...                   columns=['A'])
>>> df.sort_index()
     A
1    4
29   2
100  1
150  5
234  3

By default, it sorts in ascending order. To sort in descending order, use ascending=False:

>>> df.sort_index(ascending=False)
     A
234  3
150  5
100  1
29   2
1    4

A key function can be specified which is applied to the index before sorting. For a MultiIndex this is applied to each level separately.

>>> df = pd.DataFrame({"a": [1, 2, 3, 4]}, index=['A', 'b', 'C', 'd'])
>>> df.sort_index(key=lambda x: x.str.lower())
   a
A  1
b  2
C  3
d  4
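
The per-level sort direction for a MultiIndex and the ignore_index option, both described under Parameters, are not covered by the examples above. A small sketch with made-up data (the level names group and rank are illustrative):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("b", 1), ("a", 2), ("a", 1), ("b", 2)], names=["group", "rank"]
)
df = pd.DataFrame({"value": [0, 1, 2, 3]}, index=index)

# Sort "group" ascending and "rank" descending within each group.
by_level = df.sort_index(level=["group", "rank"], ascending=[True, False])

# ignore_index=True discards the sorted labels and relabels rows 0..n-1.
relabelled = df.sort_index(ignore_index=True)
```

Note that relabelled no longer carries the group and rank labels, so ignore_index is mainly useful when only the row order matters.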
sort_values(by, *, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

Sort by the values along either axis.

Parameters:
  • by (str or list of str) –

    Name or list of names to sort by.

    • if axis is 0 or ‘index’ then by may contain index levels and/or column labels.

    • if axis is 1 or ‘columns’ then by may contain column levels and/or index labels.

  • axis ("{0 or 'index', 1 or 'columns'}", default 0) – Axis to be sorted.

  • ascending (bool or list of bool, default True) – Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.

  • inplace (bool, default False) – If True, perform operation in-place.

  • kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort') – Choice of sorting algorithm. See also numpy.sort() for more information. mergesort and stable are the only stable algorithms. For DataFrames, this option is only applied when sorting on a single column or label.

  • na_position ({'first', 'last'}, default 'last') – Puts NaNs at the beginning if first; last puts NaNs at the end.

  • ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.

  • key (callable, optional) – Apply the key function to the values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently.

Returns:

DataFrame with sorted values or None if inplace=True.

Return type:

DataFrame or None

See also

DataFrame.sort_index

Sort a DataFrame by the index.

Series.sort_values

Similar method for a Series.

Examples

>>> df = pd.DataFrame({
...     'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
...     'col2': [2, 1, 9, 8, 7, 4],
...     'col3': [0, 1, 9, 4, 2, 3],
...     'col4': ['a', 'B', 'c', 'D', 'e', 'F']
... })
>>> df
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F

Sort by col1

>>> df.sort_values(by=['col1'])
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
5    C     4     3    F
4    D     7     2    e
3  NaN     8     4    D

Sort by multiple columns

>>> df.sort_values(by=['col1', 'col2'])
  col1  col2  col3 col4
1    A     1     1    B
0    A     2     0    a
2    B     9     9    c
5    C     4     3    F
4    D     7     2    e
3  NaN     8     4    D

Sort Descending

>>> df.sort_values(by='col1', ascending=False)
  col1  col2  col3 col4
4    D     7     2    e
5    C     4     3    F
2    B     9     9    c
0    A     2     0    a
1    A     1     1    B
3  NaN     8     4    D

Putting NAs first

>>> df.sort_values(by='col1', ascending=False, na_position='first')
  col1  col2  col3 col4
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F
2    B     9     9    c
0    A     2     0    a
1    A     1     1    B

Sorting with a key function

>>> df.sort_values(by='col4', key=lambda col: col.str.lower())
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F

Natural sort with the key argument, using the natsort package (https://github.com/SethMMorton/natsort).

>>> df = pd.DataFrame({
...    "time": ['0hr', '128hr', '72hr', '48hr', '96hr'],
...    "value": [10, 20, 30, 40, 50]
... })
>>> df
    time  value
0    0hr     10
1  128hr     20
2   72hr     30
3   48hr     40
4   96hr     50
>>> from natsort import index_natsorted
>>> df.sort_values(
...     by="time",
...     key=lambda x: np.argsort(index_natsorted(df["time"]))
... )
    time  value
0    0hr     10
3   48hr     40
2   72hr     30
4   96hr     50
1  128hr     20
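
Two options from the parameter list that the examples above do not exercise are a list of sort directions and a stable sort combined with ignore_index. A short sketch on a reduced version of the frame used above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "B", np.nan, "D", "C"],
    "col2": [2, 1, 9, 8, 7, 4],
})

# One direction per key: col1 ascending, ties broken by col2 descending.
mixed = df.sort_values(by=["col1", "col2"], ascending=[True, False])

# A stable sort keeps the original order of equal keys; ignore_index
# relabels the result 0..n-1.
stable = df.sort_values(by="col1", kind="stable", ignore_index=True)
```

In mixed, the two rows with col1 equal to 'A' appear with col2 in the order 2, 1, and the NaN row stays last because na_position defaults to 'last'.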
sparse

alias of SparseFrameAccessor

squeeze(axis=None)

Squeeze 1 dimensional axis objects into scalars.

Series or DataFrames with a single element are squeezed to a scalar. DataFrames with a single column or a single row are squeezed to a Series. Otherwise the object is unchanged.

This method is most useful when you don’t know if your object is a Series or DataFrame, but you do know it has just a single column. In that case you can safely call squeeze to ensure you have a Series.

Parameters:

axis ({0 or 'index', 1 or 'columns', None}, default None) – A specific axis to squeeze. By default, all length-1 axes are squeezed. For Series this parameter is unused and defaults to None.

Returns:

The projection after squeezing axis or all the axes.

Return type:

DataFrame, Series, or scalar

See also

Series.iloc

Integer-location based indexing for selecting scalars.

DataFrame.iloc

Integer-location based indexing for selecting Series.

Series.to_frame

Inverse of DataFrame.squeeze for a single-column DataFrame.

Examples

>>> primes = pd.Series([2, 3, 5, 7])

Slicing might produce a Series with a single value:

>>> even_primes = primes[primes % 2 == 0]
>>> even_primes
0    2
dtype: int64
>>> even_primes.squeeze()
2

Squeezing objects with more than one value in every axis does nothing:

>>> odd_primes = primes[primes % 2 == 1]
>>> odd_primes
1    3
2    5
3    7
dtype: int64
>>> odd_primes.squeeze()
1    3
2    5
3    7
dtype: int64

Squeezing is even more effective when used with DataFrames.

>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
>>> df
   a  b
0  1  2
1  3  4

Slicing a single column will produce a DataFrame with the columns having only one value:

>>> df_a = df[['a']]
>>> df_a
   a
0  1
1  3

So the columns can be squeezed down, resulting in a Series:

>>> df_a.squeeze('columns')
0    1
1    3
Name: a, dtype: int64

Slicing a single row from a single column will produce a single scalar DataFrame:

>>> df_0a = df.loc[df.index < 1, ['a']]
>>> df_0a
   a
0  1

Squeezing the rows produces a single scalar Series:

>>> df_0a.squeeze('rows')
a    1
Name: 0, dtype: int64

Squeezing all axes will project directly into a scalar:

>>> df_0a.squeeze()
1
stack(level=-1, dropna=<no_default>, sort=<no_default>, future_stack=False)

Stack the prescribed level(s) from columns to index.

Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:

  • if the columns have a single level, the output is a Series;

  • if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.

Parameters:
  • level (int, str, list, default -1) – Level(s) to stack from the column axis onto the index axis, defined as one index or label, or a list of indices or labels.

  • dropna (bool, default True) – Whether to drop rows in the resulting Frame/Series with missing values. Stacking a column level onto the index axis can create combinations of index and column values that are missing from the original dataframe. See Examples section.

  • sort (bool, default True) – Whether to sort the levels of the resulting MultiIndex.

  • future_stack (bool, default False) – Whether to use the new implementation that will replace the current implementation in pandas 3.0. When True, dropna and sort have no impact on the result and must remain unspecified. See pandas 2.1.0 Release notes for more details.

Returns:

Stacked dataframe or series.

Return type:

DataFrame or Series

See also

DataFrame.unstack

Unstack prescribed level(s) from index axis onto column axis.

DataFrame.pivot

Reshape dataframe from long format to wide format.

DataFrame.pivot_table

Create a spreadsheet-style pivot table as a DataFrame.

Notes

The function is named by analogy with a collection of books being reorganized from being side by side on a horizontal position (the columns of the dataframe) to being stacked vertically on top of each other (in the index of the dataframe).

See the user guide for more examples.

Examples

Single level columns

>>> df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
...                                     index=['cat', 'dog'],
...                                     columns=['weight', 'height'])

Stacking a dataframe with a single level column axis returns a Series:

>>> df_single_level_cols
     weight height
cat       0      1
dog       2      3
>>> df_single_level_cols.stack(future_stack=True)
cat  weight    0
     height    1
dog  weight    2
     height    3
dtype: int64

Multi level columns: simple case

>>> multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
...                                        ('weight', 'pounds')])
>>> df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol1)

Stacking a dataframe with a multi-level column axis:

>>> df_multi_level_cols1
     weight
         kg    pounds
cat       1        2
dog       2        4
>>> df_multi_level_cols1.stack(future_stack=True)
            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4

Missing values

>>> multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
...                                        ('height', 'm')])
>>> df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol2)

It is common to have missing values when stacking a dataframe with multi-level columns, as the stacked dataframe typically has more values than the original dataframe. Missing values are filled with NaNs:

>>> df_multi_level_cols2
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0
>>> df_multi_level_cols2.stack(future_stack=True)
        weight  height
cat kg     1.0     NaN
    m      NaN     2.0
dog kg     3.0     NaN
    m      NaN     4.0

Prescribing the level(s) to be stacked

The first parameter controls which level or levels are stacked:

>>> df_multi_level_cols2.stack(0, future_stack=True)
             kg    m
cat weight  1.0  NaN
    height  NaN  2.0
dog weight  3.0  NaN
    height  NaN  4.0
>>> df_multi_level_cols2.stack([0, 1], future_stack=True)
cat  weight  kg    1.0
     height  m     2.0
dog  weight  kg    3.0
     height  m     4.0
dtype: float64
std(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)

Return sample standard deviation over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:
  • axis ({index (0), columns (1)}) –

    For Series this parameter is unused and defaults to 0.

    Warning

    The behavior of DataFrame.std with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

  • numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

Return type:

Series or DataFrame (if level specified)

Notes

To get the same behaviour as numpy.std, use ddof=0 instead of the default ddof=1.

Examples

>>> df = pd.DataFrame({'person_id': [0, 1, 2, 3],
...                    'age': [21, 25, 62, 43],
...                    'height': [1.61, 1.87, 1.49, 2.01]}
...                   ).set_index('person_id')
>>> df
           age  height
person_id
0           21    1.61
1           25    1.87
2           62    1.49
3           43    2.01

The standard deviation of the columns can be found as follows:

>>> df.std()
age       18.786076
height     0.237417
dtype: float64

Alternatively, ddof=0 can be set to normalize by N instead of N-1:

>>> df.std(ddof=0)
age       16.269219
height     0.205609
dtype: float64
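The effect of ddof can be checked independently of pandas. The sketch below recomputes the standard deviation of the age column with Python's standard-library statistics module (chosen here for illustration; it is not part of the pandas API). stdev divides by N-1, which corresponds to the default ddof=1, and pstdev divides by N, which corresponds to ddof=0.

```python
import statistics

ages = [21, 25, 62, 43]  # the 'age' column from the example above

# Sample standard deviation: divisor N - 1, like DataFrame.std() with ddof=1.
sample_std = statistics.stdev(ages)

# Population standard deviation: divisor N, like DataFrame.std(ddof=0)
# and numpy.std with its default ddof=0.
population_std = statistics.pstdev(ages)

print(round(sample_std, 6), round(population_std, 6))
```

The two rounded values agree with the 18.786076 and 16.269219 reported for the age column above.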
property style: Styler

Returns a Styler object.

Contains methods for building a styled HTML representation of the DataFrame.

See also

io.formats.style.Styler

Helps style a DataFrame or Series according to the data with HTML and CSS.

Examples

>>> df = pd.DataFrame({'A': [1, 2, 3]})
>>> df.style

Please see Table Visualization for more examples.

sub(other, axis='columns', level=None, fill_value=None)

Get the subtraction of dataframe and other, element-wise (binary operator sub).

Equivalent to dataframe - other, but with support for substituting a fill_value for missing data in one of the inputs. The reverse version is rsub.

One of the flexible wrappers (add, sub, mul, div, floordiv, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing, the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar using the operator version and the method version; both return the same result.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by a constant, then apply the reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and a Series by axis, using the operator version and the method version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a DataFrame with a MultiIndex, matching on a level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
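The role of fill_value is easiest to see when the two operands do not share all row labels. The following is a minimal sketch; the frames left and right and their labels are made up for illustration. It compares the plain operator, sub with fill_value, and the reverse version rsub.

```python
import pandas as pd

left = pd.DataFrame({"x": [1.0, 2.0]}, index=["r1", "r2"])
right = pd.DataFrame({"x": [10.0, 20.0]}, index=["r2", "r3"])

# Operator version: labels present on only one side produce NaN.
plain = left - right

# Method version: a value missing on one side is replaced by 0 first.
filled = left.sub(right, fill_value=0)

# Reverse version: left.rsub(right) computes right - left.
reversed_diff = left.rsub(right, fill_value=0)

print(plain)
print(filled)
print(reversed_diff)
```

Only the row r2 exists in both frames, so the plain operator yields NaN for r1 and r3, while the fill_value variants compute a result for every row of the unioned index.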
subtract(other, axis='columns', level=None, fill_value=None)

Get the subtraction of dataframe and other, element-wise (binary operator sub).

Equivalent to dataframe - other, but with support for substituting a fill_value for missing data in one of the inputs. The reverse version is rsub.

One of the flexible wrappers (add, sub, mul, div, floordiv, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing, the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar using the operator version and the method version; both return the same result.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by a constant, then apply the reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and a Series by axis, using the operator version and the method version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a DataFrame with a MultiIndex, matching on a level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
sum(axis=0, skipna=True, numeric_only=False, min_count=0, **kwargs)

Return the sum of the values over the requested axis.

This is equivalent to the method numpy.sum.

Parameters:
  • axis ({index (0), columns (1)}) –

    Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

    Warning

    The behavior of DataFrame.sum with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

    Added in version 2.0.0.

  • skipna (bool, default True) – Exclude NA/null values when computing the result.

  • numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

  • min_count (int, default 0) – The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

  • **kwargs – Additional keyword arguments to be passed to the function.

Return type:

Series or scalar

See also

Series.sum

Return the sum.

Series.min

Return the minimum.

Series.max

Return the maximum.

Series.idxmin

Return the index of the minimum.

Series.idxmax

Return the index of the maximum.

DataFrame.sum

Return the sum over the requested axis.

DataFrame.min

Return the minimum over the requested axis.

DataFrame.max

Return the maximum over the requested axis.

DataFrame.idxmin

Return the index of the minimum over the requested axis.

DataFrame.idxmax

Return the index of the maximum over the requested axis.

Examples

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
>>> s.sum()
14

By default, the sum of an empty or all-NA Series is 0.

>>> pd.Series([], dtype="float64").sum()  # min_count=0 is the default
0.0

This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.

>>> pd.Series([], dtype="float64").sum(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).sum()
0.0
>>> pd.Series([np.nan]).sum(min_count=1)
nan
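The same parameters apply when summing a DataFrame. The sketch below uses a small made-up frame to show how axis, min_count and skipna interact when one row is entirely NA.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [2.0, np.nan]})

# Row-wise sums; the all-NA second row sums to 0 because min_count=0.
row_sums = df.sum(axis=1)

# Require at least one valid value per row; the all-NA row becomes NaN.
row_sums_strict = df.sum(axis=1, min_count=1)

# Column-wise sums without skipping NA: any NaN in a column propagates.
column_sums_keep_na = df.sum(skipna=False)

print(row_sums)
print(row_sums_strict)
print(column_sums_keep_na)
```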
swapaxes(axis1, axis2, copy=None)

Interchange axes and swap values axes appropriately.

Deprecated since version 2.1.0: swapaxes is deprecated and will be removed. Please use transpose instead.

Return type:

same as input

Parameters:
  • axis1 (int | Literal['index', 'columns', 'rows'])

  • axis2 (int | Literal['index', 'columns', 'rows'])

  • copy (bool | None)

Examples

Please see examples for DataFrame.transpose().
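Since swapaxes is deprecated in favour of transpose, a minimal sketch of the replacement may help. The frame below is made up for illustration; for a two-dimensional DataFrame, swapping the two axes amounts to a transpose.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# transpose() swaps rows and columns; df.T is the shorthand accessor.
transposed = df.transpose()

print(transposed)
```

After the call, the former column labels A and B form the index and the former index labels x and y form the columns.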

swaplevel(i=-2, j=-1, axis=0)

Swap levels i and j in a MultiIndex.

Default is to swap the two innermost levels of the index.

Parameters:
  • i (int or str) – Levels of the indices to be swapped. Can pass level name as string.

  • j (int or str) – Levels of the indices to be swapped. Can pass level name as string.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to swap levels on. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

Returns:

DataFrame with levels swapped in MultiIndex.

Return type:

DataFrame

Examples

>>> df = pd.DataFrame(
...     {"Grade": ["A", "B", "A", "C"]},
...     index=[
...         ["Final exam", "Final exam", "Coursework", "Coursework"],
...         ["History", "Geography", "History", "Geography"],
...         ["January", "February", "March", "April"],
...     ],
... )
>>> df
                                    Grade
Final exam  History     January      A
            Geography   February     B
Coursework  History     March        A
            Geography   April        C

In the following example, we swap levels of the row index (axis=0, the default); levels of a column MultiIndex can be swapped in the same way by passing axis=1. By not supplying any arguments for i and j, we swap the last and second-to-last levels.

>>> df.swaplevel()
                                    Grade
Final exam  January     History         A
            February    Geography       B
Coursework  March       History         A
            April       Geography       C

By supplying one argument, we can choose which level to swap with the last level. For example, we can swap the first level with the last one as follows.

>>> df.swaplevel(0)
                                    Grade
January     History     Final exam      A
February    Geography   Final exam      B
March       History     Coursework      A
April       Geography   Coursework      C

We can also state explicitly which levels to swap by supplying values for both i and j. Here, for example, we swap the first and second levels.

>>> df.swaplevel(0, 1)
                                    Grade
History     Final exam  January         A
Geography   Final exam  February        B
History     Coursework  March           A
Geography   Coursework  April           C
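Levels can also be addressed by name, and the axis parameter selects whether the row index or the column index is affected. The sketch below assumes hypothetical level names (assessment, subject) that do not appear in the examples above.

```python
import pandas as pd

# Row MultiIndex with named levels (the names are illustrative only).
index = pd.MultiIndex.from_tuples(
    [("Final exam", "History"), ("Coursework", "Geography")],
    names=["assessment", "subject"],
)
df = pd.DataFrame({"Grade": ["A", "C"]}, index=index)

# Swap two row-index levels by name instead of by position.
swapped = df.swaplevel("assessment", "subject")

# Swap the two levels of a column MultiIndex by passing axis=1.
wide = pd.DataFrame(
    [[1, 2]],
    columns=pd.MultiIndex.from_tuples([("weight", "kg"), ("height", "m")]),
)
wide_swapped = wide.swaplevel(axis=1)

print(swapped)
print(wide_swapped)
```

Only the order of the levels changes; the rows and their data stay in the same order.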
symmetric_difference(other, align=None)

Return a GeoSeries of the symmetric difference of points in each aligned geometry with other.

For each geometry, the symmetric difference consists of points in the geometry not in other, and points in other not in the geometry.

The operation works in a 1-to-1, row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the symmetric difference to.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

GeoSeries

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5                       POINT (0 1)
dtype: geometry

We can do symmetric difference of each geometry and a single shapely geometry:

>>> s.symmetric_difference(Polygon([(0, 0), (1, 1), (0, 1)]))
0                  POLYGON ((0 2, 2 2, 1 1, 0 1, 0 2))
1                  POLYGON ((0 2, 2 2, 1 1, 0 1, 0 2))
2    GEOMETRYCOLLECTION (POLYGON ((0 0, 0 1, 1 1, 0...
3    GEOMETRYCOLLECTION (POLYGON ((0 0, 0 1, 1 1, 0...
4                       POLYGON ((0 1, 1 1, 0 0, 0 1))
dtype: geometry

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index (align=True), or ignore the index and compare elements based on their matching order (align=False):

>>> s.symmetric_difference(s2, align=True)
0                                                 None
1                  POLYGON ((0 2, 2 2, 1 1, 0 1, 0 2))
2    MULTILINESTRING ((0 0, 1 1), (1 1, 2 2), (1 0,...
3                                     LINESTRING EMPTY
4                            MULTIPOINT ((0 1), (1 1))
5                                                 None
dtype: geometry
>>> s.symmetric_difference(s2, align=False)
0                  POLYGON ((0 2, 2 2, 1 1, 0 1, 0 2))
1    GEOMETRYCOLLECTION (POLYGON ((0 0, 0 2, 1 2, 2...
2    MULTILINESTRING ((0 0, 1 1), (1 1, 2 2), (2 0,...
3                                LINESTRING (2 0, 0 2)
4                                          POINT EMPTY
dtype: geometry

See also

GeoSeries.difference, GeoSeries.union, GeoSeries.intersection
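The relation to the operations listed under See also can be checked on a single pair of geometries. The sketch below uses plain shapely polygons rather than a GeoSeries (an assumption made to keep it small) and verifies that the symmetric difference covers the same area as the union minus the intersection.

```python
from shapely.geometry import Polygon

a = Polygon([(0, 0), (2, 2), (0, 2)])  # first polygon of s above
b = Polygon([(0, 0), (1, 1), (0, 1)])  # the single polygon used above

# Points in a but not in b, plus points in b but not in a.
sym_diff = a.symmetric_difference(b)

# The same region built from union, intersection and difference.
by_parts = a.union(b).difference(a.intersection(b))

print(sym_diff.area, by_parts.area)
```

Because b lies entirely inside a, the result is a minus b, and its area is the area of a (2.0) less the area of b (0.5).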

tail(n=5)

Return the last n rows.

This function returns last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.

For negative values of n, this function returns all rows except the first |n| rows, equivalent to df[|n|:].

If n is larger than the number of rows, this function returns all rows.

Parameters:

n (int, default 5) – Number of rows to select.

Returns:

The last n rows of the caller object.

Return type:

type of caller

See also

DataFrame.head

The first n rows of the caller object.

Examples

>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
...                    'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra

Viewing the last 5 lines

>>> df.tail()
   animal
4  monkey
5  parrot
6   shark
7   whale
8   zebra

Viewing the last n lines (three in this case)

>>> df.tail(3)
  animal
6  shark
7  whale
8  zebra

For negative values of n

>>> df.tail(-3)
   animal
3    lion
4  monkey
5  parrot
6   shark
7   whale
8   zebra
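The behaviour for positive, negative and oversized n can be expressed with positional slicing. The sketch below uses a shortened, made-up version of the animal frame and compares tail with the equivalent iloc slices.

```python
import pandas as pd

df = pd.DataFrame({"animal": ["alligator", "bee", "falcon", "lion", "monkey"]})
n = 2

# Positive n: the last n rows, the same as df.iloc[-n:].
last_two = df.tail(n)

# Negative n: everything except the first |n| rows, the same as df.iloc[n:].
all_but_first_two = df.tail(-n)

# n larger than the number of rows: all rows are returned.
everything = df.tail(100)

print(last_two)
print(all_but_first_two)
```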
take(indices, axis=0, **kwargs)

Return the elements in the given positional indices along an axis.

This means that we are not indexing according to actual values in the index attribute of the object. We are indexing according to the actual position of the element in the object.

Parameters:
  • indices (array-like) – An array of ints indicating which positions to take.

  • axis ({0 or 'index', 1 or 'columns', None}, default 0) – The axis on which to select elements. 0 means that we are selecting rows, 1 means that we are selecting columns. For Series this parameter is unused and defaults to 0.

  • **kwargs – For compatibility with numpy.take(). Has no effect on the output.

Returns:

An array-like containing the elements taken from the object.

Return type:

same type as caller

See also

DataFrame.loc

Select a subset of a DataFrame by labels.

DataFrame.iloc

Select a subset of a DataFrame by positions.

numpy.take

Take elements from an array along an axis.

Examples

>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
...                    ('parrot', 'bird', 24.0),
...                    ('lion', 'mammal', 80.5),
...                    ('monkey', 'mammal', np.nan)],
...                   columns=['name', 'class', 'max_speed'],
...                   index=[0, 2, 3, 1])
>>> df
     name   class  max_speed
0  falcon    bird      389.0
2  parrot    bird       24.0
3    lion  mammal       80.5
1  monkey  mammal        NaN

Take elements at positions 0 and 3 along the axis 0 (default).

Note how the actual indices selected (0 and 1) do not correspond to our selected indices 0 and 3. That’s because we are selecting the 0th and 3rd rows, not rows whose indices equal 0 and 3.

>>> df.take([0, 3])
     name   class  max_speed
0  falcon    bird      389.0
1  monkey  mammal        NaN

Take elements at indices 1 and 2 along the axis 1 (column selection).

>>> df.take([1, 2], axis=1)
    class  max_speed
0    bird      389.0
2    bird       24.0
3  mammal       80.5
1  mammal        NaN

Negative integers select positions counted from the end of the object, just like with Python lists.

>>> df.take([-1, -2])
     name   class  max_speed
1  monkey  mammal        NaN
3    lion  mammal       80.5
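The difference between positions and labels can be made explicit by comparing take with iloc and loc, the two indexers listed under See also. The sketch below uses a reduced, made-up version of the frame above with the same shuffled index.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["falcon", "parrot", "lion", "monkey"]},
    index=[0, 2, 3, 1],
)

# Positional selection: rows at positions 0 and 3.
by_position = df.take([0, 3])

# iloc is also positional and returns the same rows.
same_by_iloc = df.iloc[[0, 3]]

# loc is label-based: rows whose index labels are 0 and 3.
by_label = df.loc[[0, 3]]

print(by_position)
print(by_label)
```

With this index, positions 0 and 3 hold falcon and monkey, whereas labels 0 and 3 refer to falcon and lion.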
to_arrow(*, index=None, geometry_encoding='WKB', interleaved=True, include_z=None)[source]

Encode a GeoDataFrame to GeoArrow format.

See https://geoarrow.org/ for details on the GeoArrow specification.

This function returns a generic Arrow data object implementing the Arrow PyCapsule Protocol (i.e. having an __arrow_c_stream__ method). This object can then be consumed by your Arrow implementation of choice that supports this protocol.

Added in version 1.0.

Parameters:
  • index (bool, default None) – If True, always include the dataframe’s index(es) as columns in the file output. If False, the index(es) will not be written to the file. If None, the index(es) will be included as columns in the file output, except for a RangeIndex, which is stored as metadata only.

  • geometry_encoding ({'WKB', 'geoarrow'}, default 'WKB') – The GeoArrow encoding to use for the data conversion.

  • interleaved (bool, default True) – Only relevant for ‘geoarrow’ encoding. If True, the geometries’ coordinates are interleaved in a single fixed size list array. If False, the coordinates are stored as separate arrays in a struct type.

  • include_z (bool, default None) – Only relevant for ‘geoarrow’ encoding (for WKB, the dimensionality of the individual geometries is preserved). If False, return 2D geometries. If True, include the third dimension in the output (if a geometry has no third dimension, the z-coordinates will be NaN). By default, will infer the dimensionality from the input geometries. Note that this inference can be unreliable with empty geometries (for a guaranteed result, it is recommended to specify the keyword).

Returns:

A generic Arrow table object with geometry columns encoded to GeoArrow.

Return type:

ArrowTable

Examples

>>> from shapely.geometry import Point
>>> data = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(data)
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
>>> arrow_table = gdf.to_arrow()
>>> arrow_table
<geopandas.io._geoarrow.ArrowTable object at ...>

The returned data object needs to be consumed by a library implementing the Arrow PyCapsule Protocol. For example, wrapping the data as a pyarrow.Table (requires pyarrow >= 14.0):

>>> import pyarrow as pa
>>> table = pa.table(arrow_table)
>>> table
pyarrow.Table
col1: string
geometry: binary
----
col1: [["name1","name2"]]
geometry: [[0101000000000000000000F03F0000000000000040,01010000000000000000000040000000000000F03F]]
to_clipboard(excel=True, sep=None, **kwargs)

Copy object to the system clipboard.

Write a text representation of object to the system clipboard. This can be pasted into Excel, for example.

Parameters:
  • excel (bool, default True) –

    Produce output in a csv format for easy pasting into excel.

    • True, use the provided separator for csv pasting.

    • False, write a string representation of the object to the clipboard.

  • sep (str, default '\t') – Field delimiter.

  • **kwargs – These parameters will be passed to DataFrame.to_csv.

Return type:

None

See also

DataFrame.to_csv

Write a DataFrame to a comma-separated values (csv) file.

read_clipboard

Read text from clipboard and pass to read_csv.

Notes

Requirements for your platform.

  • Linux : xclip, or xsel (with PyQt4 modules)

  • Windows : none

  • macOS : none

This method uses the processes developed for the package pyperclip. A solution to render any output string format is given in the examples.

Examples

Copy the contents of a DataFrame to the clipboard.

>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
>>> df.to_clipboard(sep=',')
... # Wrote the following to the system clipboard:
... # ,A,B,C
... # 0,1,2,3
... # 1,4,5,6

We can omit the index by passing the keyword index and setting it to False.

>>> df.to_clipboard(sep=',', index=False)
... # Wrote the following to the system clipboard:
... # A,B,C
... # 1,2,3
... # 4,5,6

Using the original pyperclip package for any string output format.

import pyperclip
html = df.style.to_html()
pyperclip.copy(html)
to_crs(crs=None, epsg=None, inplace=False)[source]

Transform geometries to a new coordinate reference system.

Transform all geometries in an active geometry column to a different coordinate reference system. The crs attribute on the current GeoSeries must be set. Either crs or epsg may be specified for output.

This method will transform all points in all objects. It has no notion of projecting entire geometries. All segments joining points are assumed to be lines in the current projection, not geodesics. Objects crossing the dateline (or other projection boundary) will have undesirable behavior.

Parameters:
  • crs (pyproj.CRS, optional if epsg is specified) – The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.

  • epsg (int, optional if crs is specified) – EPSG code specifying output projection.

  • inplace (bool, optional, default: False) – Whether to return a new GeoDataFrame or do the transformation in place.

Return type:

GeoDataFrame

Examples

>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(d, crs=4326)
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
>>> gdf.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
>>> gdf = gdf.to_crs(3857)
>>> gdf
    col1                       geometry
0  name1  POINT (111319.491 222684.209)
1  name2  POINT (222638.982 111325.143)
>>> gdf.crs
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World - 85°S to 85°N
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

See also

GeoDataFrame.set_crs

assign CRS without re-projection
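The statement that the method transforms points one by one can be illustrated by reproducing the EPSG:3857 coordinates from the example by hand. The sketch below uses the spherical Web Mercator formulas with the WGS 84 semi-major axis; the helper to_web_mercator is a hypothetical name introduced for this illustration and is not part of geopandas or pyproj, which perform the actual transformation in to_crs.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as sphere radius

def to_web_mercator(lon_deg, lat_deg):
    """Project one lon/lat point in degrees to EPSG:3857 metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y

# POINT (1 2) and POINT (2 1) from the example, as (lon, lat).
x1, y1 = to_web_mercator(1, 2)
x2, y2 = to_web_mercator(2, 1)

print(round(x1, 3), round(y1, 3))
print(round(x2, 3), round(y2, 3))
```

Each vertex is mapped independently, which is why segments between vertices stay straight in the target projection rather than following geodesics.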

to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='"', lineterminator=None, chunksize=None, date_format=None, doublequote=True, escapechar=None, decimal='.', errors='strict', storage_options=None)

Write object to a comma-separated values (csv) file.

Parameters:
  • path_or_buf (str, path object, file-like object, or None, default None) – String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string. If a non-binary file object is passed, it should be opened with newline=’’, disabling universal newlines. If a binary file object is passed, mode might need to contain a ‘b’.

  • sep (str, default ',') – String of length 1. Field delimiter for the output file.

  • na_rep (str, default '') – Missing data representation.

  • float_format (str, Callable, default None) – Format string for floating point numbers. If a Callable is given, it takes precedence over other numeric formatting parameters, like decimal.

  • columns (sequence, optional) – Columns to write.

  • header (bool or list of str, default True) – Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.

  • index (bool, default True) – Write row names (index).

  • index_label (str or sequence, or False, default None) – Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.

  • mode ({'w', 'x', 'a'}, default 'w') –

    Forwarded to either open(mode=) or fsspec.open(mode=) to control the file opening. Typical values include:

    • ’w’, truncate the file first.

    • ’x’, exclusive creation, failing if the file already exists.

    • ’a’, append to the end of file if it exists.

  • encoding (str, optional) – A string representing the encoding to use in the output file, defaults to ‘utf-8’. encoding is not supported if path_or_buf is a non-binary file object.

  • compression (str or dict, default 'infer') –

    For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buf’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.

    Added in version 1.5.0: Added support for .tar files.

    May be a dict with key ‘method’ as compression mode and other entries as additional compression options if compression mode is ‘zip’.

    Passing compression options as keys in dict is supported for compression modes ‘gzip’, ‘bz2’, ‘zstd’, and ‘zip’.

  • quoting (optional constant from csv module) – Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.

  • quotechar (str, default '"') – String of length 1. Character used to quote fields.

  • lineterminator (str, optional) –

    The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called (e.g. ‘\n’ on Linux, ‘\r\n’ on Windows).

    Changed in version 1.5.0: Previously was line_terminator, changed for consistency with read_csv and the standard library ‘csv’ module.

  • chunksize (int or None) – Rows to write at a time.

  • date_format (str, default None) – Format string for datetime objects.

  • doublequote (bool, default True) – Control quoting of quotechar inside a field.

  • escapechar (str, default None) – String of length 1. Character used to escape sep and quotechar when appropriate.

  • decimal (str, default '.') – Character recognized as decimal separator. E.g. use ‘,’ for European data.

  • errors (str, default 'strict') – Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.

  • storage_options (dict, optional) – Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.

Returns:

If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.

Return type:

None or str

See also

read_csv

Load a CSV file into a DataFrame.

to_excel

Write DataFrame to an Excel file.

Examples

Create ‘out.csv’ containing ‘df’ without indices

>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
...                    'mask': ['red', 'purple'],
...                    'weapon': ['sai', 'bo staff']})
>>> df.to_csv('out.csv', index=False)

Create ‘out.zip’ containing ‘out.csv’

>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'
>>> compression_opts = dict(method='zip',
...                         archive_name='out.csv')
>>> df.to_csv('out.zip', index=False,
...           compression=compression_opts)

To write a csv file to a new folder or nested folder you will first need to create it using either Pathlib or os:

>>> from pathlib import Path
>>> filepath = Path('folder/subfolder/out.csv')
>>> filepath.parent.mkdir(parents=True, exist_ok=True)
>>> df.to_csv(filepath)
>>> import os
>>> os.makedirs('folder/subfolder', exist_ok=True)
>>> df.to_csv('folder/subfolder/out.csv')
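
The mode, header, sep, decimal and float_format parameters described above are not covered by the examples. The following is a minimal sketch of how they can be combined; the file name, column names and values are made up for illustration and are not part of the original documentation.

```python
import tempfile
from pathlib import Path

import pandas as pd

first = pd.DataFrame({'name': ['Raphael'], 'mask': ['red']})
second = pd.DataFrame({'name': ['Donatello'], 'mask': ['purple']})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'out.csv'  # hypothetical file name
    # The first write creates the file and emits the header row.
    first.to_csv(path, index=False)
    # mode='a' appends; header=False avoids repeating the column names.
    second.to_csv(path, index=False, mode='a', header=False)
    combined = pd.read_csv(path)

# With no path given, the CSV text is returned as a string.
# sep=';' with decimal=',' is a common layout for European data.
prices = pd.DataFrame({'x': [1.5, 2.25], 'y': ['a', 'b']})
text = prices.to_csv(index=False, sep=';', decimal=',', float_format='%.2f')
```

Appending with header=True would write the column names a second time as a data row, which is why the second call disables it.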
to_dict(orient='dict', into=<class 'dict'>, index=True)

Convert the DataFrame to a dictionary.

The type of the key-value pairs can be customized with the parameters (see below).

Parameters:
  • orient (str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}) –

    Determines the type of the values of the dictionary.

    • ‘dict’ (default) : dict like {column -> {index -> value}}

    • ‘list’ : dict like {column -> [values]}

    • ‘series’ : dict like {column -> Series(values)}

    • ‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}

    • ‘tight’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values], ‘index_names’ -> [index.names], ‘column_names’ -> [column.names]}

    • ‘records’ : list like [{column -> value}, … , {column -> value}]

    • ‘index’ : dict like {index -> {column -> value}}

    Added in version 1.4.0: ‘tight’ as an allowed value for the orient argument

  • into (class, default dict) – The collections.abc.MutableMapping subclass used for all Mappings in the return value. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.

  • index (bool, default True) –

    Whether to include the index item (and index_names item if orient is ‘tight’) in the returned dictionary. Can only be False when orient is ‘split’ or ‘tight’.

    Added in version 2.0.0.

Returns:

Return a collections.abc.MutableMapping object representing the DataFrame. The resulting transformation depends on the orient parameter.

Return type:

dict, list or collections.abc.MutableMapping

See also

DataFrame.from_dict

Create a DataFrame from a dictionary.

DataFrame.to_json

Convert a DataFrame to JSON format.

Examples

>>> df = pd.DataFrame({'col1': [1, 2],
...                    'col2': [0.5, 0.75]},
...                   index=['row1', 'row2'])
>>> df
      col1  col2
row1     1  0.50
row2     2  0.75
>>> df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}

You can specify the return orientation.

>>> df.to_dict('series')
{'col1': row1    1
         row2    2
Name: col1, dtype: int64,
'col2': row1    0.50
        row2    0.75
Name: col2, dtype: float64}
>>> df.to_dict('split')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
 'data': [[1, 0.5], [2, 0.75]]}
>>> df.to_dict('records')
[{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]
>>> df.to_dict('index')
{'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}
>>> df.to_dict('tight')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}

You can also specify the mapping type.

>>> from collections import OrderedDict, defaultdict
>>> df.to_dict(into=OrderedDict)
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])),
             ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])

If you want a defaultdict, you need to initialize it:

>>> dd = defaultdict(list)
>>> df.to_dict('records', into=dd)
[defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),
 defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]
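
The index parameter is described above but not shown in the examples. As a minimal sketch, assuming pandas 2.0 or later (where index was added), the calls below drop the index entries from the ‘split’ and ‘tight’ orientations and show that other orientations reject index=False. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2],
                   'col2': [0.5, 0.75]},
                  index=['row1', 'row2'])

# 'split' without the 'index' entry.
split_no_index = df.to_dict(orient='split', index=False)

# 'tight' without the 'index' and 'index_names' entries.
tight_no_index = df.to_dict(orient='tight', index=False)

# index=False is only accepted for 'split' and 'tight'.
try:
    df.to_dict(orient='records', index=False)
    rejected = False
except ValueError:
    rejected = True
```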
to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, inf_rep='inf', freeze_panes=None, storage_options=None, engine_kwargs=None)

Write object to an Excel sheet.

To write a single object to an Excel .xlsx file it is only necessary to specify a target file name. To write to multiple sheets it is necessary to create an ExcelWriter object with a target file name, and specify a sheet in the file to write to.

Multiple sheets may be written to by specifying unique sheet_name. With all data written to the file it is necessary to save the changes. Note that creating an ExcelWriter object with a file name that already exists will result in the contents of the existing file being erased.

Parameters:
  • excel_writer (path-like, file-like, or ExcelWriter object) – File path or existing ExcelWriter.

  • sheet_name (str, default 'Sheet1') – Name of sheet which will contain DataFrame.

  • na_rep (str, default '') – Missing data representation.

  • float_format (str, optional) – Format string for floating point numbers. For example float_format="%.2f" will format 0.1234 to 0.12.

  • columns (sequence or list of str, optional) – Columns to write.

  • header (bool or list of str, default True) – Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.

  • index (bool, default True) – Write row names (index).

  • index_label (str or sequence, optional) – Column label for index column(s) if desired. If not specified, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.

  • startrow (int, default 0) – Upper left cell row to dump data frame.

  • startcol (int, default 0) – Upper left cell column to dump data frame.

  • engine (str, optional) – Write engine to use, ‘openpyxl’ or ‘xlsxwriter’. You can also set this via the options io.excel.xlsx.writer or io.excel.xlsm.writer.

  • merge_cells (bool, default True) – Write MultiIndex and Hierarchical Rows as merged cells.

  • inf_rep (str, default 'inf') – Representation for infinity (there is no native representation for infinity in Excel).

  • freeze_panes (tuple of int (length 2), optional) – Specifies the one-based bottommost row and rightmost column that is to be frozen.

  • storage_options (dict, optional) –

    Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.

    Added in version 1.2.0.

  • engine_kwargs (dict, optional) – Arbitrary keyword arguments passed to excel engine.

Return type:

None

See also

to_csv

Write DataFrame to a comma-separated values (csv) file.

ExcelWriter

Class for writing DataFrame objects into excel sheets.

read_excel

Read an Excel file into a pandas DataFrame.

read_csv

Read a comma-separated values (csv) file into DataFrame.

io.formats.style.Styler.to_excel

Add styles to Excel sheet.

Notes

For compatibility with to_csv(), to_excel serializes lists and dicts to strings before writing.

Once a workbook has been saved it is not possible to write further data without rewriting the whole workbook.

Examples

Create, write to and save a workbook:

>>> df1 = pd.DataFrame([['a', 'b'], ['c', 'd']],
...                    index=['row 1', 'row 2'],
...                    columns=['col 1', 'col 2'])
>>> df1.to_excel("output.xlsx")

To specify the sheet name:

>>> df1.to_excel("output.xlsx",
...              sheet_name='Sheet_name_1')

If you wish to write to more than one sheet in the workbook, it is necessary to specify an ExcelWriter object:

>>> df2 = df1.copy()
>>> with pd.ExcelWriter('output.xlsx') as writer:
...     df1.to_excel(writer, sheet_name='Sheet_name_1')
...     df2.to_excel(writer, sheet_name='Sheet_name_2')

ExcelWriter can also be used to append to an existing Excel file:

>>> with pd.ExcelWriter('output.xlsx',
...                     mode='a') as writer:
...     df1.to_excel(writer, sheet_name='Sheet_name_3')

To set the library that is used to write the Excel file, you can pass the engine keyword (the default engine is automatically chosen depending on the file extension):

>>> df1.to_excel('output1.xlsx', engine='xlsxwriter')
to_feather(path, index=None, compression=None, schema_version=None, **kwargs)[source]

Write a GeoDataFrame to the Feather format.

Any geometry columns present are serialized to WKB format in the file.

Requires ‘pyarrow’ >= 0.17.

Added in version 0.8.

Parameters:
  • path (str, path object)

  • index (bool, default None) – If True, always include the dataframe’s index(es) as columns in the file output. If False, the index(es) will not be written to the file. If None, the index(es) will be included as columns in the file output except RangeIndex which is stored as metadata only.

  • compression ({'zstd', 'lz4', 'uncompressed'}, optional) – Name of the compression to use. Use "uncompressed" for no compression. By default uses LZ4 if available, otherwise uncompressed.

  • schema_version ({'0.1.0', '0.4.0', '1.0.0', None}) – GeoParquet specification version; if not provided will default to latest supported version.

  • kwargs – Additional keyword arguments passed to pyarrow.feather.write_feather().

Examples

>>> gdf.to_feather('data.feather')

See also

GeoDataFrame.to_parquet

write GeoDataFrame to parquet

GeoDataFrame.to_file

write GeoDataFrame to file

to_file(filename, driver=None, schema=None, index=None, **kwargs)[source]

Write the GeoDataFrame to a file.

By default, an ESRI shapefile is written, but any OGR data source supported by Pyogrio or Fiona can be written. A dictionary of supported OGR providers is available via:

>>> import pyogrio
>>> pyogrio.list_drivers()
Parameters:
  • filename (string) – File path or file handle to write to. The path may specify a GDAL VSI scheme.

  • driver (string, default None) – The OGR format driver used to write the vector file. If not specified, it attempts to infer it from the file extension. If no extension is specified, it saves ESRI Shapefile to a folder.

  • schema (dict, default None) – If specified, the schema dictionary is passed to Fiona to better control how the file is written. If None, GeoPandas will determine the schema based on each column’s dtype. Not supported for the “pyogrio” engine.

  • index (bool, default None) –

    If True, write index into one or more columns (for MultiIndex). Default None writes the index into one or more columns only if the index is named, is a MultiIndex, or has a non-integer data type. If False, no index is written.

    Added in version 0.7: Previously the index was not written.

  • mode (string, default 'w') – The write mode, ‘w’ to overwrite the existing file and ‘a’ to append. Not all drivers support appending. The drivers that support appending are listed in fiona.supported_drivers or https://github.com/Toblerity/Fiona/blob/master/fiona/drvsupport.py

  • crs (pyproj.CRS, default None) – If specified, the CRS is passed to Fiona to better control how the file is written. If None, GeoPandas will determine the crs based on crs df attribute. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (eg “EPSG:4326”) or a WKT string. The keyword is not supported for the “pyogrio” engine.

  • engine (str, "pyogrio" or "fiona") – The underlying library that is used to write the file. Currently, the supported options are “pyogrio” and “fiona”. Defaults to “pyogrio” if installed, otherwise tries “fiona”.

  • metadata (dict[str, str], default None) – Optional metadata to be stored in the file. Keys and values must be strings. Supported only for “GPKG” driver.

  • **kwargs – Keyword args to be passed to the engine, and can be used to write to multi-layer data, store data within archives (zip files), etc. In case of the “pyogrio” engine, the keyword arguments are passed to pyogrio.write_dataframe. In case of the “fiona” engine, the keyword arguments are passed to fiona.open. For more information on possible keywords, type: import pyogrio; help(pyogrio.write_dataframe).

Notes

The format drivers will attempt to detect the encoding of your data, but may fail. In this case, the proper encoding can be specified explicitly by using the encoding keyword parameter, e.g. encoding='utf-8'.

See also

GeoSeries.to_file

GeoDataFrame.to_postgis

write GeoDataFrame to PostGIS database

GeoDataFrame.to_parquet

write GeoDataFrame to parquet

GeoDataFrame.to_feather

write GeoDataFrame to feather

Examples

>>> gdf.to_file('dataframe.shp')
>>> gdf.to_file('dataframe.gpkg', driver='GPKG', layer='name')
>>> gdf.to_file('dataframe.geojson', driver='GeoJSON')

With selected drivers you can also append to a file with mode="a":

>>> gdf.to_file('dataframe.shp', mode="a")

Using the engine-specific keyword arguments it is possible to e.g. create a spatialite file with a custom layer name:

>>> gdf.to_file(
...     'dataframe.sqlite', driver='SQLite', spatialite=True, layer='test'
... )
to_gbq(destination_table, project_id=None, chunksize=None, reauth=False, if_exists='fail', auth_local_webserver=True, table_schema=None, location=None, progress_bar=True, credentials=None)

Write a DataFrame to a Google BigQuery table.

Deprecated since version 2.2.0: Please use pandas_gbq.to_gbq instead.

This function requires the pandas-gbq package.

See the How to authenticate with Google BigQuery guide for authentication instructions.

Parameters:
  • destination_table (str) – Name of table to be written, in the form dataset.tablename.

  • project_id (str, optional) – Google BigQuery Account project ID. Optional when available from the environment.

  • chunksize (int, optional) – Number of rows to be inserted in each chunk from the dataframe. Set to None to load the whole dataframe at once.

  • reauth (bool, default False) – Force Google BigQuery to re-authenticate the user. This is useful if multiple accounts are used.

  • if_exists (str, default 'fail') –

    Behavior when the destination table exists. Value can be one of:

    'fail'

    If table exists raise pandas_gbq.gbq.TableCreationError.

    'replace'

    If table exists, drop it, recreate it, and insert data.

    'append'

    If table exists, insert data. Create if does not exist.

  • auth_local_webserver (bool, default True) –

    Use the local webserver flow instead of the console flow when getting user credentials.

    New in version 0.2.0 of pandas-gbq.

    Changed in version 1.5.0: Default value is changed to True. Google has deprecated the auth_local_webserver = False “out of band” (copy-paste) flow.

  • table_schema (list of dicts, optional) –

    List of BigQuery table fields to which the corresponding DataFrame columns conform, e.g. [{'name': 'col1', 'type': 'STRING'},...]. If schema is not provided, it will be generated according to dtypes of DataFrame columns. See BigQuery API documentation on available names of a field.

    New in version 0.3.1 of pandas-gbq.

  • location (str, optional) –

    Location where the load job should run. See the BigQuery locations documentation for a list of available locations. The location must match that of the target dataset.

    New in version 0.5.0 of pandas-gbq.

  • progress_bar (bool, default True) –

    Use the library tqdm to show the progress bar for the upload, chunk by chunk.

    New in version 0.5.0 of pandas-gbq.

  • credentials (google.auth.credentials.Credentials, optional) –

    Credentials for accessing Google APIs. Use this parameter to override default credentials, such as to use Compute Engine google.auth.compute_engine.Credentials or Service Account google.oauth2.service_account.Credentials directly.

    New in version 0.8.0 of pandas-gbq.

Return type:

None

See also

pandas_gbq.to_gbq

This function in the pandas-gbq library.

read_gbq

Read a DataFrame from Google BigQuery.

Examples

Example taken from Google BigQuery documentation

>>> project_id = "my-project"
>>> table_id = 'my_dataset.my_table'
>>> df = pd.DataFrame({
...                   "my_string": ["a", "b", "c"],
...                   "my_int64": [1, 2, 3],
...                   "my_float64": [4.0, 5.0, 6.0],
...                   "my_bool1": [True, False, True],
...                   "my_bool2": [False, True, False],
...                   "my_dates": pd.date_range("now", periods=3),
...                   }
...                   )
>>> df.to_gbq(table_id, project_id=project_id)
to_geo_dict(na='null', show_bbox=False, drop_id=False)[source]

Return a python feature collection representation of the GeoDataFrame as a dictionary with a list of features based on the __geo_interface__ GeoJSON-like specification.

Parameters:
  • na (str, optional) –

    Options are {‘null’, ‘drop’, ‘keep’}, default ‘null’. Indicates how to output missing (NaN) values in the GeoDataFrame:

    • null: output the missing entries as JSON null

    • drop: remove the property from the feature. This applies to each feature individually so that features may have different properties

    • keep: output the missing entries as NaN

  • show_bbox (bool, optional) – Include bbox (bounds) in the geojson. Default False.

  • drop_id (bool, default: False) – Whether to drop the index of the GeoDataFrame instead of emitting it as the id property in the generated dictionary. The default False keeps the id; True may be preferable if the index is just arbitrary row numbers.

Return type:

dict

Examples

>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(d)
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
>>> gdf.to_geo_dict()
{'type': 'FeatureCollection', 'features': [{'id': '0', 'type': 'Feature', 'properties': {'col1': 'name1'}, 'geometry': {'type': 'Point', 'coordinates': (1.0, 2.0)}}, {'id': '1', 'type': 'Feature', 'properties': {'col1': 'name2'}, 'geometry': {'type': 'Point', 'coordinates': (2.0, 1.0)}}]}

See also

GeoDataFrame.to_json

return a GeoDataFrame as a GeoJSON string

to_hdf(path_or_buf, key, mode='a', complevel=None, complib=None, append=False, format=None, index=True, min_itemsize=None, nan_rep=None, dropna=None, data_columns=None, errors='strict', encoding='UTF-8')

Write the contained data to an HDF5 file using HDFStore.

Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.

In order to add another DataFrame or Series to an existing HDF file please use append mode and a different key.

Warning

One can store a subclass of DataFrame or Series to HDF5, but the type of the subclass is lost upon storing.

For more information see the user guide.

Parameters:
  • path_or_buf (str or pandas.HDFStore) – File path or HDFStore object.

  • key (str) – Identifier for the group in the store.

  • mode ({'a', 'w', 'r+'}, default 'a') –

    Mode to open file:

    • ‘w’: write, a new file is created (an existing file with the same name would be deleted).

    • ‘a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.

    • ‘r+’: similar to ‘a’, but the file must already exist.

  • complevel ({0-9}, default None) – Specifies a compression level for data. A value of 0 or None disables compression.

  • complib ({'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib') – Specifies the compression library to be used. These additional compressors for Blosc are supported (default if no compressor specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}. Specifying a compression library which is not available issues a ValueError.

  • append (bool, default False) – For Table formats, append the input data to the existing table.

  • format ({'fixed', 'table', None}, default 'fixed') –

    Possible values:

    • ‘fixed’: Fixed format. Fast writing/reading. Neither appendable nor searchable.

    • ‘table’: Table format. Write as a PyTables Table structure, which may perform worse but allows more flexible operations like searching / selecting subsets of the data.

    • If None, pd.get_option(‘io.hdf.default_format’) is checked, followed by fallback to “fixed”.

  • index (bool, default True) – Write DataFrame index as a column.

  • min_itemsize (dict or int, optional) – Map column names to minimum string sizes for columns.

  • nan_rep (Any, optional) – How to represent null values as str. Not allowed with append=True.

  • dropna (bool, default False, optional) – Remove missing values.

  • data_columns (list of columns or True, optional) – List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via data columns for more information. Applicable only to format='table'.

  • errors (str, default 'strict') – Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.

  • encoding (str, default "UTF-8")

Return type:

None

See also

read_hdf

Read from HDF file.

DataFrame.to_orc

Write a DataFrame to the binary orc format.

DataFrame.to_parquet

Write a DataFrame to the binary parquet format.

DataFrame.to_sql

Write to a SQL table.

DataFrame.to_feather

Write out feather-format for DataFrames.

DataFrame.to_csv

Write out to a csv file.

Examples

>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
...                   index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w')

We can add another object to the same file:

>>> s = pd.Series([1, 2, 3, 4])
>>> s.to_hdf('data.h5', key='s')

Reading from HDF file:

>>> pd.read_hdf('data.h5', 'df')
   A  B
a  1  4
b  2  5
c  3  6
>>> pd.read_hdf('data.h5', 's')
0    1
1    2
2    3
3    4
dtype: int64
to_html(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, max_cols=None, show_dimensions=False, decimal='.', bold_rows=True, classes=None, escape=True, notebook=False, border=None, table_id=None, render_links=False, encoding=None)

Render a DataFrame as an HTML table.

Parameters:
  • buf (str, Path or StringIO-like, optional, default None) – Buffer to write to. If None, the output is returned as a string.

  • columns (array-like, optional, default None) – The subset of columns to write. Writes all columns by default.

  • col_space (str or int, list or dict of int or str, optional) – The minimum width of each column in CSS length units. An int is assumed to be px units.

  • header (bool, optional) – Whether to print column labels, default True.

  • index (bool, optional, default True) – Whether to print index (row) labels.

  • na_rep (str, optional, default 'NaN') – String representation of NaN to use.

  • formatters (list, tuple or dict of one-param. functions, optional) – Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns.

  • float_format (one-parameter function, optional, default None) – Formatter function to apply to columns’ elements if they are floats. This function must return a unicode string and will be applied only to the non-NaN elements, with NaN being handled by na_rep.

  • sparsify (bool, optional, default True) – Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row.

  • index_names (bool, optional, default True) – Prints the names of the indexes.

  • justify (str, default None) –

    How to justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box. Valid values are

    • left

    • right

    • center

    • justify

    • justify-all

    • start

    • end

    • inherit

    • match-parent

    • initial

    • unset.

  • max_rows (int, optional) – Maximum number of rows to display in the console.

  • max_cols (int, optional) – Maximum number of columns to display in the console.

  • show_dimensions (bool, default False) – Display DataFrame dimensions (number of rows by number of columns).

  • decimal (str, default '.') – Character recognized as decimal separator, e.g. ‘,’ in Europe.

  • bold_rows (bool, default True) – Make the row labels bold in the output.

  • classes (str or list or tuple, default None) – CSS class(es) to apply to the resulting html table.

  • escape (bool, default True) – Convert the characters <, >, and & to HTML-safe sequences.

  • notebook ({True, False}, default False) – Whether the generated HTML is for IPython Notebook.

  • border (int) – A border=border attribute is included in the opening <table> tag. Default pd.options.display.html.border.

  • table_id (str, optional) – A css id is included in the opening <table> tag if specified.

  • render_links (bool, default False) – Convert URLs to HTML links.

  • encoding (str, default "utf-8") – Set character encoding.

Returns:

If buf is None, returns the result as a string. Otherwise returns None.

Return type:

str or None

See also

to_string

Convert DataFrame to a string.

Examples

>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [4, 3]})
>>> html_string = '''<table border="1" class="dataframe">
...   <thead>
...     <tr style="text-align: right;">
...       <th></th>
...       <th>col1</th>
...       <th>col2</th>
...     </tr>
...   </thead>
...   <tbody>
...     <tr>
...       <th>0</th>
...       <td>1</td>
...       <td>4</td>
...     </tr>
...     <tr>
...       <th>1</th>
...       <td>2</td>
...       <td>3</td>
...     </tr>
...   </tbody>
... </table>'''
>>> assert html_string == df.to_html()
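
The classes, table_id, index and escape parameters described above shape the emitted markup. The following is a minimal sketch of their effect; the column names, the class name ‘results’ and the id ‘summary’ are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'expr': ['a < b', 'c & d'], 'value': [1, 2]})

# Extra CSS class and id on the <table> tag, no index column.
# With the default escape=True, '<' and '&' are converted to
# HTML-safe sequences.
html = df.to_html(index=False, classes='results', table_id='summary')

# escape=False writes cell contents verbatim, which is only
# appropriate for trusted content.
raw = df.to_html(index=False, escape=False)
```

Because buf is left at None, both calls return the markup as a string rather than writing it anywhere.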
to_json(na='null', show_bbox=False, drop_id=False, to_wgs84=False, **kwargs)[source]

Return a GeoJSON representation of the GeoDataFrame as a string.

Parameters:
  • na ({'null', 'drop', 'keep'}, default 'null') – Indicates how to output missing (NaN) values in the GeoDataFrame. See below.

  • show_bbox (bool, optional, default: False) – Include bbox (bounds) in the geojson

  • drop_id (bool, default: False) – Whether to drop the index of the GeoDataFrame instead of emitting it as the id property in the generated GeoJSON. The default False keeps the id; True may be preferable if the index is just arbitrary row numbers.

  • to_wgs84 (bool, optional, default: False) – If the CRS is set on the active geometry column it is exported as WGS84 (EPSG:4326) to meet the 2016 GeoJSON specification. Set to True to force re-projection and set to False to ignore CRS. False by default.

Return type:

str

Notes

The remaining kwargs are passed to json.dumps().

Missing (NaN) values in the GeoDataFrame can be represented as follows:

  • null: output the missing entries as JSON null.

  • drop: remove the property from the feature. This applies to each feature individually so that features may have different properties.

  • keep: output the missing entries as NaN.

If the GeoDataFrame has a defined CRS, its definition will be included in the output unless it is equal to WGS84 (default GeoJSON CRS) or not possible to represent in the URN OGC format, or unless to_wgs84=True is specified.

Examples

>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:3857")
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
>>> gdf.to_json()
'{"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"col1": "name1"}, "geometry": {"type": "Point", "coordinates": [1.0, 2.0]}}, {"id": "1", "type": "Feature", "properties": {"col1": "name2"}, "geometry": {"type": "Point", "coordinates": [2.0, 1.0]}}], "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:EPSG::3857"}}}'

Alternatively, you can write GeoJSON to file:

>>> gdf.to_file(path, driver="GeoJSON")
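
The na options listed in the Notes are not exercised by the example above. As a minimal sketch, assuming geopandas and shapely are installed, the snippet below compares ‘null’ and ‘drop’ on a column containing a missing value. The column name ‘height’ and its values are made up for illustration.

```python
import json

import geopandas
from shapely.geometry import Point

gdf = geopandas.GeoDataFrame(
    {'height': [12.5, float('nan')],
     'geometry': [Point(1, 2), Point(2, 1)]}
)

def feature_properties(geojson_text):
    """Return the 'properties' dict of each feature in a GeoJSON string."""
    return [f['properties'] for f in json.loads(geojson_text)['features']]

# na='null' keeps the property and writes JSON null for the missing value.
props_null = feature_properties(gdf.to_json(na='null'))

# na='drop' removes the property from the affected feature only.
props_drop = feature_properties(gdf.to_json(na='drop'))
```

With na='drop' the features no longer share the same set of properties, which some GeoJSON consumers may not expect.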

See also

GeoDataFrame.to_file

write GeoDataFrame to file

to_latex(buf=None, columns=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=False, column_format=None, longtable=None, escape=None, encoding=None, decimal='.', multicolumn=None, multicolumn_format=None, multirow=None, caption=None, label=None, position=None)

Render object to a LaTeX tabular, longtable, or nested table.

Requires \usepackage{booktabs}. The output can be copy/pasted into a main LaTeX document or read from an external file with \input{table.tex}.

Changed in version 2.0.0: Refactored to use the Styler implementation via jinja2 templating.

Parameters:
  • buf (str, Path or StringIO-like, optional, default None) – Buffer to write to. If None, the output is returned as a string.

  • columns (list of label, optional) – The subset of columns to write. Writes all columns by default.

  • header (bool or list of str, default True) – Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.

  • index (bool, default True) – Write row names (index).

  • na_rep (str, default 'NaN') – Missing data representation.

  • formatters (list of functions or dict of {str: function}, optional) – Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List must be of length equal to the number of columns.

  • float_format (one-parameter function or str, optional, default None) – Formatter for floating point numbers. For example float_format="%.2f" and float_format="{:0.2f}".format will both result in 0.1234 being formatted as 0.12.

  • sparsify (bool, optional) – Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row. By default, the value will be read from the config module.

  • index_names (bool, default True) – Prints the names of the indexes.

  • bold_rows (bool, default False) – Make the row labels bold in the output.

  • column_format (str, optional) – The columns format as specified in LaTeX table format e.g. ‘rcl’ for 3 columns. By default, ‘l’ will be used for all columns except columns of numbers, which default to ‘r’.

  • longtable (bool, optional) –

    Use a longtable environment instead of tabular. Requires adding \usepackage{longtable} to your LaTeX preamble. By default, the value will be read from the pandas config module, and set to True if the option styler.latex.environment is “longtable”.

    Changed in version 2.0.0: The pandas option affecting this argument has changed.

  • escape (bool, optional) –

    By default, the value will be read from the pandas config module and set to True if the option styler.format.escape is “latex”. When set to False, LaTeX special characters in column names are not escaped.

    Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the default value to False.

  • encoding (str, optional) – A string representing the encoding to use in the output file, defaults to ‘utf-8’.

  • decimal (str, default '.') – Character recognized as decimal separator, e.g. ‘,’ in Europe.

  • multicolumn (bool, default True) –

    Use multicolumn to enhance MultiIndex columns. The default will be read from the config module, and is set as the option styler.sparse.columns.

    Changed in version 2.0.0: The pandas option affecting this argument has changed.

  • multicolumn_format (str, default 'r') –

    The alignment for multicolumns, similar to column_format. The default will be read from the config module, and is set as the option styler.latex.multicol_align.

    Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the default value to “r”.

  • multirow (bool, default True) –

    Use multirow to enhance MultiIndex rows. Requires adding \usepackage{multirow} to your LaTeX preamble. Will print centered labels (instead of top-aligned) across the contained rows, separating groups via clines. The default will be read from the pandas config module, and is set as the option styler.sparse.index.

    Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the default value to True.

  • caption (str or tuple, optional) – Tuple (full_caption, short_caption), which results in \caption[short_caption]{full_caption}; if a single string is passed, no short caption will be set.

  • label (str, optional) – The LaTeX label to be placed inside \label{} in the output. This is used with \ref{} in the main .tex file.

  • position (str, optional) – The LaTeX positional argument for tables, to be placed after \begin{} in the output.

Returns:

If buf is None, returns the result as a string. Otherwise returns None.

Return type:

str or None

See also

io.formats.style.Styler.to_latex

Render a DataFrame to LaTeX with conditional formatting.

DataFrame.to_string

Render a DataFrame to a console-friendly tabular output.

DataFrame.to_html

Render a DataFrame as an HTML table.

Notes

As of v2.0.0 this method has changed to use the Styler implementation as part of Styler.to_latex() via jinja2 templating. This means that jinja2 is a requirement, and needs to be installed, for this method to function. It is advised that users switch to using Styler, since that implementation is more frequently updated and contains much more flexibility with the output.

Examples

Convert a general DataFrame to LaTeX with formatting:

>>> df = pd.DataFrame(dict(name=['Raphael', 'Donatello'],
...                        age=[26, 45],
...                        height=[181.23, 177.65]))
>>> print(df.to_latex(index=False,
...                   formatters={"name": str.upper},
...                   float_format="{:.1f}".format,
... ))
\begin{tabular}{lrr}
\toprule
name & age & height \\
\midrule
RAPHAEL & 26 & 181.2 \\
DONATELLO & 45 & 177.7 \\
\bottomrule
\end{tabular}

to_markdown(buf=None, mode='wt', index=True, storage_options=None, **kwargs)

Print DataFrame in Markdown-friendly format.

Parameters:
  • buf (str, Path or StringIO-like, optional, default None) – Buffer to write to. If None, the output is returned as a string.

  • mode (str, optional) – Mode in which file is opened, “wt” by default.

  • index (bool, optional, default True) – Add index (row) labels.

  • storage_options (dict, optional) –

    Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details; for more examples on storage options, refer to the pandas I/O user guide.

  • **kwargs – These parameters will be passed to tabulate.

Returns:

DataFrame in Markdown-friendly format.

Return type:

str

Notes

Requires the tabulate package.

Examples

>>> df = pd.DataFrame(
...     data={"animal_1": ["elk", "pig"], "animal_2": ["dog", "quetzal"]}
... )
>>> print(df.to_markdown())
|    | animal_1   | animal_2   |
|---:|:-----------|:-----------|
|  0 | elk        | dog        |
|  1 | pig        | quetzal    |

Output markdown with a tabulate option.

>>> print(df.to_markdown(tablefmt="grid"))
+----+------------+------------+
|    | animal_1   | animal_2   |
+====+============+============+
|  0 | elk        | dog        |
+----+------------+------------+
|  1 | pig        | quetzal    |
+----+------------+------------+

to_numpy(dtype=None, copy=False, na_value=<no_default>)

Convert the DataFrame to a NumPy array.

By default, the dtype of the returned array will be the common NumPy dtype of all types in the DataFrame. For example, if the dtypes are float16 and float32, the results dtype will be float32. This may require copying data and coercing values, which may be expensive.

Parameters:
  • dtype (str or numpy.dtype, optional) – The dtype to pass to numpy.asarray().

  • copy (bool, default False) – Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.

  • na_value (Any, optional) – The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns.

Return type:

numpy.ndarray

See also

Series.to_numpy

Similar method for Series.

Examples

>>> pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array([[1, 3],
       [2, 4]])

With heterogeneous data, the lowest common type will have to be used.

>>> df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
>>> df.to_numpy()
array([[1. , 3. ],
       [2. , 4.5]])
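
The na_value and copy parameters are not exercised by the examples in this section. The following is a minimal sketch, assuming a float-typed frame with one missing value; the column names and the fill value 0.0 are illustrative choices, not part of the original documentation.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a single missing value in column "A".
df = pd.DataFrame({"A": [1.0, np.nan], "B": [3.0, 4.5]})

# na_value replaces the missing entry in the returned array.
filled = df.to_numpy(dtype="float64", na_value=0.0)

# copy=True guarantees the result does not share memory with the frame,
# so writing to it cannot change the DataFrame.
independent = df.to_numpy(copy=True)
independent[0, 1] = -1.0

print(filled)
print(df.loc[0, "B"])
```

Requesting copy=True is the safer option whenever the array will be modified in place.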

For a mix of numeric and non-numeric types, the output array will have object dtype.

>>> df['C'] = pd.date_range('2000', periods=2)
>>> df.to_numpy()
array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],
       [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)

to_orc(path=None, *, engine='pyarrow', index=None, engine_kwargs=None)

Write a DataFrame to the ORC format.

Added in version 1.5.0.

Parameters:
  • path (str, file-like object or None, default None) – If a string, it will be used as Root Directory path when writing a partitioned dataset. By file-like object, we refer to objects with a write() method, such as a file handle (e.g. via builtin open function). If path is None, a bytes object is returned.

  • engine ({'pyarrow'}, default 'pyarrow') – ORC library to use.

  • index (bool, optional) – If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, the dataframe’s index(es) will be saved; however, instead of being saved as values, a RangeIndex is stored as a range in the metadata, which requires little space and is faster. Other indexes will be included as columns in the file output.

  • engine_kwargs (dict[str, Any] or None, default None) – Additional keyword arguments passed to pyarrow.orc.write_table().

Return type:

bytes if no path argument is provided else None

Raises:
  • NotImplementedError – Dtype of one or more columns is category, unsigned integers, interval, period or sparse.

  • ValueError – engine is not pyarrow.

See also

read_orc

Read an ORC file.

DataFrame.to_parquet

Write a parquet file.

DataFrame.to_csv

Write a csv file.

DataFrame.to_sql

Write to a sql table.

DataFrame.to_hdf

Write to hdf.

Examples

>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [4, 3]})
>>> df.to_orc('df.orc')
>>> pd.read_orc('df.orc')
   col1  col2
0     1     4
1     2     3

If you want to get a buffer to the orc content you can write it to io.BytesIO

>>> import io
>>> b = io.BytesIO(df.to_orc())
>>> b.seek(0)
0
>>> content = b.read()

to_parquet(path, index=None, compression='snappy', geometry_encoding='WKB', write_covering_bbox=False, schema_version=None, **kwargs)[source]

Write a GeoDataFrame to the Parquet format.

By default, all geometry columns present are serialized to WKB format in the file.

Requires ‘pyarrow’.

Added in version 0.8.

Parameters:
  • path (str, path object)

  • index (bool, default None) – If True, always include the dataframe’s index(es) as columns in the file output. If False, the index(es) will not be written to the file. If None, the index(es) will be included as columns in the file output, except a RangeIndex, which is stored as metadata only.

  • compression ({'snappy', 'gzip', 'brotli', 'lz4', 'zstd', None}, default 'snappy') – Name of the compression to use. Use None for no compression.

  • geometry_encoding ({'WKB', 'geoarrow'}, default 'WKB') – The encoding to use for the geometry columns. Defaults to “WKB” for maximum interoperability. Specify “geoarrow” to use one of the native GeoArrow-based single-geometry type encodings. Note: the “geoarrow” option is part of the newer GeoParquet 1.1 specification, should be considered as experimental, and may not be supported by all readers.

  • write_covering_bbox (bool, default False) – Writes the bounding box column for each row entry with column name ‘bbox’. Writing a bbox column can be computationally expensive, but allows you to specify a bbox in read_parquet() for filtered reading. Note: this bbox column is part of the newer GeoParquet 1.1 specification and should be considered as experimental. While writing the column is backwards compatible, using it for filtering may not be supported by all readers.

  • schema_version ({'0.1.0', '0.4.0', '1.0.0', '1.1.0', None}) – GeoParquet specification version; if not provided, will default to latest supported stable version (1.0.0).

  • kwargs – Additional keyword arguments passed to pyarrow.parquet.write_table().

Return type:

None

Examples

>>> gdf.to_parquet('data.parquet')

See also

GeoDataFrame.to_feather

write GeoDataFrame to feather

GeoDataFrame.to_file

write GeoDataFrame to file

to_period(freq=None, axis=0, copy=None)

Convert DataFrame from DatetimeIndex to PeriodIndex.

Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed).

Parameters:
  • freq (str, optional) – Frequency of the PeriodIndex. Inferred from the index if not passed.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to convert (the index by default).

  • copy (bool, default True) –

    If False then underlying input data is not copied.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements by enabling copy on write: pd.options.mode.copy_on_write = True

Returns:

The DataFrame has a PeriodIndex.

Return type:

DataFrame

Examples

>>> idx = pd.to_datetime(
...     [
...         "2001-03-31 00:00:00",
...         "2002-05-31 00:00:00",
...         "2003-08-31 00:00:00",
...     ]
... )
>>> idx
DatetimeIndex(['2001-03-31', '2002-05-31', '2003-08-31'],
              dtype='datetime64[ns]', freq=None)
>>> idx.to_period("M")
PeriodIndex(['2001-03', '2002-05', '2003-08'], dtype='period[M]')
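
The examples here convert a bare index. As a minimal sketch of the same conversion applied to a DataFrame (the column name "value" and its data are illustrative assumptions), the method replaces the DatetimeIndex of the frame with a PeriodIndex and leaves the data untouched:

```python
import pandas as pd

idx = pd.to_datetime(["2001-03-31", "2002-05-31", "2003-08-31"])

# Illustrative frame indexed by the timestamps above.
df = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# Convert the row index to monthly periods; the column data is unchanged.
monthly = df.to_period("M")

print(monthly.index)
print(monthly["value"].tolist())
```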

For the yearly frequency

>>> idx.to_period("Y")
PeriodIndex(['2001', '2002', '2003'], dtype='period[Y-DEC]')

to_pickle(path, compression='infer', protocol=5, storage_options=None)

Pickle (serialize) object to file.

Parameters:
  • path (str, path object, or file-like object) – String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function. File path where the pickled object will be stored.

  • compression (str or dict, default 'infer') –

    For on-the-fly compression of the output data. If ‘infer’ and ‘path’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.

    Added in version 1.5.0: Added support for .tar files.

  • protocol (int) –

    Int which indicates which protocol should be used by the pickler, default HIGHEST_PROTOCOL (see the Python pickle documentation, paragraph 12.1.2). The possible values are 0, 1, 2, 3, 4, 5. A negative value for the protocol parameter is equivalent to setting its value to HIGHEST_PROTOCOL.

  • storage_options (dict, optional) –

    Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details; for more examples on storage options, refer to the pandas I/O user guide.

Return type:

None

See also

read_pickle

Load pickled pandas object (or any object) from file.

DataFrame.to_hdf

Write DataFrame to an HDF5 file.

DataFrame.to_sql

Write DataFrame to a SQL database.

DataFrame.to_parquet

Write a DataFrame to the binary parquet format.
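
The dictionary form of the compression parameter can be sketched as follows. This is an illustration only: the file name frame.pkl.gz and the temporary directory are assumptions, and the dictionary is the one quoted in the parameter description above.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical output path; the ".gz" suffix lets read_pickle infer gzip.
    path = os.path.join(tmp, "frame.pkl.gz")

    # Fast gzip compression with a fixed mtime for a reproducible archive.
    df.to_pickle(
        path,
        compression={"method": "gzip", "compresslevel": 1, "mtime": 1},
    )

    # compression defaults to 'infer', so no extra argument is needed here.
    restored = pd.read_pickle(path)

print(restored.equals(df))
```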

Examples

>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
>>> original_df.to_pickle("./dummy.pkl")
>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9

to_postgis(name, con, schema=None, if_exists='fail', index=False, index_label=None, chunksize=None, dtype=None)[source]

Upload GeoDataFrame into PostGIS database.

This method requires SQLAlchemy and GeoAlchemy2, and a PostgreSQL Python driver (psycopg or psycopg2) to be installed.

It is also possible to use to_file() to write to a database. Especially for file geodatabases like GeoPackage or SpatiaLite this can be easier.

Parameters:
  • name (str) – Name of the target table.

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine) – Active connection to the PostGIS database.

  • if_exists ({'fail', 'replace', 'append'}, default 'fail') –

    How to behave if the table already exists:

    • fail: Raise a ValueError.

    • replace: Drop the table before inserting new values.

    • append: Insert new values to the existing table.

  • schema (string, optional) – Specify the schema. If None, use default schema: ‘public’.

  • index (bool, default False) – Write DataFrame index as a column. Uses index_label as the column name in the table.

  • index_label (string or sequence, default None) – Column label for index column(s). If None is given (default) and index is True, then the index names are used.

  • chunksize (int, optional) – Rows will be written in batches of this size at a time. By default, all rows will be written at once.

  • dtype (dict of column name to SQL type, default None) – Specifying the datatype for columns. The keys should be the column names and the values should be the SQLAlchemy types.

Return type:

None

Examples

>>> from sqlalchemy import create_engine
>>> engine = create_engine("postgresql://myusername:mypassword@myhost:5432/mydatabase")
>>> gdf.to_postgis("my_table", engine)

See also

GeoDataFrame.to_file

write GeoDataFrame to file

read_postgis

read PostGIS database to GeoDataFrame

to_records(index=True, column_dtypes=None, index_dtypes=None)

Convert DataFrame to a NumPy record array.

Index will be included as the first field of the record array if requested.

Parameters:
  • index (bool, default True) – Include index in resulting record array, stored in ‘index’ field or using the index label, if set.

  • column_dtypes (str, type, dict, default None) – If a string or type, the data type to store all columns. If a dictionary, a mapping of column names and indices (zero-indexed) to specific data types.

  • index_dtypes (str, type, dict, default None) –

    If a string or type, the data type to store all index levels. If a dictionary, a mapping of index level names and indices (zero-indexed) to specific data types.

    This mapping is applied only if index=True.

Returns:

NumPy ndarray with the DataFrame labels as fields and each row of the DataFrame as entries.

Return type:

numpy.rec.recarray

See also

DataFrame.from_records

Convert structured or record ndarray to DataFrame.

numpy.rec.recarray

An ndarray that allows field access using attributes, analogous to typed columns in a spreadsheet.

Examples

>>> df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]},
...                   index=['a', 'b'])
>>> df
   A     B
a  1  0.50
b  2  0.75
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])

If the DataFrame index has no label then the recarray field name is set to ‘index’. If the index has a label then this is used as the field name:

>>> df.index = df.index.rename("I")
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])

The index can be excluded from the record array:

>>> df.to_records(index=False)
rec.array([(1, 0.5 ), (2, 0.75)],
          dtype=[('A', '<i8'), ('B', '<f8')])
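
Because the result is a numpy.rec.recarray, its fields can be read by attribute as well as by name. A minimal sketch, reusing the data from the examples in this section:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [0.5, 0.75]}, index=["a", "b"])
rec = df.to_records()

# Field names: the index field first, then one field per column.
names = rec.dtype.names

# Attribute access and item access return the same field data.
col_a = rec.A
col_b = rec["B"]

# A single record can be indexed by field name as well.
first_label = rec[0]["index"]

print(names, col_a, col_b, first_label)
```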

Data types can be specified for the columns:

>>> df.to_records(column_dtypes={"A": "int32"})
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])

As well as for the index:

>>> df.to_records(index_dtypes="<S2")
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
          dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])
>>> index_dtypes = f"<S{df.index.str.len().max()}"
>>> df.to_records(index_dtypes=index_dtypes)
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
          dtype=[('I', 'S1'), ('A', '<i8'), ('B', '<f8')])

to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)

Write records stored in a DataFrame to a SQL database.

Databases supported by SQLAlchemy are supported. Tables can be newly created, appended to, or overwritten.

Parameters:
  • name (str) – Name of SQL table.

  • con (sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection) –

    Using SQLAlchemy makes it possible to use any DB supported by that library. Legacy support is provided for sqlite3.Connection objects. The user is responsible for engine disposal and connection closure for the SQLAlchemy connectable; see the SQLAlchemy documentation on working with engines and connections. If passing a sqlalchemy.engine.Connection which is already in a transaction, the transaction will not be committed. If passing a sqlite3.Connection, it will not be possible to roll back the record insertion.

  • schema (str, optional) – Specify the schema (if database flavor supports this). If None, use default schema.

  • if_exists ({'fail', 'replace', 'append'}, default 'fail') –

    How to behave if the table already exists.

    • fail: Raise a ValueError.

    • replace: Drop the table before inserting new values.

    • append: Insert new values to the existing table.

  • index (bool, default True) – Write DataFrame index as a column. Uses index_label as the column name in the table. Creates a table index for this column.

  • index_label (str or sequence, default None) – Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.

  • chunksize (int, optional) – Specify the number of rows in each batch to be written at a time. By default, all rows will be written at once.

  • dtype (dict or scalar, optional) – Specifying the datatype for columns. If a dictionary is used, the keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode. If a scalar is provided, it will be applied to all columns.

  • method ({None, 'multi', callable}, optional) –

    Controls the SQL insertion clause used:

    • None : Uses standard SQL INSERT clause (one per row).

    • ’multi’: Pass multiple values in a single INSERT clause.

    • callable with signature (pd_table, conn, keys, data_iter).

    Details and a sample callable implementation can be found in the section insert method.

Returns:

Number of rows affected by to_sql. None is returned if the callable passed into method does not return an integer number of rows.

The number of rows reported is the sum of the rowcount attribute of the sqlite3.Cursor or SQLAlchemy connectable, which may not reflect the exact number of written rows, as described in the sqlite3 and SQLAlchemy documentation.

Added in version 1.4.0.

Return type:

None or int

Raises:

ValueError – When the table already exists and if_exists is ‘fail’ (the default).

See also

read_sql

Read a DataFrame from a table.

Notes

Timezone aware datetime columns will be written as Timestamp with timezone type with SQLAlchemy if supported by the database. Otherwise, the datetimes will be stored as timezone unaware timestamps local to the original timezone.

Not all datastores support method="multi". Oracle, for example, does not support multi-value insert.

Examples

Create an in-memory SQLite database.

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite://', echo=False)

Create a table from scratch with 3 rows.

>>> df = pd.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})
>>> df
     name
0  User 1
1  User 2
2  User 3
>>> df.to_sql(name='users', con=engine)
3
>>> from sqlalchemy import text
>>> with engine.connect() as conn:
...    conn.execute(text("SELECT * FROM users")).fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3')]

An sqlalchemy.engine.Connection can also be passed to con:

>>> with engine.begin() as connection:
...     df1 = pd.DataFrame({'name' : ['User 4', 'User 5']})
...     df1.to_sql(name='users', con=connection, if_exists='append')
2

This is allowed to support operations that require that the same DBAPI connection is used for the entire operation.

>>> df2 = pd.DataFrame({'name' : ['User 6', 'User 7']})
>>> df2.to_sql(name='users', con=engine, if_exists='append')
2
>>> with engine.connect() as conn:
...    conn.execute(text("SELECT * FROM users")).fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3'),
 (0, 'User 4'), (1, 'User 5'), (0, 'User 6'),
 (1, 'User 7')]

Overwrite the table with just df2.

>>> df2.to_sql(name='users', con=engine, if_exists='replace',
...            index_label='id')
2
>>> with engine.connect() as conn:
...    conn.execute(text("SELECT * FROM users")).fetchall()
[(0, 'User 6'), (1, 'User 7')]
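
As the description of con notes, a sqlite3.Connection is also accepted, so the method can be tried without SQLAlchemy. The following is a minimal sketch using only the standard-library sqlite3 module; the table name and the batch size are illustrative assumptions.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"name": ["User 1", "User 2", "User 3"]})

# Create the table, writing the rows in batches of two.
df.to_sql(name="users", con=conn, index=False, chunksize=2)

# Append the same rows again, passing multiple values per INSERT clause.
df.to_sql(name="users", con=conn, index=False, if_exists="append", method="multi")

# With the default if_exists="fail", an existing table raises ValueError.
error = None
try:
    df.to_sql(name="users", con=conn, index=False)
except ValueError as exc:
    error = exc

rows = conn.execute("SELECT name FROM users").fetchall()
conn.close()

print(len(rows), type(error).__name__)
```

Passing index=False keeps the DataFrame index out of the table, so only the name column is created.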

Use method to define a callable insertion method to do nothing if there’s a primary key conflict on a table in a PostgreSQL database.

>>> from sqlalchemy.dialects.postgresql import insert
>>> def insert_on_conflict_nothing(table, conn, keys, data_iter):
...     # "a" is the primary key in "conflict_table"
...     data = [dict(zip(keys, row)) for row in data_iter]
...     stmt = insert(table.table).values(data).on_conflict_do_nothing(index_elements=["a"])
...     result = conn.execute(stmt)
...     return result.rowcount
>>> df_conflict.to_sql(name="conflict_table", con=conn, if_exists="append", method=insert_on_conflict_nothing)
0

For MySQL, a callable to update columns b and c if there’s a conflict on a primary key.

>>> from sqlalchemy.dialects.mysql import insert
>>> def insert_on_conflict_update(table, conn, keys, data_iter):
...     # update columns "b" and "c" on primary key conflict
...     data = [dict(zip(keys, row)) for row in data_iter]
...     stmt = (
...         insert(table.table)
...         .values(data)
...     )
...     stmt = stmt.on_duplicate_key_update(b=stmt.inserted.b, c=stmt.inserted.c)
...     result = conn.execute(stmt)
...     return result.rowcount
>>> df_conflict.to_sql(name="conflict_table", con=conn, if_exists="append", method=insert_on_conflict_update)
2

Specify the dtype (especially useful for integers with missing values). Notice that while pandas is forced to store the data as floating point, the database supports nullable integers. When fetching the data with Python, we get back integer scalars.

>>> df = pd.DataFrame({"A": [1, None, 2]})
>>> df
     A
0  1.0
1  NaN
2  2.0
>>> from sqlalchemy.types import Integer
>>> df.to_sql(name='integers', con=engine, index=False,
...           dtype={"A": Integer()})
3
>>> with engine.connect() as conn:
...   conn.execute(text("SELECT * FROM integers")).fetchall()
[(1,), (None,), (2,)]

to_stata(path, *, convert_dates=None, write_index=True, byteorder=None, time_stamp=None, data_label=None, variable_labels=None, version=114, convert_strl=None, compression='infer', storage_options=None, value_labels=None)

Export DataFrame object to Stata dta format.

Writes the DataFrame to a Stata dataset file. “dta” files contain a Stata dataset.

Parameters:
  • path (str, path object, or buffer) – String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function.

  • convert_dates (dict) – Dictionary mapping columns containing datetime types to stata internal format to use when writing the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either an integer or a name. Datetime columns that do not have a conversion type specified will be converted to ‘tc’. Raises NotImplementedError if a datetime column has timezone information.

  • write_index (bool) – Write the index to Stata dataset.

  • byteorder (str) – Can be “>”, “<”, “little”, or “big”. Default is sys.byteorder.

  • time_stamp (datetime) – A datetime to use as file creation date. Default is the current time.

  • data_label (str, optional) – A label for the data set. Must be 80 characters or smaller.

  • variable_labels (dict) – Dictionary containing columns as keys and variable labels as values. Each label must be 80 characters or smaller.

  • version ({114, 117, 118, 119, None}, default 114) –

    Version to use in the output dta file. Set to None to let pandas decide between 118 or 119 formats depending on the number of columns in the frame. Version 114 can be read by Stata 10 and later. Version 117 can be read by Stata 13 or later. Version 118 is supported in Stata 14 and later. Version 119 is supported in Stata 15 and later. Version 114 limits string variables to 244 characters or fewer while versions 117 and later allow strings with lengths up to 2,000,000 characters. Versions 118 and 119 support Unicode characters, and version 119 supports more than 32,767 variables.

    Version 119 should usually only be used when the number of variables exceeds the capacity of dta format 118. Exporting smaller datasets in format 119 may have unintended consequences, and, as of November 2020, Stata SE cannot read version 119 files.

  • convert_strl (list, optional) – List of names of string columns to convert to Stata StrL format. Only available if version is 117. Storing strings in the StrL format can produce smaller dta files if strings have more than 8 characters and values are repeated.

  • compression (str or dict, default 'infer') –

    For on-the-fly compression of the output data. If ‘infer’ and ‘path’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.

    Added in version 1.5.0: Added support for .tar files.

    Changed in version 1.4.0: Zstandard support.

  • storage_options (dict, optional) –

    Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details; for more examples on storage options, refer to the pandas I/O user guide.

  • value_labels (dict of dicts) –

    Dictionary containing columns as keys and dictionaries of column value to labels as values. Labels for a single variable must be 32,000 characters or smaller.

    Added in version 1.4.0.

Raises:
  • NotImplementedError

    • If datetimes contain timezone information

    • Column dtype is not representable in Stata

  • ValueError

    • Columns listed in convert_dates are neither datetime64[ns] nor datetime.datetime

    • Column listed in convert_dates is not in DataFrame

    • Categorical label contains more than 32,000 characters

Return type:

None

See also

read_stata

Import Stata data files.

io.stata.StataWriter

Low-level writer for Stata data files.

io.stata.StataWriter117

Low-level writer for version 117 files.

Parameters:
  • path (FilePath | WriteBuffer[bytes])

  • convert_dates (dict[Hashable, str] | None)

  • write_index (bool)

  • byteorder (ToStataByteorder | None)

  • time_stamp (datetime.datetime | None)

  • data_label (str | None)

  • variable_labels (dict[Hashable, str] | None)

  • version (int | None)

  • convert_strl (Sequence[Hashable] | None)

  • compression (CompressionOptions)

  • storage_options (StorageOptions | None)

  • value_labels (dict[Hashable, dict[float, str]] | None)

Examples

>>> df = pd.DataFrame({'animal': ['falcon', 'parrot', 'falcon',
...                               'parrot'],
...                    'speed': [350, 18, 361, 15]})
>>> df.to_stata('animals.dta')
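
The compression dict described above can be tried with a small round trip. This is a minimal sketch, assuming a writable temporary directory; the file name animals.dta.gz is illustrative only, and read_stata is relied on to infer gzip from the extension.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"animal": ["falcon", "parrot"], "speed": [350, 18]})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the ".gz" suffix lets read_stata infer gzip.
    path = os.path.join(tmp, "animals.dta.gz")
    # Extra keys in the dict are forwarded to gzip.GzipFile.
    df.to_stata(
        path,
        write_index=False,
        compression={"method": "gzip", "compresslevel": 1, "mtime": 1},
    )
    restored = pd.read_stata(path)

print(restored)
```

Integer columns may come back with a narrower integer dtype than they were written with, so compare values rather than dtypes after a round trip.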
to_string(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, max_cols=None, show_dimensions=False, decimal='.', line_width=None, min_rows=None, max_colwidth=None, encoding=None)

Render a DataFrame to a console-friendly tabular output.

Parameters:
  • buf (str, Path or StringIO-like, optional, default None) – Buffer to write to. If None, the output is returned as a string.

  • columns (array-like, optional, default None) – The subset of columns to write. Writes all columns by default.

  • col_space (int, list or dict of int, optional) – The minimum width of each column. If a list of ints is given, every integer corresponds to one column. If a dict is given, the key references the column, while the value defines the space to use.

  • header (bool or list of str, optional) – Write out the column names. If a list of columns is given, it is assumed to be aliases for the column names.

  • index (bool, optional, default True) – Whether to print index (row) labels.

  • na_rep (str, optional, default 'NaN') – String representation of NaN to use.

  • formatters (list, tuple or dict of one-param. functions, optional) – Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns.

  • float_format (one-parameter function, optional, default None) – Formatter function to apply to columns’ elements if they are floats. This function must return a unicode string and will be applied only to the non-NaN elements, with NaN being handled by na_rep.

  • sparsify (bool, optional, default True) – Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row.

  • index_names (bool, optional, default True) – Prints the names of the indexes.

  • justify (str, default None) –

    How to justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box. Valid values are

    • left

    • right

    • center

    • justify

    • justify-all

    • start

    • end

    • inherit

    • match-parent

    • initial

    • unset.

  • max_rows (int, optional) – Maximum number of rows to display in the console.

  • max_cols (int, optional) – Maximum number of columns to display in the console.

  • show_dimensions (bool, default False) – Display DataFrame dimensions (number of rows by number of columns).

  • decimal (str, default '.') – Character recognized as decimal separator, e.g. ‘,’ in Europe.

  • line_width (int, optional) – Width to wrap a line in characters.

  • min_rows (int, optional) – The number of rows to display in the console in a truncated repr (when number of rows is above max_rows).

  • max_colwidth (int, optional) – Max width to truncate each column in characters. By default, no limit.

  • encoding (str, default "utf-8") – Set character encoding.

Returns:

If buf is None, returns the result as a string. Otherwise returns None.

Return type:

str or None

See also

to_html

Convert DataFrame to HTML.

Examples

>>> d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
>>> df = pd.DataFrame(d)
>>> print(df.to_string())
   col1  col2
0     1     4
1     2     5
2     3     6
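
The formatting parameters can be combined. The sketch below assumes a small frame with one missing value and uses na_rep, float_format and index=False together; the column names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "ratio": [0.5, None]})

# float_format is a one-parameter callable applied to non-NaN floats;
# NaN values are rendered with na_rep instead.
text = df.to_string(index=False, na_rep="-", float_format="{:.1%}".format)
print(text)
```

Because buf is left as None, the rendered table is returned as a string rather than written anywhere.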
to_timestamp(freq=None, how='start', axis=0, copy=None)

Cast to DatetimeIndex of timestamps, at beginning of period.

Parameters:
  • freq (str, default frequency of PeriodIndex) – Desired frequency.

  • how ({'s', 'e', 'start', 'end'}) – Convention for converting period to timestamp; start of period vs. end.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to convert (the index by default).

  • copy (bool, default True) –

    If False then underlying input data is not copied.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements by enabling copy-on-write: pd.options.mode.copy_on_write = True

Returns:

The DataFrame has a DatetimeIndex.

Return type:

DataFrame

Examples

>>> idx = pd.PeriodIndex(['2023', '2024'], freq='Y')
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d, index=idx)
>>> df1
      col1   col2
2023     1      3
2024     2      4

The resulting timestamps will be at the beginning of the year in this case

>>> df1 = df1.to_timestamp()
>>> df1
            col1   col2
2023-01-01     1      3
2024-01-01     2      4
>>> df1.index
DatetimeIndex(['2023-01-01', '2024-01-01'], dtype='datetime64[ns]', freq=None)

Using freq which is the offset that the Timestamps will have

>>> df2 = pd.DataFrame(data=d, index=idx)
>>> df2 = df2.to_timestamp(freq='M')
>>> df2
            col1   col2
2023-01-31     1      3
2024-01-31     2      4
>>> df2.index
DatetimeIndex(['2023-01-31', '2024-01-31'], dtype='datetime64[ns]', freq=None)
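
The how parameter is not exercised above. A minimal sketch, assuming monthly periods, that contrasts the start and end conventions:

```python
import pandas as pd

idx = pd.PeriodIndex(["2023-01", "2023-02"], freq="M")
df = pd.DataFrame({"col1": [1, 2]}, index=idx)

# 'start' maps each period to its first instant, 'end' to its last.
starts = df.to_timestamp(how="start")
ends = df.to_timestamp(how="end")

print(starts.index[0])
print(ends.index[0])
```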
to_wkb(hex=False, **kwargs)[source]

Encode all geometry columns in the GeoDataFrame to WKB.

Parameters:
  • hex (bool) – If True, export the WKB as a hexadecimal string. The default is to return a binary bytes object.

  • kwargs – Additional keyword args will be passed to shapely.to_wkb().

Returns:

geometry columns are encoded to WKB

Return type:

DataFrame
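
A short sketch of the hex variant, assuming a one-row GeoDataFrame built only for illustration. The encoded value is decoded again with shapely's wkb module to show that the round trip preserves the geometry.

```python
import geopandas
from shapely import wkb
from shapely.geometry import Point

gdf = geopandas.GeoDataFrame({"name": ["a"]}, geometry=[Point(1, 2)])

# Geometry columns become hexadecimal WKB strings; other columns are kept.
encoded = gdf.to_wkb(hex=True)
hex_value = encoded["geometry"].iloc[0]

# Decode again to confirm the geometry is unchanged.
decoded = wkb.loads(hex_value, hex=True)
print(hex_value, decoded)
```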

to_wkt(**kwargs)[source]

Encode all geometry columns in the GeoDataFrame to WKT.

Parameters:

kwargs – Keyword args will be passed to shapely.to_wkt().

Returns:

geometry columns are encoded to WKT

Return type:

DataFrame
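
The WKT counterpart can be sketched the same way, again with an illustrative two-row GeoDataFrame. The text is parsed back with shapely's wkt module rather than compared against a fixed string, since the exact number formatting can depend on the keyword arguments passed on to shapely.

```python
import geopandas
from shapely import wkt
from shapely.geometry import LineString, Point

gdf = geopandas.GeoDataFrame(
    {"name": ["p", "l"]},
    geometry=[Point(1, 2), LineString([(0, 0), (1, 1)])],
)

# Geometry columns become WKT strings.
encoded = gdf.to_wkt()
texts = encoded["geometry"].tolist()

# Parse the text back into geometries.
decoded = [wkt.loads(t) for t in texts]
print(texts)
```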

to_xarray()

Return an xarray object from the pandas object.

Returns:

Data in the pandas structure converted to Dataset if the object is a DataFrame, or a DataArray if the object is a Series.

Return type:

xarray.DataArray or xarray.Dataset

See also

DataFrame.to_hdf

Write DataFrame to an HDF5 file.

DataFrame.to_parquet

Write a DataFrame to the binary parquet format.

Notes

See the xarray docs

Examples

>>> df = pd.DataFrame([('falcon', 'bird', 389.0, 2),
...                    ('parrot', 'bird', 24.0, 2),
...                    ('lion', 'mammal', 80.5, 4),
...                    ('monkey', 'mammal', np.nan, 4)],
...                   columns=['name', 'class', 'max_speed',
...                            'num_legs'])
>>> df
     name   class  max_speed  num_legs
0  falcon    bird      389.0         2
1  parrot    bird       24.0         2
2    lion  mammal       80.5         4
3  monkey  mammal        NaN         4
>>> df.to_xarray()
<xarray.Dataset>
Dimensions:    (index: 4)
Coordinates:
  * index      (index) int64 32B 0 1 2 3
Data variables:
    name       (index) object 32B 'falcon' 'parrot' 'lion' 'monkey'
    class      (index) object 32B 'bird' 'bird' 'mammal' 'mammal'
    max_speed  (index) float64 32B 389.0 24.0 80.5 nan
    num_legs   (index) int64 32B 2 2 4 4
>>> df['max_speed'].to_xarray()
<xarray.DataArray 'max_speed' (index: 4)>
array([389. ,  24. ,  80.5,   nan])
Coordinates:
  * index    (index) int64 0 1 2 3
>>> dates = pd.to_datetime(['2018-01-01', '2018-01-01',
...                         '2018-01-02', '2018-01-02'])
>>> df_multiindex = pd.DataFrame({'date': dates,
...                               'animal': ['falcon', 'parrot',
...                                          'falcon', 'parrot'],
...                               'speed': [350, 18, 361, 15]})
>>> df_multiindex = df_multiindex.set_index(['date', 'animal'])
>>> df_multiindex
                   speed
date       animal
2018-01-01 falcon    350
           parrot     18
2018-01-02 falcon    361
           parrot     15
>>> df_multiindex.to_xarray()
<xarray.Dataset>
Dimensions:  (date: 2, animal: 2)
Coordinates:
  * date     (date) datetime64[ns] 2018-01-01 2018-01-02
  * animal   (animal) object 'falcon' 'parrot'
Data variables:
    speed    (date, animal) int64 350 18 361 15
to_xml(path_or_buffer=None, index=True, root_name='data', row_name='row', na_rep=None, attr_cols=None, elem_cols=None, namespaces=None, prefix=None, encoding='utf-8', xml_declaration=True, pretty_print=True, parser='lxml', stylesheet=None, compression='infer', storage_options=None)

Render a DataFrame to an XML document.

Added in version 1.3.0.

Parameters:
  • path_or_buffer (str, path object, file-like object, or None, default None) – String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string.

  • index (bool, default True) – Whether to include index in XML document.

  • root_name (str, default 'data') – The name of root element in XML document.

  • row_name (str, default 'row') – The name of row element in XML document.

  • na_rep (str, optional) – Missing data representation.

  • attr_cols (list-like, optional) – List of columns to write as attributes in row element. Hierarchical columns will be flattened with underscore delimiting the different levels.

  • elem_cols (list-like, optional) – List of columns to write as children in row element. By default, all columns output as children of row element. Hierarchical columns will be flattened with underscore delimiting the different levels.

  • namespaces (dict, optional) –

    All namespaces to be defined in root element. Keys of dict should be prefix names and values of dict corresponding URIs. Default namespaces should be given empty string key. For example,

    namespaces = {"": "https://example.com"}
    

  • prefix (str, optional) – Namespace prefix to be used for every element and/or attribute in document. This should be one of the keys in namespaces dict.

  • encoding (str, default 'utf-8') – Encoding of the resulting document.

  • xml_declaration (bool, default True) – Whether to include the XML declaration at start of document.

  • pretty_print (bool, default True) – Whether output should be pretty printed with indentation and line breaks.

  • parser ({'lxml','etree'}, default 'lxml') – Parser module to use for building of tree. Only ‘lxml’ and ‘etree’ are supported. With ‘lxml’, the ability to use XSLT stylesheet is supported.

  • stylesheet (str, path object or file-like object, optional) – A URL, file-like object, or a raw string containing an XSLT script used to transform the raw XML output. The script should use the layout of elements and attributes from the original output. This argument requires lxml to be installed. Only XSLT 1.0 scripts are currently supported, not later versions.

  • compression (str or dict, default 'infer') –

    For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buffer’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.

    Added in version 1.5.0: Added support for .tar files.

    Changed in version 1.4.0: Zstandard support.

  • storage_options (dict, optional) –

    Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.

Returns:

If path_or_buffer is None, returns the resulting XML format as a string. Otherwise returns None.

Return type:

None or str

See also

to_json

Convert the pandas object to a JSON string.

to_html

Convert DataFrame to HTML.

Examples

>>> df = pd.DataFrame({'shape': ['square', 'circle', 'triangle'],
...                    'degrees': [360, 360, 180],
...                    'sides': [4, np.nan, 3]})
>>> df.to_xml()
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <index>1</index>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
  <row>
    <index>2</index>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>
>>> df.to_xml(attr_cols=[
...           'index', 'shape', 'degrees', 'sides'
...           ])
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row index="0" shape="square" degrees="360" sides="4.0"/>
  <row index="1" shape="circle" degrees="360"/>
  <row index="2" shape="triangle" degrees="180" sides="3.0"/>
</data>
>>> df.to_xml(namespaces={"doc": "https://example.com"},
...           prefix="doc")
<?xml version='1.0' encoding='utf-8'?>
<doc:data xmlns:doc="https://example.com">
  <doc:row>
    <doc:index>0</doc:index>
    <doc:shape>square</doc:shape>
    <doc:degrees>360</doc:degrees>
    <doc:sides>4.0</doc:sides>
  </doc:row>
  <doc:row>
    <doc:index>1</doc:index>
    <doc:shape>circle</doc:shape>
    <doc:degrees>360</doc:degrees>
    <doc:sides/>
  </doc:row>
  <doc:row>
    <doc:index>2</doc:index>
    <doc:shape>triangle</doc:shape>
    <doc:degrees>180</doc:degrees>
    <doc:sides>3.0</doc:sides>
  </doc:row>
</doc:data>
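
The examples above use the default lxml parser. As a minimal sketch for environments without lxml, the following uses parser='etree' together with root_name and row_name, then parses the result with the standard library to inspect it. The element names shapes and item are illustrative only.

```python
import xml.etree.ElementTree as ET

import numpy as np
import pandas as pd

df = pd.DataFrame({"shape": ["square", "circle", "triangle"],
                   "degrees": [360, 360, 180],
                   "sides": [4, np.nan, 3]})

# parser="etree" builds the tree with the standard library instead of lxml.
xml_text = df.to_xml(index=False, root_name="shapes", row_name="item",
                     xml_declaration=False, parser="etree")

root = ET.fromstring(xml_text)
items = root.findall("item")
print(root.tag, len(items))
```

With na_rep left unset, the missing sides value is written as an empty element, so its parsed text is None.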
property total_bounds

Return a tuple containing minx, miny, maxx, maxy values for the bounds of the series as a whole.

See GeoSeries.bounds for the bounds of the geometries contained in the series.

Examples

>>> from shapely.geometry import Point, Polygon, LineString
>>> d = {'geometry': [Point(3, -1), Polygon([(0, 0), (1, 1), (1, 0)]),
... LineString([(0, 1), (1, 2)])]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf.total_bounds
array([ 0., -1.,  3.,  2.])
touches(other, align=None)

Return a Series of dtype('bool') with value True for each aligned geometry that touches other.

An object is said to touch other if it has at least one point in common with other and its interior does not intersect with any part of the other. Overlapping features therefore do not touch.

The operation works in a 1-to-1, row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test whether it is touched.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, MultiPoint, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         MultiPoint([(0, 0), (0, 1)]),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (-2, 0), (0, -2)]),
...         LineString([(0, 1), (1, 1)]),
...         LineString([(1, 1), (3, 0)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3         MULTIPOINT ((0 0), (0 1))
dtype: geometry
>>> s2
1    POLYGON ((0 0, -2 0, 0 -2, 0 0))
2               LINESTRING (0 1, 1 1)
3               LINESTRING (1 1, 3 0)
4                         POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries touches a single geometry:

>>> line = LineString([(0, 0), (-1, -2)])
>>> s.touches(line)
0    True
1    True
2    True
3    True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.touches(s2, align=True)
0    False
1     True
2     True
3    False
4    False
dtype: bool
>>> s.touches(s2, align=False)
0     True
1    False
2     True
3    False
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries touches any element of the other one.

See also

GeoSeries.overlaps, GeoSeries.intersects
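
The definition of touching (a shared boundary point but no shared interior) can be checked with three simple cases. This is a sketch using unit squares chosen for illustration: one neighbour shares an edge, one overlaps, and one is disjoint.

```python
import geopandas
from shapely.geometry import box

unit = box(0, 0, 1, 1)
s = geopandas.GeoSeries([unit, unit, unit])

others = geopandas.GeoSeries([
    box(1, 0, 2, 1),      # shares the edge x = 1: touches
    box(0.5, 0, 1.5, 1),  # interiors overlap: does not touch
    box(2, 0, 3, 1),      # no point in common: does not touch
])

result = s.touches(others, align=False)
print(result.tolist())
```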

transform(transformation, include_z=False)

Return a GeoSeries with the transformation function applied to the geometry coordinates.

Parameters:
  • transformation (Callable) – A function that transforms a (N, 2) or (N, 3) ndarray of float64 to another (N,2) or (N, 3) ndarray of float64

  • include_z (bool, default False) – If True include the third dimension in the coordinates array that is passed to the transformation function. If a geometry has no third dimension, the z-coordinates passed to the function will be NaN.

Return type:

GeoSeries

Examples

>>> from shapely import Point, Polygon
>>> s = geopandas.GeoSeries([Point(0, 0)])
>>> s.transform(lambda x: x + 1)
0    POINT (1 1)
dtype: geometry
>>> s = geopandas.GeoSeries([Polygon([(0, 0), (1, 1), (0, 1)])])
>>> s.transform(lambda x: x * [2, 3])
0    POLYGON ((0 0, 2 3, 0 3, 0 0))
dtype: geometry

By default the third dimension is ignored and you need to include it explicitly:

>>> s = geopandas.GeoSeries([Point(0, 0, 0)])
>>> s.transform(lambda x: x + 1, include_z=True)
0    POINT Z (1 1 1)
dtype: geometry
translate(xoff=0.0, yoff=0.0, zoff=0.0)

Return a GeoSeries with translated geometries.

See http://shapely.readthedocs.io/en/latest/manual.html#shapely.affinity.translate for details.

Parameters:
  • xoff (float) – Amount of offset along the x dimension.

  • yoff (float) – Amount of offset along the y dimension.

  • zoff (float) – Amount of offset along the z dimension.

Examples

>>> from shapely.geometry import Point, LineString, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Point(1, 1),
...         LineString([(1, -1), (1, 0)]),
...         Polygon([(3, -1), (4, 0), (3, 1)]),
...     ]
... )
>>> s
0                         POINT (1 1)
1              LINESTRING (1 -1, 1 0)
2    POLYGON ((3 -1, 4 0, 3 1, 3 -1))
dtype: geometry
>>> s.translate(2, 3)
0                       POINT (3 4)
1             LINESTRING (3 2, 3 3)
2    POLYGON ((5 2, 6 3, 5 4, 5 2))
dtype: geometry
transpose(*args, copy=False)

Transpose index and columns.

Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().

Parameters:
  • *args (tuple, optional) – Accepted for compatibility with NumPy.

  • copy (bool, default False) –

    Whether to copy the data after transposing, even for DataFrames with a single dtype.

    Note that a copy is always required for mixed dtype DataFrames, or for DataFrames with any extension types.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements by enabling copy-on-write: pd.options.mode.copy_on_write = True

Returns:

The transposed DataFrame.

Return type:

DataFrame

See also

numpy.transpose

Permute the dimensions of a given array.

Notes

Transposing a DataFrame with mixed dtypes will result in a homogeneous DataFrame with the object dtype. In such a case, a copy of the data is always made.

Examples

Square DataFrame with homogeneous dtype

>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d1)
>>> df1
   col1  col2
0     1     3
1     2     4
>>> df1_transposed = df1.T  # or df1.transpose()
>>> df1_transposed
      0  1
col1  1  2
col2  3  4

When the dtype is homogeneous in the original DataFrame, we get a transposed DataFrame with the same dtype:

>>> df1.dtypes
col1    int64
col2    int64
dtype: object
>>> df1_transposed.dtypes
0    int64
1    int64
dtype: object

Non-square DataFrame with mixed dtypes

>>> d2 = {'name': ['Alice', 'Bob'],
...       'score': [9.5, 8],
...       'employed': [False, True],
...       'kids': [0, 0]}
>>> df2 = pd.DataFrame(data=d2)
>>> df2
    name  score  employed  kids
0  Alice    9.5     False     0
1    Bob    8.0      True     0
>>> df2_transposed = df2.T  # or df2.transpose()
>>> df2_transposed
              0     1
name      Alice   Bob
score       9.5   8.0
employed  False  True
kids          0     0

When the DataFrame has mixed dtypes, we get a transposed DataFrame with the object dtype:

>>> df2.dtypes
name         object
score       float64
employed       bool
kids          int64
dtype: object
>>> df2_transposed.dtypes
0    object
1    object
dtype: object
truediv(other, axis='columns', level=None, fill_value=None)

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. The reverse version is rtruediv.

This is one of the flexible wrappers (add, sub, mul, div, floordiv, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.

Parameters:
  • other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.

  • axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.

  • level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

  • fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

DataFrame

See also

DataFrame.add

Add DataFrames.

DataFrame.sub

Subtract DataFrames.

DataFrame.mul

Multiply DataFrames.

DataFrame.div

Divide DataFrames (float division).

DataFrame.truediv

Divide DataFrames (float division).

DataFrame.floordiv

Divide DataFrames (integer division).

DataFrame.mod

Calculate modulo (remainder after division).

DataFrame.pow

Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
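
The examples above illustrate the related wrappers rather than truediv itself. A minimal sketch of truediv with fill_value, using two small illustrative frames: a value missing on one side is replaced by the fill value before dividing, while a position missing on both sides stays missing.

```python
import pandas as pd

num = pd.DataFrame({"a": [1.0, None, None], "b": [4.0, 9.0, 1.0]})
den = pd.DataFrame({"a": [2.0, 4.0, None], "b": [None, 3.0, 2.0]})

# Missing on one side only: replaced by 1 before the division.
# Missing on both sides (column "a", last row): result stays NaN.
result = num.truediv(den, fill_value=1)
print(result)
```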
truncate(before=None, after=None, axis=None, copy=None)

Truncate a Series or DataFrame before and after some index value.

This is a useful shorthand for boolean indexing based on index values above or below certain thresholds.

Parameters:
  • before (date, str, int) – Truncate all rows before this index value.

  • after (date, str, int) – Truncate all rows after this index value.

  • axis ({0 or 'index', 1 or 'columns'}, optional) – Axis to truncate. Truncates the index (rows) by default. For Series this parameter is unused and defaults to 0.

  • copy (bool, default True) –

    Return a copy of the truncated section.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements by enabling copy-on-write: pd.options.mode.copy_on_write = True

Returns:

The truncated Series or DataFrame.

Return type:

type of caller

See also

DataFrame.loc

Select a subset of a DataFrame by label.

DataFrame.iloc

Select a subset of a DataFrame by position.

Notes

If the index being truncated contains only datetime values, before and after may be specified as strings instead of Timestamps.

Examples

>>> df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
...                    'B': ['f', 'g', 'h', 'i', 'j'],
...                    'C': ['k', 'l', 'm', 'n', 'o']},
...                   index=[1, 2, 3, 4, 5])
>>> df
   A  B  C
1  a  f  k
2  b  g  l
3  c  h  m
4  d  i  n
5  e  j  o
>>> df.truncate(before=2, after=4)
   A  B  C
2  b  g  l
3  c  h  m
4  d  i  n

The columns of a DataFrame can be truncated.

>>> df.truncate(before="A", after="B", axis="columns")
   A  B
1  a  f
2  b  g
3  c  h
4  d  i
5  e  j

For Series, only rows can be truncated.

>>> df['A'].truncate(before=2, after=4)
2    b
3    c
4    d
Name: A, dtype: object

The index values in truncate can be datetimes or string dates.

>>> dates = pd.date_range('2016-01-01', '2016-02-01', freq='s')
>>> df = pd.DataFrame(index=dates, data={'A': 1})
>>> df.tail()
                     A
2016-01-31 23:59:56  1
2016-01-31 23:59:57  1
2016-01-31 23:59:58  1
2016-01-31 23:59:59  1
2016-02-01 00:00:00  1
>>> df.truncate(before=pd.Timestamp('2016-01-05'),
...             after=pd.Timestamp('2016-01-10')).tail()
                     A
2016-01-09 23:59:56  1
2016-01-09 23:59:57  1
2016-01-09 23:59:58  1
2016-01-09 23:59:59  1
2016-01-10 00:00:00  1

Because the index is a DatetimeIndex containing only dates, we can specify before and after as strings. They will be coerced to Timestamps before truncation.

>>> df.truncate('2016-01-05', '2016-01-10').tail()
                     A
2016-01-09 23:59:56  1
2016-01-09 23:59:57  1
2016-01-09 23:59:58  1
2016-01-09 23:59:59  1
2016-01-10 00:00:00  1

Note that truncate assumes a 0 value for any unspecified time component (midnight). This differs from partial string slicing, which returns any partially matching dates.

>>> df.loc['2016-01-05':'2016-01-10', :].tail()
                     A
2016-01-10 23:59:55  1
2016-01-10 23:59:56  1
2016-01-10 23:59:57  1
2016-01-10 23:59:58  1
2016-01-10 23:59:59  1
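
The description calls truncate a shorthand for boolean indexing on index values. A small sketch of that equivalence, assuming a sorted integer index:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d", "e"]}, index=[1, 2, 3, 4, 5])

# Both bounds are inclusive.
truncated = df.truncate(before=2, after=4)

# The same selection written as an explicit boolean mask on the index.
masked = df[(df.index >= 2) & (df.index <= 4)]

print(truncated.equals(masked))
```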
property type

Return the geometry type of each geometry in the GeoSeries.
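
A brief sketch of the property on a mixed GeoSeries. The geometries are illustrative only, and the result is assumed to match what geom_type reports for each geometry.

```python
import geopandas
from shapely.geometry import LineString, Point

s = geopandas.GeoSeries([Point(0, 0), LineString([(0, 0), (1, 1)])])

# One geometry type name per row.
kinds = s.type
print(kinds.tolist())
```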

tz_convert(tz, axis=0, level=None, copy=None)

Convert tz-aware axis to target time zone.

Parameters:
  • tz (str or tzinfo object or None) – Target time zone. Passing None will convert to UTC and remove the timezone information.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to convert.

  • level (int, str, default None) – If axis is a MultiIndex, convert a specific level. Otherwise must be None.

  • copy (bool, default True) –

    Also make a copy of the underlying data.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements by enabling copy-on-write: pd.options.mode.copy_on_write = True

Returns:

Object with time zone converted axis.

Return type:

Series/DataFrame

Raises:

TypeError – If the axis is tz-naive.

Examples

Change to another time zone:

>>> s = pd.Series(
...     [1],
...     index=pd.DatetimeIndex(['2018-09-15 01:30:00+02:00']),
... )
>>> s.tz_convert('Asia/Shanghai')
2018-09-15 07:30:00+08:00    1
dtype: int64

Pass None to convert to UTC and get a tz-naive index:

>>> s = pd.Series([1],
...               index=pd.DatetimeIndex(['2018-09-15 01:30:00+02:00']))
>>> s.tz_convert(None)
2018-09-14 23:30:00    1
dtype: int64
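
The TypeError listed under Raises occurs when the index carries no time zone. A minimal sketch of that failure and the usual remedy, localizing first and then converting; the zone Europe/Berlin is chosen only for illustration.

```python
import pandas as pd

naive = pd.Series([1], index=pd.DatetimeIndex(["2018-09-15 01:30:00"]))

# Converting a tz-naive index is rejected.
try:
    naive.tz_convert("UTC")
    raised = False
except TypeError:
    raised = True

# Attach a zone first, then convert.
aware = naive.tz_localize("Europe/Berlin").tz_convert("UTC")
print(raised, aware.index[0])
```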
tz_localize(tz, axis=0, level=None, copy=None, ambiguous='raise', nonexistent='raise')

Localize tz-naive index of a Series or DataFrame to target time zone.

This operation localizes the Index. To localize the values in a timezone-naive Series, use Series.dt.tz_localize().

Parameters:
  • tz (str or tzinfo or None) – Time zone to localize. Passing None will remove the time zone information and preserve local time.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to localize

  • level (int, str, default None) – If axis is a MultiIndex, localize a specific level. Otherwise must be None.

  • copy (bool, default True) –

    Also make a copy of the underlying data.

    Note

    The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

    You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

  • ambiguous ('infer', bool-ndarray, 'NaT', default 'raise') –

    When clocks moved backward due to DST, ambiguous times may arise. For example in Central European Time (UTC+01), when going from 03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at 00:30:00 UTC and at 01:30:00 UTC. In such a situation, the ambiguous parameter dictates how ambiguous times should be handled.

    • ’infer’ will attempt to infer fall dst-transition hours based on order

    • bool-ndarray where True signifies a DST time, False designates a non-DST time (note that this flag is only applicable for ambiguous times)

    • ’NaT’ will return NaT where there are ambiguous times

    • ’raise’ will raise an AmbiguousTimeError if there are ambiguous times.

  • nonexistent (str, default 'raise') –

    A nonexistent time does not exist in a particular timezone where clocks moved forward due to DST. Valid values are:

    • ’shift_forward’ will shift the nonexistent time forward to the closest existing time

    • ’shift_backward’ will shift the nonexistent time backward to the closest existing time

    • ’NaT’ will return NaT where there are nonexistent times

    • timedelta objects will shift nonexistent times by the timedelta

    • ’raise’ will raise a NonExistentTimeError if there are nonexistent times.

Returns:

Same type as the input.

Return type:

Series/DataFrame

Raises:

TypeError – If the TimeSeries is tz-aware and tz is not None.

Examples

Localize local times:

>>> s = pd.Series(
...     [1],
...     index=pd.DatetimeIndex(['2018-09-15 01:30:00']),
... )
>>> s.tz_localize('CET')
2018-09-15 01:30:00+02:00    1
dtype: int64

Pass None to convert to tz-naive index and preserve local time:

>>> s = pd.Series([1],
...               index=pd.DatetimeIndex(['2018-09-15 01:30:00+02:00']))
>>> s.tz_localize(None)
2018-09-15 01:30:00    1
dtype: int64

Be careful with DST changes. When there is sequential data, pandas can infer the DST time:

>>> s = pd.Series(range(7),
...               index=pd.DatetimeIndex(['2018-10-28 01:30:00',
...                                       '2018-10-28 02:00:00',
...                                       '2018-10-28 02:30:00',
...                                       '2018-10-28 02:00:00',
...                                       '2018-10-28 02:30:00',
...                                       '2018-10-28 03:00:00',
...                                       '2018-10-28 03:30:00']))
>>> s.tz_localize('CET', ambiguous='infer')
2018-10-28 01:30:00+02:00    0
2018-10-28 02:00:00+02:00    1
2018-10-28 02:30:00+02:00    2
2018-10-28 02:00:00+01:00    3
2018-10-28 02:30:00+01:00    4
2018-10-28 03:00:00+01:00    5
2018-10-28 03:30:00+01:00    6
dtype: int64

In some cases, inferring the DST is impossible. In such cases, you can pass an ndarray to the ambiguous parameter to set the DST explicitly:

>>> s = pd.Series(range(3),
...               index=pd.DatetimeIndex(['2018-10-28 01:20:00',
...                                       '2018-10-28 02:36:00',
...                                       '2018-10-28 03:46:00']))
>>> s.tz_localize('CET', ambiguous=np.array([True, True, False]))
2018-10-28 01:20:00+02:00    0
2018-10-28 02:36:00+02:00    1
2018-10-28 03:46:00+01:00    2
dtype: int64

If the DST transition causes nonexistent times, you can shift these dates forward or backward with a timedelta object or ‘shift_forward’ or ‘shift_backward’.

>>> s = pd.Series(range(2),
...               index=pd.DatetimeIndex(['2015-03-29 02:30:00',
...                                       '2015-03-29 03:30:00']))
>>> s.tz_localize('Europe/Warsaw', nonexistent='shift_forward')
2015-03-29 03:00:00+02:00    0
2015-03-29 03:30:00+02:00    1
dtype: int64
>>> s.tz_localize('Europe/Warsaw', nonexistent='shift_backward')
2015-03-29 01:59:59.999999999+01:00    0
2015-03-29 03:30:00+02:00              1
dtype: int64
>>> s.tz_localize('Europe/Warsaw', nonexistent=pd.Timedelta('1h'))
2015-03-29 03:30:00+02:00    0
2015-03-29 03:30:00+02:00    1
dtype: int64
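
The 'NaT' options for ambiguous and nonexistent are described above but not demonstrated. The sketch below assumes the same dates as the preceding examples, with Europe/Berlin chosen here as a zone that follows the Central European DST rules. It marks problematic timestamps as NaT instead of raising or shifting them.

```python
import pandas as pd

# 02:30 occurs twice on 2018-10-28 in Europe/Berlin (clocks go back): ambiguous.
ambiguous = pd.Series([0], index=pd.DatetimeIndex(["2018-10-28 02:30:00"]))
ambiguous_as_nat = ambiguous.tz_localize("Europe/Berlin", ambiguous="NaT")

# 02:30 never occurs on 2015-03-29 in Europe/Warsaw (clocks go forward): nonexistent.
missing = pd.Series([0], index=pd.DatetimeIndex(["2015-03-29 02:30:00"]))
missing_as_nat = missing.tz_localize("Europe/Warsaw", nonexistent="NaT")

print(ambiguous_as_nat.index, missing_as_nat.index)
```

This can be convenient when bad timestamps should be dropped afterwards, for example by filtering on index.notna().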
property unary_union

Return a geometry containing the union of all geometries in the GeoSeries.

The unary_union attribute is deprecated. Use union_all() instead.

Examples

>>> from shapely.geometry import box
>>> s = geopandas.GeoSeries([box(0,0,1,1), box(0,0,2,2)])
>>> s
0    POLYGON ((1 0, 1 1, 0 1, 0 0, 1 0))
1    POLYGON ((2 0, 2 2, 0 2, 0 0, 2 0))
dtype: geometry
>>> union = s.unary_union
>>> print(union)
POLYGON ((0 1, 0 2, 2 2, 2 0, 1 0, 0 0, 0 1))

See also

GeoSeries.union_all

union(other, align=None)

Return a GeoSeries of the union of points in each aligned geometry with other.

The operation works in a 1-to-1, row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the union with.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

GeoSeries

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry
>>>
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5                       POINT (0 1)
dtype: geometry

We can do union of each geometry and a single shapely geometry:

>>> s.union(Polygon([(0, 0), (1, 1), (0, 1)]))
0             POLYGON ((0 0, 0 1, 0 2, 2 2, 1 1, 0 0))
1             POLYGON ((0 0, 0 1, 0 2, 2 2, 1 1, 0 0))
2    GEOMETRYCOLLECTION (POLYGON ((0 0, 0 1, 1 1, 0...
3    GEOMETRYCOLLECTION (POLYGON ((0 0, 0 1, 1 1, 0...
4                       POLYGON ((0 1, 1 1, 0 0, 0 1))
dtype: geometry

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.union(s2, align=True)
0                                                 None
1             POLYGON ((0 0, 0 1, 0 2, 2 2, 1 1, 0 0))
2    MULTILINESTRING ((0 0, 1 1), (1 1, 2 2), (1 0,...
3                                LINESTRING (2 0, 0 2)
4                            MULTIPOINT ((0 1), (1 1))
5                                                 None
dtype: geometry
>>> s.union(s2, align=False)
0             POLYGON ((0 0, 0 1, 0 2, 2 2, 1 1, 0 0))
1    GEOMETRYCOLLECTION (POLYGON ((0 0, 0 2, 1 2, 2...
2    MULTILINESTRING ((0 0, 1 1), (1 1, 2 2), (2 0,...
3                                LINESTRING (2 0, 0 2)
4                                          POINT (0 1)
dtype: geometry

See also

GeoSeries.symmetric_difference, GeoSeries.difference, GeoSeries.intersection

union_all(method='unary', *, grid_size=None)

Return a geometry containing the union of all geometries in the GeoSeries.

By default, the unary union algorithm is used. If the geometries are non-overlapping (forming a coverage), GeoPandas can use a significantly faster algorithm to perform the union using the method="coverage" option. Alternatively, for situations which can be divided into many disjoint subsets, method="disjoint_subset" may be preferable.

Parameters:
  • method (str, default "unary") –

    The method to use for the union. Options are:

    • "unary": use the unary union algorithm. This option is the most robust but can be slow for large numbers of geometries (default).

    • "coverage": use the coverage union algorithm. This option is optimized for non-overlapping polygons and can be significantly faster than the unary union algorithm. However, it can produce invalid geometries if the polygons overlap.

    • "disjoint_subset": use the disjoint subset union algorithm. This option is optimized for inputs that can be divided into subsets that do not intersect. If there is only one such subset, performance can be expected to be worse than "unary". Requires Shapely >= 2.1.

  • grid_size (float, default None) –

    When grid size is specified, a fixed-precision space is used to perform the union operations. This can be useful when unioning geometries that are not perfectly snapped or to avoid geometries not being unioned because of robustness issues. The inputs are first snapped to a grid of the given size. When a line segment of a geometry is within tolerance of a vertex of another geometry, this vertex will be inserted in the line segment. Finally, the result vertices are computed on the same grid. Only supported for method "unary". If None, the highest precision of the inputs will be used. Defaults to None.

    Added in version 1.1.0.

Examples

>>> from shapely.geometry import box
>>> s = geopandas.GeoSeries([box(0, 0, 1, 1), box(0, 0, 2, 2)])
>>> s
0    POLYGON ((1 0, 1 1, 0 1, 0 0, 1 0))
1    POLYGON ((2 0, 2 2, 0 2, 0 0, 2 0))
dtype: geometry
>>> s.union_all()
<POLYGON ((0 1, 0 2, 2 2, 2 0, 1 0, 0 0, 0 1))>
unstack(level=-1, fill_value=None, sort=True)

Pivot a level of the (necessarily hierarchical) index labels.

Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.

If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex).

Parameters:
  • level (int, str, or list of these, default -1 (last level)) – Level(s) of index to unstack, can pass level name.

  • fill_value (int, str or dict) – Replace NaN with this value if the unstack produces missing values.

  • sort (bool, default True) – Sort the level(s) in the resulting MultiIndex columns.

Return type:

Series or DataFrame

See also

DataFrame.pivot

Pivot a table based on column values.

DataFrame.stack

Pivot a level of the column labels (inverse operation from unstack).

Notes

Reference the user guide for more examples.

Examples

>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a   1.0
     b   2.0
two  a   3.0
     b   4.0
dtype: float64
>>> s.unstack(level=-1)
       a    b
one  1.0  2.0
two  3.0  4.0
>>> s.unstack(level=0)
   one  two
a  1.0  3.0
b  2.0  4.0
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a  1.0
     b  2.0
two  a  3.0
     b  4.0
dtype: float64
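
The fill_value parameter is not covered by the examples above. A minimal sketch, assuming a Series whose MultiIndex lacks one label combination (the data is illustrative), shows how the gap is filled instead of becoming NaN.

```python
import pandas as pd

# ('two', 'b') is missing, so unstacking leaves a hole in the result.
index = pd.MultiIndex.from_tuples([("one", "a"), ("one", "b"), ("two", "a")])
s = pd.Series([1, 2, 3], index=index)

with_nan = s.unstack()               # hole becomes NaN
wide = s.unstack(fill_value=0)       # hole is filled with 0 instead

print(wide)
```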
update(other, join='left', overwrite=True, filter_func=None, errors='ignore')

Modify in place using non-NA values from another DataFrame.

Aligns on indices. There is no return value.

Parameters:
  • other (DataFrame, or object coercible into a DataFrame) – Should have at least one matching index/column label with the original DataFrame. If a Series is passed, its name attribute must be set, and that will be used as the column name to align with the original DataFrame.

  • join ({'left'}, default 'left') – Only left join is implemented, keeping the index and columns of the original object.

  • overwrite (bool, default True) –

    How to handle non-NA values for overlapping keys:

    • True: overwrite original DataFrame’s values with values from other.

    • False: only update values that are NA in the original DataFrame.

  • filter_func (callable(1d-array) -> bool 1d-array, optional) – Can choose to replace values other than NA. Return True for values that should be updated.

  • errors ({'raise', 'ignore'}, default 'ignore') – If ‘raise’, will raise a ValueError if the DataFrame and other both contain non-NA data in the same place.

Returns:

This method directly changes calling object.

Return type:

None

Raises:
  • ValueError

    • When errors=’raise’ and there’s overlapping non-NA data.

    • When errors is not either ‘ignore’ or ‘raise’.

  • NotImplementedError

    • If join != ‘left’

See also

dict.update

Similar method for dictionaries.

DataFrame.merge

For column(s)-on-column(s) operations.

Examples

>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, 5, 6],
...                        'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df
   A  B
0  1  4
1  2  5
2  3  6

The DataFrame’s length does not increase as a result of the update, only values at matching index/column labels are updated.

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  e
2  c  f
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'f']}, index=[0, 2])
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  y
2  c  f

For Series, its name attribute must be set.

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_column = pd.Series(['d', 'e', 'f'], name='B')
>>> df.update(new_column)
>>> df
   A  B
0  a  d
1  b  e
2  c  f

If other contains NaNs the corresponding values are not updated in the original dataframe.

>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400., 500., 600.]})
>>> new_df = pd.DataFrame({'B': [4, np.nan, 6]})
>>> df.update(new_df)
>>> df
   A      B
0  1    4.0
1  2  500.0
2  3    6.0
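
The overwrite and errors parameters are described above without examples. The following sketch, using small illustrative frames, shows overwrite=False filling only missing values and errors='raise' rejecting overlapping non-NA data.

```python
import numpy as np
import pandas as pd

other = pd.DataFrame({"A": [10.0, 20.0, 30.0]})

# overwrite=False: only the NaN slot is filled; existing values are kept.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0]})
df.update(other, overwrite=False)
filled = df["A"].tolist()

# errors="raise": overlapping non-NA data is rejected instead of replaced.
df2 = pd.DataFrame({"A": [1.0, np.nan, 3.0]})
try:
    df2.update(other, errors="raise")
    overlap_rejected = False
except ValueError:
    overlap_rejected = True

print(filled, overlap_rejected)
```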
value_counts(subset=None, normalize=False, sort=True, ascending=False, dropna=True)

Return a Series containing the frequency of each distinct row in the DataFrame.

Parameters:
  • subset (label or list of labels, optional) – Columns to use when counting unique combinations.

  • normalize (bool, default False) – Return proportions rather than frequencies.

  • sort (bool, default True) – Sort by frequencies when True. Sort by DataFrame column values when False.

  • ascending (bool, default False) – Sort in ascending order.

  • dropna (bool, default True) –

    Don’t include counts of rows that contain NA values.

    Added in version 1.3.0.

Return type:

Series

See also

Series.value_counts

Equivalent method on Series.

Notes

The returned Series will have a MultiIndex with one level per input column but an Index (non-multi) for a single label. By default, rows that contain any NA values are omitted from the result. By default, the resulting Series will be in descending order so that the first element is the most frequently-occurring row.

Examples

>>> df = pd.DataFrame({'num_legs': [2, 4, 4, 6],
...                    'num_wings': [2, 0, 0, 0]},
...                   index=['falcon', 'dog', 'cat', 'ant'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0
cat            4          0
ant            6          0
>>> df.value_counts()
num_legs  num_wings
4         0            2
2         2            1
6         0            1
Name: count, dtype: int64
>>> df.value_counts(sort=False)
num_legs  num_wings
2         2            1
4         0            2
6         0            1
Name: count, dtype: int64
>>> df.value_counts(ascending=True)
num_legs  num_wings
2         2            1
6         0            1
4         0            2
Name: count, dtype: int64
>>> df.value_counts(normalize=True)
num_legs  num_wings
4         0            0.50
2         2            0.25
6         0            0.25
Name: proportion, dtype: float64

With dropna set to False we can also count rows with NA values.

>>> df = pd.DataFrame({'first_name': ['John', 'Anne', 'John', 'Beth'],
...                    'middle_name': ['Smith', pd.NA, pd.NA, 'Louise']})
>>> df
  first_name middle_name
0       John       Smith
1       Anne        <NA>
2       John        <NA>
3       Beth      Louise
>>> df.value_counts()
first_name  middle_name
Beth        Louise         1
John        Smith          1
Name: count, dtype: int64
>>> df.value_counts(dropna=False)
first_name  middle_name
Anne        NaN            1
Beth        Louise         1
John        Smith          1
            NaN            1
Name: count, dtype: int64
>>> df.value_counts("first_name")
first_name
John    2
Anne    1
Beth    1
Name: count, dtype: int64
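
One interaction worth noting is how normalize combines with dropna. The sketch below reuses the frame from the example above and suggests that proportions are computed relative to the rows actually counted, so they change when rows containing NA are included.

```python
import pandas as pd

df = pd.DataFrame({"first_name": ["John", "Anne", "John", "Beth"],
                   "middle_name": ["Smith", pd.NA, pd.NA, "Louise"]})

counts = df.value_counts()                 # rows containing NA are dropped
shares = df.value_counts(normalize=True)   # shares of the two counted rows
shares_all = df.value_counts(normalize=True, dropna=False)  # shares of all four rows

print(shares.loc[("Beth", "Louise")], shares_all.loc[("Beth", "Louise")])
```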
property values: ndarray

Return a Numpy representation of the DataFrame.

Warning

We recommend using DataFrame.to_numpy() instead.

Only the values in the DataFrame will be returned, the axes labels will be removed.

Returns:

The values of the DataFrame.

Return type:

numpy.ndarray

See also

DataFrame.to_numpy

Recommended alternative to this method.

DataFrame.index

Retrieve the index labels.

DataFrame.columns

Retrieve the column names.

Notes

The dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.

e.g. If the dtypes are float16 and float32, dtype will be upcast to float32. If dtypes are int32 and uint8, dtype will be upcast to int32. By numpy.find_common_type() convention, mixing int64 and uint64 will result in a float64 dtype.

Examples

A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.

>>> df = pd.DataFrame({'age':    [ 3,  29],
...                    'height': [94, 170],
...                    'weight': [31, 115]})
>>> df
   age  height  weight
0    3      94      31
1   29     170     115
>>> df.dtypes
age       int64
height    int64
weight    int64
dtype: object
>>> df.values
array([[  3,  94,  31],
       [ 29, 170, 115]])

A DataFrame with mixed type columns (e.g., str/object, int64, float32) results in an ndarray of the broadest type that accommodates these mixed types (e.g., object).

>>> df2 = pd.DataFrame([('parrot',   24.0, 'second'),
...                     ('lion',     80.5, 1),
...                     ('monkey', np.nan, None)],
...                   columns=('name', 'max_speed', 'rank'))
>>> df2.dtypes
name          object
max_speed    float64
rank          object
dtype: object
>>> df2.values
array([['parrot', 24.0, 'second'],
       ['lion', 80.5, 1],
       ['monkey', nan, None]], dtype=object)
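
The upcasting rules from the Notes can be checked with purely numeric columns. This is a small sketch with illustrative column names, using the recommended DataFrame.to_numpy() alongside values.

```python
import numpy as np
import pandas as pd

mixed = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5]})
small = pd.DataFrame({"a": np.array([1, 2], dtype="int32"),
                      "b": np.array([3, 4], dtype="uint8")})

mixed_array = mixed.to_numpy()   # integer + float64 columns -> float64
small_array = small.to_numpy()   # int32 + uint8 columns -> int32

print(mixed_array.dtype, small_array.dtype)
```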
var(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)

Return unbiased variance over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:
  • axis ({index (0), columns (1)}) –

    For Series this parameter is unused and defaults to 0.

    Warning

    The behavior of DataFrame.var with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

  • skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.

  • ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

  • numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

Return type:

Series or DataFrame (if level specified)

Examples

>>> df = pd.DataFrame({'person_id': [0, 1, 2, 3],
...                    'age': [21, 25, 62, 43],
...                    'height': [1.61, 1.87, 1.49, 2.01]}
...                   ).set_index('person_id')
>>> df
           age  height
person_id
0           21    1.61
1           25    1.87
2           62    1.49
3           43    2.01
>>> df.var()
age       352.916667
height      0.056367
dtype: float64

Alternatively, ddof=0 can be set to normalize by N instead of N-1:

>>> df.var(ddof=0)
age       264.687500
height      0.042275
dtype: float64
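
The role of ddof can be made explicit by computing the variance by hand. The sketch below uses the age column from the example above and divides the sum of squared deviations by N - ddof.

```python
import numpy as np
import pandas as pd

ages = pd.Series([21, 25, 62, 43])
n = len(ages)
squared_dev = ((ages - ages.mean()) ** 2).sum()

sample_var = squared_dev / (n - 1)   # divisor N - ddof with ddof=1 (default)
population_var = squared_dev / n     # divisor N - ddof with ddof=0

print(sample_var, population_var)
```

Note that numpy.var defaults to ddof=0, whereas the pandas default is ddof=1.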
voronoi_polygons(tolerance=0.0, extend_to=None, only_edges=False)

Return a GeoSeries consisting of objects representing the computed Voronoi diagram around the vertices of an input geometry.

All geometries within the GeoSeries are considered together within a single Voronoi diagram. The resulting geometries therefore do not necessarily map 1:1 to input geometries. Note that each vertex of a geometry is considered a site for the Voronoi diagram, so the diagram will be constructed around the vertices of each geometry.

Notes

The order of polygons in the output currently does not correspond to the order of input vertices.

If you want to generate a Voronoi diagram for each geometry separately, use shapely.voronoi_polygons() instead.

Parameters:
  • tolerance (float, default 0.0) – Snap input vertices together if their distance is less than this value.

  • extend_to (shapely.Geometry, default None) – If set, the Voronoi diagram will be extended to cover the envelope of this geometry (unless this envelope is smaller than the input geometry).

  • only_edges (bool (optional, default False)) – If set to True, the diagram will return LineStrings instead of Polygons.

Examples

The most common use case is to generate polygons representing the Voronoi diagram around a set of points:

>>> from shapely import LineString, MultiPoint, Point, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Point(1, 1),
...         Point(2, 2),
...         Point(1, 3),
...         Point(0, 2),
...     ]
... )
>>> s
0    POINT (1 1)
1    POINT (2 2)
2    POINT (1 3)
3    POINT (0 2)
dtype: geometry

By default, you get back a GeoSeries of polygons:

>>> s.voronoi_polygons()
0        POLYGON ((-2 5, 1 2, -2 -1, -2 5))
1           POLYGON ((4 5, 1 2, -2 5, 4 5))
2       POLYGON ((-2 -1, 1 2, 4 -1, -2 -1))
3    POLYGON ((4 -1, 4 -1, 1 2, 4 5, 4 -1))
dtype: geometry

If you set only_edges to True, you get back LineStrings representing the edges of the Voronoi diagram:

>>> s.voronoi_polygons(only_edges=True)
0     LINESTRING (-2 5, 1 2)
1    LINESTRING (1 2, -2 -1)
2      LINESTRING (4 5, 1 2)
3     LINESTRING (1 2, 4 -1)
dtype: geometry

You can also extend each diagram to a given geometry:

>>> limit = Polygon([(-10, -10), (0, 15), (15, 15), (15, 0)])
>>> s.voronoi_polygons(extend_to=limit)
0              POLYGON ((-10 13, 1 2, -10 -9, -10 13))
1    POLYGON ((15 15, 15 -10, 13 -10, 1 2, 14 15, 1...
2    POLYGON ((-10 -10, -10 -9, 1 2, 13 -10, -10 -10))
3       POLYGON ((-10 15, 14 15, 1 2, -10 13, -10 15))
dtype: geometry

The method supports any geometry type but keep in mind that the underlying algorithm is based on the vertices of the input geometries only and does not consider edge segments between vertices.

>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (2, 1), (1, 2)]),
...         MultiPoint([(2, 3), (2, 0), (3, 1)]),
...     ]
... )
>>> s2
0      POLYGON ((0 0, 1 1, 0 1, 0 0))
1          LINESTRING (1 0, 2 1, 1 2)
2    MULTIPOINT ((2 3), (2 0), (3 1))
dtype: geometry
>>> s2.voronoi_polygons()
0    POLYGON ((1.5 1.5, 1.5 0.5, 0.5 0.5, 0.5 1.5, ...
1    POLYGON ((1.5 0.5, 1.5 1.5, 2 2, 2.5 2, 2.5 0....
2    POLYGON ((-3 -3, -3 0.5, 0.5 0.5, 0.5 -3, -3 -3))
3    POLYGON ((0.5 -3, 0.5 0.5, 1.5 0.5, 1.5 -3, 0....
4     POLYGON ((-3 5, 0.5 1.5, 0.5 0.5, -3 0.5, -3 5))
5    POLYGON ((-3 6, -2 6, 2 2, 1.5 1.5, 0.5 1.5, -...
6    POLYGON ((1.5 -3, 1.5 0.5, 2.5 0.5, 6 -3, 1.5 ...
7       POLYGON ((6 6, 6 3.75, 2.5 2, 2 2, -2 6, 6 6))
8       POLYGON ((6 -3, 2.5 0.5, 2.5 2, 6 3.75, 6 -3))
dtype: geometry

See also

GeoSeries.delaunay_triangles

Delaunay triangulation around vertices

where(cond, other=nan, *, inplace=False, axis=None, level=None)

Replace values where the condition is False.

Parameters:
  • cond (bool Series/DataFrame, array-like, or callable) – Where cond is True, keep the original value. Where False, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

  • other (scalar, Series/DataFrame, or callable) – Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it). If not specified, entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes).

  • inplace (bool, default False) – Whether to perform the operation in place on the data.

  • axis (int, default None) – Alignment axis if needed. For Series this parameter is unused and defaults to 0.

  • level (int, default None) – Alignment level if needed.

Return type:

Same type as caller or None if inplace=True.

See also

DataFrame.mask()

Return an object of same shape as self.

Notes

The where method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is True the element is used; otherwise the corresponding element from the DataFrame other is used. If the axis of other does not align with axis of cond Series/DataFrame, the misaligned index positions will be filled with False.

The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).

For further details and examples see the where documentation in indexing.

The dtype of the object takes precedence. The fill value is cast to the object’s dtype, if this can be done losslessly.

Examples

>>> s = pd.Series(range(5))
>>> s.where(s > 0)
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64
>>> s.mask(s > 0)
0    0.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
>>> s = pd.Series(range(5))
>>> t = pd.Series([True, False])
>>> s.where(t, 99)
0     0
1    99
2    99
3    99
4    99
dtype: int64
>>> s.mask(t, 99)
0    99
1     1
2    99
3    99
4    99
dtype: int64
>>> s.where(s > 1, 10)
0    10
1    10
2     2
3     3
4     4
dtype: int64
>>> s.mask(s > 1, 10)
0     0
1     1
2    10
3    10
4    10
dtype: int64
>>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> df
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
>>> m = df % 3 == 0
>>> df.where(m, -df)
   A  B
0  0 -1
1 -2  3
2 -4 -5
3  6 -7
4 -8  9
>>> df.where(m, -df) == np.where(m, df, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
>>> df.where(m, -df) == df.mask(~m, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
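
Both cond and other may be callables, as described under Parameters, but the examples above pass only precomputed objects. A minimal sketch with illustrative lambdas:

```python
import pandas as pd

s = pd.Series(range(5))

# cond and other are both callables evaluated on s itself:
# keep even values, replace odd values with their negation.
result = s.where(lambda x: x % 2 == 0, lambda x: -x)

print(result.tolist())
```

Callables can be handy in method chains, where the intermediate object has no name to refer to.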
within(other, align=None)

Return a Series of dtype('bool') with value True for each aligned geometry that is within other.

An object is said to be within other if at least one of its points is located in the interior and no points are located in the exterior of the other. If either object is empty, this operation returns False.

This is the inverse of contains() in the sense that the expression a.within(b) == b.contains(a) always evaluates to True.

The operation works in a 1-to-1, row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if each geometry is within.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> s = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (1, 2), (0, 2)]),
...         LineString([(0, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = geopandas.GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 1 2, 0 2, 0 0))
2             LINESTRING (0 0, 0 2)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (0 0, 0 2)
3             LINESTRING (0 0, 0 1)
4                       POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries is within a single geometry:

>>> polygon = Polygon([(0, 0), (2, 2), (0, 2)])
>>> s.within(polygon)
0     True
1     True
2    False
3    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s2.within(s)
0    False
1    False
2     True
3    False
4    False
dtype: bool
>>> s2.within(s, align=False)
1     True
2    False
3     True
4     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is within any element of the other one.

See also

GeoSeries.contains
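
The inverse relationship between within() and contains() stated above can be checked directly. The sketch below assumes GeoPandas and Shapely are installed; the points are illustrative and include one on the polygon boundary, which is not within the polygon.

```python
import geopandas
from shapely.geometry import Point, Polygon

triangle = Polygon([(0, 0), (2, 2), (0, 2)])
pts = geopandas.GeoSeries([
    Point(0.5, 1.5),  # in the interior
    Point(3, 3),      # outside
    Point(0, 1),      # on the boundary only
])

inside = pts.within(triangle).tolist()
contained = [triangle.contains(p) for p in pts]

print(inside, contained)
```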

xs(key, axis=0, level=None, drop_level=True)

Return cross-section from the Series/DataFrame.

This method takes a key argument to select data at a particular level of a MultiIndex.

Parameters:
  • key (label or tuple of label) – Label contained in the index, or partially in a MultiIndex.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Axis to retrieve cross-section on.

  • level (object, defaults to first n levels (n=1 or len(key))) – In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.

  • drop_level (bool, default True) – If False, returns object with same levels as self.

Returns:

Cross-section from the original Series or DataFrame corresponding to the selected index levels.

Return type:

Series or DataFrame

See also

DataFrame.loc

Access a group of rows and columns by label(s) or a boolean array.

DataFrame.iloc

Purely integer-location based indexing for selection by position.

Notes

xs cannot be used to set values.

MultiIndex Slicers is a generic way to get/set values on any level or levels. It is a superset of xs functionality, see MultiIndex Slicers.

Examples

>>> d = {'num_legs': [4, 4, 2, 2],
...      'num_wings': [0, 0, 2, 2],
...      'class': ['mammal', 'mammal', 'mammal', 'bird'],
...      'animal': ['cat', 'dog', 'bat', 'penguin'],
...      'locomotion': ['walks', 'walks', 'flies', 'walks']}
>>> df = pd.DataFrame(data=d)
>>> df = df.set_index(['class', 'animal', 'locomotion'])
>>> df
                           num_legs  num_wings
class  animal  locomotion
mammal cat     walks              4          0
       dog     walks              4          0
       bat     flies              2          2
bird   penguin walks              2          2

Get values at specified index

>>> df.xs('mammal')
                   num_legs  num_wings
animal locomotion
cat    walks              4          0
dog    walks              4          0
bat    flies              2          2

Get values at several indexes

>>> df.xs(('mammal', 'dog', 'walks'))
num_legs     4
num_wings    0
Name: (mammal, dog, walks), dtype: int64

Get values at specified index and level

>>> df.xs('cat', level=1)
                   num_legs  num_wings
class  locomotion
mammal walks              4          0

Get values at several indexes and levels

>>> df.xs(('bird', 'walks'),
...       level=[0, 'locomotion'])
         num_legs  num_wings
animal
penguin         2          2

Get values at specified column and axis

>>> df.xs('num_wings', axis=1)
class   animal   locomotion
mammal  cat      walks         0
        dog      walks         0
        bat      flies         2
bird    penguin  walks         2
Name: num_wings, dtype: int64
class pyorps.raster.rasterizer.GeoDataset(file_source, crs=None)[source]

Bases: ABC

Parameters:
  • file_source (Any)

  • crs (str | None)

__init__(file_source, crs=None)[source]
Parameters:
  • file_source (Any)

  • crs (str | None)

crs: Optional[str] = None
data: Union[GeoDataFrame, ndarray, None] = None
file_source: Any
abstractmethod load_data(**kwargs)[source]
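
Because load_data is abstract, GeoDataset cannot be instantiated directly; a concrete subclass has to supply the loader. The sketch below mirrors that pattern with a stand-in base class instead of importing pyorps, so both class names (GeoDatasetSketch, ListDataset) are hypothetical and only the attribute names are taken from the listing above.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class GeoDatasetSketch(ABC):
    """Stand-in for the documented base class (hypothetical, not pyorps code)."""

    def __init__(self, file_source: Any, crs: Optional[str] = None):
        self.file_source = file_source
        self.crs = crs
        self.data = None  # filled in by load_data()

    @abstractmethod
    def load_data(self, **kwargs):
        """Concrete subclasses populate self.data here."""


class ListDataset(GeoDatasetSketch):
    """Hypothetical subclass that 'loads' an in-memory list."""

    def load_data(self, **kwargs):
        self.data = list(self.file_source)


dataset = ListDataset([1, 2, 3], crs="EPSG:4326")
dataset.load_data()

# Instantiating the abstract base fails because load_data is not implemented.
try:
    GeoDatasetSketch("anything")
    base_instantiable = True
except TypeError:
    base_instantiable = False
```

The same contract applies to the RasterDataset and VectorDataset subclasses documented further down, which add their own abstract methods.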
class pyorps.raster.rasterizer.GeoRasterizer(input_data, cost_assumptions, bbox=None, mask=None, default_crs=None, **kwargs)[source]

Bases: object

A class for preparing and rasterizing geospatial data with cost assumptions.

This class integrates:
  • GeoDataset for representing datasets with metadata

  • CostAssumptions for handling cost mappings

  • Rasterization functionality for converting vector data to rasters

__init__(input_data, cost_assumptions, bbox=None, mask=None, default_crs=None, **kwargs)[source]

Initialize the GeoRasterizer with a base dataset and optional parameters.

property base_data: GeoDataFrame

Property to directly access the data attribute of the base_dataset.

Returns:

The base dataset (GeoDataFrame)

clip_to_area(clip_geometry)[source]

Clip the base dataset to a specific area.

Parameters:

clip_geometry (Union[GeoDataFrame, Polygon]) – The geometry to clip by

Return type:

GeoDataset

Returns:

The clipped base dataset

create_bounds_geodataframe(target_crs=None)[source]

Creates a GeoDataFrame from the bounds of the base data in a specified CRS.

Parameters:

target_crs (Optional[str]) – The desired CRS for the new GeoDataFrame

Return type:

GeoDataFrame

Returns:

A new GeoDataFrame containing the bounds of the base data

static create_buffer(dataset, geometry_buffer_m, inplace=True)[source]

Add a buffer to geometries in a dataset.

Parameters:
  • dataset (Union[VectorDataset, GeoDataFrame]) – The dataset to buffer (GeoDataset or GeoDataFrame)

  • geometry_buffer_m (float) – Distance to buffer in dataset’s CRS units

  • inplace (bool) – If True, modify the dataset in place

Return type:

Union[VectorDataset, GeoDataFrame]

Returns:

The buffered dataset

property crs

Pass through the crs property of base_dataset.

Returns:

The desired CRS of the base dataset

modify_raster_from_dataset(input_data, cost_assumptions=None, bbox=None, mask=None, transform=None, geometry_buffer_m=0, ignore_value=65535, multiply=False, zone_field=None, forbidden_zone=None, forbidden_value=65535, **kwargs)[source]

Modify the raster with an additional dataset.

Return type:

ndarray

Returns:

The modified raster

modify_raster_with_geodataframe(gdf, value, ignore_value=65535, multiply=False)[source]

Modifies the raster cells inside the polygons of a GeoDataFrame.

Parameters:
  • gdf (GeoDataFrame) – The GeoDataFrame containing polygons to use for masking

  • value (float) – The value to set for the raster cells inside the polygons

  • ignore_value (Optional[float]) – Value in the raster to ignore during modification

  • multiply (bool) – If True, multiply the raster values by the given value

Return type:

ndarray

Returns:

The modified raster
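
The interplay of value, ignore_value and multiply can be illustrated with plain NumPy. This is a sketch of the semantics the parameter descriptions suggest, not the library's implementation: the helper apply_value and the boolean array inside (standing in for "cells inside the polygons") are hypothetical, and it assumes that cells equal to ignore_value are left untouched.

```python
import numpy as np


def apply_value(raster, inside, value, ignore_value=65535, multiply=False):
    """Set or scale the cells flagged in `inside`, skipping ignored cells."""
    out = raster.copy()
    # Only touch cells that are inside the polygons and not marked as ignored.
    target = inside & (out != ignore_value)
    if multiply:
        out[target] = (out[target] * value).astype(out.dtype)
    else:
        out[target] = value
    return out


raster = np.array([[10, 10, 65535],
                   [10, 10, 10]], dtype=np.uint16)
inside = np.array([[True, True, True],
                   [False, False, True]])

replaced = apply_value(raster, inside, 5)               # overwrite with 5
scaled = apply_value(raster, inside, 2, multiply=True)  # double the cost
```

In both results the top-right cell keeps the value 65535 because it matches ignore_value, and cells outside the polygon mask are unchanged.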

rasterize(field_name='cost', resolution_in_m=1.0, fill_value=65535, save_path=None, dtype='uint16', geometry_buffer_m=0, bounding_box=None, fancy_function=None, fancy_function_kwargs=None)[source]

Rasterize the base dataset based on a specified field.

Parameters:
  • field_name (str) – The field to use for rasterization values

  • resolution_in_m (float) – The resolution of the output raster in meters

  • fill_value (int) – Value to use for areas with no data

  • save_path (Optional[str]) – Path to save the rasterized output

  • dtype (str) – Data type for the output raster

  • geometry_buffer_m (float) – Buffer to apply to the dataset geometries

  • bounding_box (Optional[Polygon]) – Bounding box to define the rasterization extent

  • fancy_function (Optional[Callable]) – A function that takes the base dataset as its first argument, plus any other arguments defined in fancy_function_kwargs, and that will be called before rasterization

  • fancy_function_kwargs (Optional[dict[str, Any]]) – The keyword arguments passed to fancy_function

Return type:

RasterDataset

Returns:

tuple of (raster_data, transform)

save_raster(save_path)[source]

Save the rasterized data to a file.

Parameters:

save_path (str) – Path to save the raster file

Return type:

None

shrink_raster(exclude_value)[source]

Shrink the raster by removing outer bounds with a specific value.

Parameters:

exclude_value (int) – Value to exclude from the outer bounds

Return type:

ndarray

Returns:

The shrunk raster
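
One way to read "removing outer bounds with a specific value" is cropping every border row and column that contains nothing but exclude_value. The NumPy sketch below shows that reading; the function name shrink is hypothetical, and the sketch does not cover any adjustment of the raster transform that the real method may perform.

```python
import numpy as np


def shrink(raster, exclude_value):
    """Crop border rows/columns that contain only `exclude_value`."""
    keep = raster != exclude_value
    rows = np.flatnonzero(keep.any(axis=1))
    cols = np.flatnonzero(keep.any(axis=0))
    if rows.size == 0:
        # Nothing but exclude_value: return an empty raster.
        return raster[:0, :0]
    return raster[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


raster = np.full((5, 6), 65535, dtype=np.uint16)
raster[1:4, 2:5] = 7
raster[2, 3] = 65535  # an interior cell with the excluded value is kept

shrunk = shrink(raster, 65535)
```

Only the outer frame is trimmed here; excluded cells surrounded by valid data stay in place so the grid remains rectangular.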

class pyorps.raster.rasterizer.InMemoryRasterDataset(file_source, crs, transform)[source]

Bases: RasterDataset

__init__(file_source, crs, transform)[source]
count: int
crs: str = None
data: Union[GeoDataFrame, ndarray, None] = None
dtype: dtype
file_source: Any
load_data(**kwargs)[source]
shape: tuple[int, int]
transform: Affine
class pyorps.raster.rasterizer.Polygon(shell=None, holes=None)[source]

Bases: BaseGeometry

A geometry type representing an area that is enclosed by a linear ring.

A polygon is a two-dimensional feature and has a non-zero area. It may have one or more negative-space “holes” which are also bounded by linear rings. If any rings cross each other, the feature is invalid and operations on it may fail.

Parameters:
  • shell (sequence) – A sequence of (x, y [,z]) numeric coordinate pairs or triples, or an array-like with shape (N, 2) or (N, 3). Also can be a sequence of Point objects.

  • holes (sequence) – A sequence of objects which satisfy the same requirements as the shell parameters above

exterior

The ring which bounds the positive space of the polygon.

Type:

LinearRing

interiors

A sequence of rings which bound all existing holes.

Type:

sequence

Examples

Create a square polygon with no holes

>>> from shapely import Polygon
>>> coords = ((0., 0.), (0., 1.), (1., 1.), (1., 0.), (0., 0.))
>>> polygon = Polygon(coords)
>>> polygon.area
1.0
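
The holes parameter can be illustrated in the same way. The following is a small sketch that assumes shapely is installed; the coordinates are arbitrary, and the import goes through shapely.geometry so that it also works on older shapely releases.

```python
from shapely.geometry import Point, Polygon

shell = [(0, 0), (4, 0), (4, 4), (0, 4)]
hole = [(1, 1), (3, 1), (3, 3), (1, 3)]
donut = Polygon(shell, holes=[hole])

area = donut.area                          # shell area minus hole area
n_holes = len(donut.interiors)             # one interior ring
in_hole = donut.contains(Point(2, 2))      # point lies in the negative space
in_ring = donut.contains(Point(0.5, 0.5))  # point lies in the positive space
```

The hole is negative space, so a point inside it is not contained by the polygon even though it lies within the exterior ring.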
__and__(other)

Return the intersection of the geometries.

__bool__()

Return True if the geometry is not empty, else False.

__format__(format_spec)

Format a geometry using a format specification.

static __new__(self, shell=None, holes=None)[source]

Create a new Polygon geometry.

__nonzero__()

Return True if the geometry is not empty, else False.

__or__(other)

Return the union of the geometries.

__reduce__()

Pickle support.

__repr__()

Return a string representation of the geometry.

__str__()

Return a string representation of the geometry.

__sub__(other)

Return the difference of the geometries.

__xor__(other)

Return the symmetric difference of the geometries.
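
The four operator overloads above map onto the set-theoretic methods intersection, union, difference and symmetric_difference. A minimal sketch with two overlapping squares, assuming shapely is installed (the geometries are arbitrary illustration data):

```python
from shapely.geometry import box

a = box(0, 0, 2, 2)  # 2 x 2 square
b = box(1, 0, 3, 2)  # same size, shifted right by 1

intersection_area = (a & b).area  # shared 1 x 2 strip
union_area = (a | b).area         # combined 3 x 2 rectangle
difference_area = (a - b).area    # part of a that lies outside b
symmetric_area = (a ^ b).area     # union minus intersection
```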

property area

Unitless area of the geometry (float).

property boundary

Return a lower dimension geometry that bounds the object.

The boundary of a polygon is a line, the boundary of a line is a collection of points. The boundary of a point is an empty (null) collection.

property bounds

Return minimum bounding region (minx, miny, maxx, maxy).

buffer(distance, quad_segs=16, cap_style='round', join_style='round', mitre_limit=5.0, single_sided=False, **kwargs)

Get a geometry that represents all points within a distance of this geometry.

A positive distance produces a dilation, a negative distance an erosion. A very small or zero distance may sometimes be used to “tidy” a polygon.

Parameters:
  • distance (float) – The distance to buffer around the object.

  • quad_segs (int, optional) – Sets the number of line segments used to approximate an angle fillet.

  • cap_style (shapely.BufferCapStyle or {'round', 'square', 'flat'}, default 'round') – Specifies the shape of buffered line endings. BufferCapStyle.round (‘round’) results in circular line endings (see quad_segs). Both BufferCapStyle.square (‘square’) and BufferCapStyle.flat (‘flat’) result in rectangular line endings, only BufferCapStyle.flat (‘flat’) will end at the original vertex, while BufferCapStyle.square (‘square’) involves adding the buffer width.

  • join_style (shapely.BufferJoinStyle or {'round', 'mitre', 'bevel'}, default 'round') – Specifies the shape of buffered line midpoints. BufferJoinStyle.ROUND (‘round’) results in rounded shapes. BufferJoinStyle.bevel (‘bevel’) results in a beveled edge that touches the original vertex. BufferJoinStyle.mitre (‘mitre’) results in a single vertex that is beveled depending on the mitre_limit parameter.

  • mitre_limit (float, optional) – The mitre limit ratio is used for very sharp corners. The mitre ratio is the ratio of the distance from the corner to the end of the mitred offset corner. When two line segments meet at a sharp angle, a miter join will extend the original geometry. To prevent unreasonable geometry, the mitre limit allows controlling the maximum length of the join corner. Corners with a ratio which exceed the limit will be beveled.

  • single_sided (bool, optional) –

    The side used is determined by the sign of the buffer distance:

    A positive distance indicates the left-hand side; a negative distance indicates the right-hand side.

    The single-sided buffer of point geometries is the same as the regular buffer. The End Cap Style for single-sided buffers is always ignored, and forced to the equivalent of CAP_FLAT.

  • quadsegs (int, optional) – Deprecated alias for quad_segs.

  • resolution (int, optional) – Deprecated alias for quad_segs.

  • **kwargs (dict, optional) – For backwards compatibility of renamed parameters. If an unsupported kwarg is passed, a ValueError will be raised.

Return type:

Geometry

Notes

The return value is a strictly two-dimensional geometry. All Z coordinates of the original geometry will be ignored.

Deprecated since version 2.1.0: A deprecation warning is shown if quad_segs, cap_style, join_style, mitre_limit or single_sided are specified as positional arguments. In a future release, these will need to be specified as keyword arguments.

Examples

>>> from shapely import BufferCapStyle
>>> from shapely.wkt import loads
>>> g = loads('POINT (0.0 0.0)')

16-gon approx of a unit radius circle:

>>> g.buffer(1.0).area
3.1365484905459398

128-gon approximation:

>>> g.buffer(1.0, 128).area
3.1415138011443013

triangle approximation:

>>> g.buffer(1.0, 3).area
3.0
>>> list(g.buffer(1.0, cap_style=BufferCapStyle.square).exterior.coords)
[(1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
>>> g.buffer(1.0, cap_style=BufferCapStyle.square).area
4.0
property centroid

Return the geometric center of the object.

contains(other)

Return True if the geometry contains the other, else False.

contains_properly(other)

Return True if the geometry completely contains the other.

There should be no common boundary points.

Refer to shapely.contains_properly for full documentation.

property convex_hull

Return the convex hull of the geometry.

Imagine an elastic band stretched around the geometry: that’s a convex hull, more or less.

The convex hull of a three member multipoint, for example, is a triangular polygon.

property coords

Not implemented for polygons.

covered_by(other)

Return True if the geometry is covered by the other, else False.

covers(other)

Return True if the geometry covers the other, else False.

crosses(other)

Return True if the geometries cross, else False.

difference(other, grid_size=None)

Return the difference of the geometries.

Refer to shapely.difference for full documentation.

disjoint(other)

Return True if geometries are disjoint, else False.

distance(other)

Unitless distance to other geometry (float).

dwithin(other, distance)

Return True if geometry is within a given distance from the other.

Refer to shapely.dwithin for full documentation.

property envelope

A figure that envelopes the geometry.

equals(other)

Return True if geometries are equal, else False.

This method considers point-set equality (or topological equality), and is equivalent to (self.within(other) & self.contains(other)).

Examples

>>> from shapely import LineString
>>> LineString(
...     [(0, 0), (2, 2)]
... ).equals(
...     LineString([(0, 0), (1, 1), (2, 2)])
... )
True
Return type:

bool

equals_exact(other, tolerance=0.0, *, normalize=False)

Return True if the geometries are equivalent within the tolerance.

Refer to equals_exact() for full documentation.

Parameters:
  • other (BaseGeometry) – The other geometry object in this comparison.

  • tolerance (float, optional (default: 0.)) – Absolute tolerance in the same units as coordinates.

  • normalize (bool, optional (default: False)) –

    If True, normalize the two geometries so that the coordinates are in the same order.

    Added in version 2.1.0.

Examples

>>> from shapely import LineString
>>> LineString(
...     [(0, 0), (2, 2)]
... ).equals_exact(
...     LineString([(0, 0), (1, 1), (2, 2)]),
...     1e-6
... )
False
Return type:

bool

property exterior

Return the exterior ring of the polygon.

classmethod from_bounds(xmin, ymin, xmax, ymax)[source]

Construct a Polygon() from spatial bounds.

property geom_type

Name of the geometry’s type, such as ‘Point’.

geometryType()

Get the geometry type (deprecated).

Deprecated since version 2.0: Use the geom_type attribute instead.

property has_m

True if the geometry’s coordinate sequence(s) have m values.

property has_z

True if the geometry’s coordinate sequence(s) have z values.

hausdorff_distance(other)

Unitless hausdorff distance to other geometry (float).

property interiors

Return the sequence of interior rings of the polygon.

interpolate(distance, normalized=False)

Return a point at the specified distance along a linear geometry.

Negative length values are taken as measured in the reverse direction from the end of the geometry. Out-of-range index values are handled by clamping them to the valid range of values. If the normalized arg is True, the distance will be interpreted as a fraction of the geometry’s length.

Alias of line_interpolate_point.
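
interpolate is listed here because Polygon inherits it from the geometry base class, but it is meant for linear geometries. The sketch below therefore uses a LineString and also shows project, the reverse lookup documented further down. It assumes shapely is installed; the line and points are arbitrary.

```python
from shapely.geometry import LineString, Point

line = LineString([(0, 0), (10, 0)])

quarter = line.interpolate(2.5)                    # 2.5 units from the start
midpoint = line.interpolate(0.5, normalized=True)  # half of the total length
from_end = line.interpolate(-2)                    # measured back from the end
clamped = line.interpolate(25)                     # beyond the end: clamped

# project() returns the distance along the line to the point on the line
# that is nearest to the given point.
along = line.project(Point(3, 4))
fraction = line.project(Point(3, 4), normalized=True)
```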

intersection(other, grid_size=None)

Return the intersection of the geometries.

Refer to shapely.intersection for full documentation.

intersects(other)

Return True if geometries intersect, else False.

property is_closed

True if the geometry is closed, else False.

Applicable only to linear geometries.

property is_empty

True if the set of points in this geometry is empty, else False.

property is_ring

True if the geometry is a closed ring, else False.

property is_simple

True if the geometry is simple.

Simple means that any self-intersections are only at boundary points.

property is_valid

True if the geometry is valid.

The definition depends on sub-class.

property length

Unitless length of the geometry (float).

line_interpolate_point(distance, normalized=False)

Return a point at the specified distance along a linear geometry.

Negative length values are taken as measured in the reverse direction from the end of the geometry. Out-of-range index values are handled by clamping them to the valid range of values. If the normalized arg is True, the distance will be interpreted as a fraction of the geometry’s length.

Alias of interpolate.

line_locate_point(other, normalized=False)

Return the distance of this geometry to a point nearest the specified point.

If the normalized arg is True, return the distance normalized to the length of the linear geometry.

Alias of project.

property minimum_clearance

Unitless distance a node can be moved to produce an invalid geometry (float).

property minimum_rotated_rectangle

Return the oriented envelope (minimum rotated rectangle) of the geometry.

The oriented envelope encloses an input geometry, such that the resulting rectangle has minimum area.

Unlike envelope this rectangle is not constrained to be parallel to the coordinate axes. If the convex hull of the object is a degenerate (line or point) this degenerate is returned.

The starting point of the rectangle is not fixed. You can use normalize() to reorganize the rectangle to strict canonical form so the starting point is always the lower left point.

Alias of oriented_envelope.

normalize()

Convert geometry to normal form (or canonical form).

This method orders the coordinates, rings of a polygon and parts of multi geometries consistently. Typically useful for testing purposes (for example in combination with equals_exact).

Examples

>>> from shapely import MultiLineString
>>> line = MultiLineString([[(0, 0), (1, 1)], [(3, 3), (2, 2)]])
>>> line.normalize()
<MULTILINESTRING ((2 2, 3 3), (0 0, 1 1))>
property oriented_envelope

Return the oriented envelope (minimum rotated rectangle) of a geometry.

The oriented envelope encloses an input geometry, such that the resulting rectangle has minimum area.

Unlike envelope this rectangle is not constrained to be parallel to the coordinate axes. If the convex hull of the object is a degenerate (line or point) this degenerate is returned.

The starting point of the rectangle is not fixed. You can use normalize() to reorganize the rectangle to strict canonical form so the starting point is always the lower left point.

Alias of minimum_rotated_rectangle.

overlaps(other)

Return True if geometries overlap, else False.

point_on_surface()

Return a point guaranteed to be within the object, cheaply.

Alias of representative_point.

project(other, normalized=False)

Return the distance of geometry to a point nearest the specified point.

If the normalized arg is True, return the distance normalized to the length of the linear geometry.

Alias of line_locate_point.

relate(other)

Return the DE-9IM intersection matrix for the two geometries (string).

relate_pattern(other, pattern)

Return True if the DE-9IM relationship code satisfies the pattern.

representative_point()

Return a point guaranteed to be within the object, cheaply.

Alias of point_on_surface.

reverse()

Return a copy of this geometry with the order of coordinates reversed.

If the geometry is a polygon with interior rings, the interior rings are also reversed.

Points are unchanged.

See also

is_ccw

Checks if a geometry is counterclockwise.

Examples

>>> from shapely import LineString, Polygon
>>> LineString([(0, 0), (1, 2)]).reverse()
<LINESTRING (1 2, 0 0)>
>>> Polygon([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]).reverse()
<POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))>
segmentize(max_segment_length)

Add vertices to line segments based on maximum segment length.

Additional vertices will be added to every line segment in an input geometry so that segments are no longer than the provided maximum segment length. New vertices will evenly subdivide each segment.

Only linear components of input geometries are densified; other geometries are returned unmodified.

Parameters:

max_segment_length (float or array_like) – Additional vertices will be added so that all line segments are no longer than this value. Must be greater than 0.

Examples

>>> from shapely import LineString, Polygon
>>> LineString([(0, 0), (0, 10)]).segmentize(max_segment_length=5)
<LINESTRING (0 0, 0 5, 0 10)>
>>> Polygon([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]).segmentize(max_segment_length=5)
<POLYGON ((0 0, 5 0, 10 0, 10 5, 10 10, 5 10, 0 10, 0 5, 0 0))>
simplify(tolerance, preserve_topology=True)

Return a simplified geometry produced by the Douglas-Peucker algorithm.

Coordinates of the simplified geometry will be no more than the tolerance distance from the original. Unless the topology preserving option is used, the algorithm may produce self-intersecting or otherwise invalid geometries.

svg(scale_factor=1.0, fill_color=None, opacity=None)[source]

Return SVG path element for the Polygon geometry.

Parameters:
  • scale_factor (float) – Multiplication factor for the SVG stroke-width. Default is 1.

  • fill_color (str, optional) – Hex string for fill color. Default is to use “#66cc99” if geometry is valid, and “#ff3333” if invalid.

  • opacity (float) – Float number between 0 and 1 for color opacity. Default value is 0.6

symmetric_difference(other, grid_size=None)

Return the symmetric difference of the geometries.

Refer to shapely.symmetric_difference for full documentation.

touches(other)

Return True if geometries touch, else False.

property type

Get the geometry type (deprecated).

Deprecated since version 2.0: Use the geom_type attribute instead.

union(other, grid_size=None)

Return the union of the geometries.

Refer to shapely.union for full documentation.

within(other)

Return True if geometry is within the other, else False.

property wkb

WKB representation of the geometry.

property wkb_hex

WKB hex representation of the geometry.

property wkt

WKT representation of the geometry.

property xy

Separate arrays of X and Y coordinate values.

class pyorps.raster.rasterizer.RasterDataset(file_source, crs=None)[source]

Bases: GeoDataset, ABC

__init__(file_source, crs=None)
Parameters:
  • file_source (Any)

  • crs (str | None)

count: int
crs: str = None
data: Union[GeoDataFrame, ndarray, None] = None
dtype: dtype
file_source: Any
abstractmethod load_data(**kwargs)
shape: tuple[int, int]
transform: Affine
class pyorps.raster.rasterizer.VectorDataset(file_source, crs=None, bbox=None, mask=None)[source]

Bases: GeoDataset, ABC

__init__(file_source, crs=None, bbox=None, mask=None)[source]
abstractmethod apply_bbox()[source]
abstractmethod apply_mask()[source]
bbox: Union[Polygon, GeoDataFrame, GeoSeries, tuple[float, float, float, float], None] = (None,)
abstractmethod correct_crs()[source]
crs: Optional[str] = None
data: Union[GeoDataFrame, ndarray, None] = None
file_source: Any
abstractmethod load_data(**kwargs)
mask: Union[Polygon, GeoDataFrame, tuple, None] = (None,)
abstractmethod post_loading()[source]
pyorps.raster.rasterizer.box(minx, miny, maxx, maxy, ccw=True)[source]

Return a rectangular polygon with configurable normal vector.

pyorps.raster.rasterizer.deepcopy(x, memo=None, _nil=[])[source]

Deep copy operation on arbitrary Python objects.

See the module’s __doc__ string for more info.

pyorps.raster.rasterizer.from_bounds(west, south, east, north, width, height)[source]

Return an Affine transformation given bounds, width and height.

Return an Affine transformation for a georeferenced raster given its bounds west, south, east, north and its width and height in number of pixels.
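
The arithmetic behind such a transform can be written out by hand. The sketch below assumes the usual north-up convention (no rotation, origin at the upper-left corner) and returns the six coefficients as a plain tuple; both helper names are hypothetical and the code is independent of rasterio.

```python
def from_bounds_coefficients(west, south, east, north, width, height):
    """Return (a, b, c, d, e, f) for a north-up raster covering the bounds."""
    pixel_width = (east - west) / width
    pixel_height = (north - south) / height
    # x = a*col + b*row + c ; y = d*col + e*row + f
    # e is negative because row indices grow downwards (towards the south).
    return (pixel_width, 0.0, west, 0.0, -pixel_height, north)


def pixel_to_world(coeffs, col, row):
    """Map a pixel corner (col, row) to world coordinates (x, y)."""
    a, b, c, d, e, f = coeffs
    return (a * col + b * row + c, d * col + e * row + f)


coeffs = from_bounds_coefficients(0.0, 0.0, 100.0, 50.0, width=10, height=5)
upper_left = pixel_to_world(coeffs, 0, 0)    # (west, north)
lower_right = pixel_to_world(coeffs, 10, 5)  # (east, south)
```

Pixel (0, 0) maps to the north-west corner and pixel (width, height) to the south-east corner, which is a quick sanity check for any transform built from bounds.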

pyorps.raster.rasterizer.geometry_mask(geometries, out_shape, transform, all_touched=False, invert=False)[source]

Create a mask from shapes.

By default, mask is intended for use as a numpy mask, where pixels that overlap shapes are False.

Parameters:
  • geometries (iterable over geometries (GeoJSON-like objects))

  • out_shape (tuple or list) – Shape of output numpy.ndarray.

  • transform (Affine transformation object) – Transformation from pixel coordinates of source to the coordinate system of the input shapes. See the transform property of dataset objects.

  • all_touched (boolean, optional) – If True, all pixels touched by geometries will be burned in. If False, only pixels whose center is within the polygon or that are selected by Bresenham’s line algorithm will be burned in. False by default

  • invert (boolean, optional) – If True, mask will be True for pixels that overlap shapes. False by default.

Returns:

Type is numpy.bool_

Return type:

numpy.ndarray

Notes

See rasterize() for performance notes.
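
The mask polarity is easy to get backwards, so here is a NumPy-only sketch of the convention. It does not call geometry_mask; the boolean array covered is a hypothetical stand-in for "pixels that overlap shapes", and the two masks mimic what invert=False and invert=True are described as returning.

```python
import numpy as np

# Hypothetical coverage grid: True where a shape overlaps the pixel.
covered = np.zeros((3, 4), dtype=bool)
covered[1, 1:3] = True

default_mask = ~covered   # invert=False convention: False over shapes
inverted_mask = covered   # invert=True convention: True over shapes

values = np.arange(12).reshape(3, 4)
# NumPy hides entries whose mask is True, so the default mask keeps only
# the pixels under the shapes visible.
inside_only = np.ma.masked_array(values, mask=default_mask)
outside_only = np.ma.masked_array(values, mask=inverted_mask)
```

With the default polarity the result can be passed straight to numpy.ma to analyse the pixels under the shapes; invert=True is the form to use for boolean indexing such as raster[mask].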

pyorps.raster.rasterizer.initialize_geo_dataset(file_source, crs=None, bbox=None, mask=None, transform=None)[source]

Factory function to create the appropriate GeoDataset instance based on the provided input.

Return type:

GeoDataset

Returns:

An appropriate GeoDataset subclass instance

Examples

# From local vector file
vector_dataset = initialize_geo_dataset("path/to/shapefile.shp", crs="EPSG:4326")

# From GeoDataFrame
vector_dataset = initialize_geo_dataset(gdf, bbox=(x1, y1, x2, y2))

# From WFS source
wfs_dataset = initialize_geo_dataset({"url": "https://example.com/wfs",
                                      "layer": "layer1"})

# From local raster file
raster_dataset = initialize_geo_dataset("path/to/dem.tif")

# From numpy array
raster_dataset = initialize_geo_dataset(array_data, transform=transform,
                                        crs="EPSG:4326")

pyorps.raster.rasterizer.rasterize(shapes, out_shape=None, fill=0, nodata=None, masked=False, out=None, transform=(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0), all_touched=False, merge_alg=MergeAlg.replace, default_value=1, dtype=None, skip_invalid=True, dst_path=None, dst_kwds=None)[source]

Return an image array with input geometries burned in.

Warnings will be raised for any invalid or empty geometries, and an exception will be raised if there are no valid shapes to rasterize.

Parameters:
  • shapes (iterable of (`geometry`, value) pairs or geometries) – The geometry can either be an object that implements the geo interface or GeoJSON-like object. If no value is provided the default_value will be used. If value is None the fill value will be used.

  • out_shape (tuple or list with 2 integers) – Shape of output numpy.ndarray.

  • fill (int or float, optional) – Used as fill value for all areas not covered by input geometries.

  • nodata (float, optional) – nodata value to use in output file or masked array.

  • masked (bool, optional. Default: False.) – If True, return a masked array. Note: nodata is always set in the case of file output.

  • out (numpy.ndarray, optional) – Array in which to store results. If not provided, out_shape and dtype are required.

  • transform (Affine transformation object, optional) – Transformation from pixel coordinates of source to the coordinate system of the input shapes. See the transform property of dataset objects.

  • all_touched (boolean, optional) – If True, all pixels touched by geometries will be burned in. If false, only pixels whose center is within the polygon or that are selected by Bresenham’s line algorithm will be burned in.

  • merge_alg (MergeAlg, optional) –

    Merge algorithm to use. One of:
    MergeAlg.replace (default):

    the new value will overwrite the existing value.

    MergeAlg.add:

    the new value will be added to the existing raster.

  • default_value (int or float, optional) – Used as value for all geometries, if not provided in shapes.

  • dtype (rasterio or numpy.dtype, optional) – Used as data type for results, if out is not provided.

  • skip_invalid (bool, optional) – If True (default), invalid shapes will be skipped. If False, ValueError will be raised.

  • dst_path (str or PathLike, optional) – Path of output dataset

  • dst_kwds (dict, optional) – Dictionary of creation options and other parameters that will be overlaid on the profile of the output dataset.

Returns:

If out was not None then out is returned, it will have been modified in-place. If out was None, this will be a new array.

Return type:

numpy.ndarray

Notes

Valid data types for fill, default_value, out, dtype and shape values are “int16”, “int32”, “uint8”, “uint16”, “uint32”, “float32”, and “float64”.

This function requires significant memory resources. The shapes iterator will be materialized to a Python list and another C copy of that list will be made. The out array will be copied and additional temporary raster memory equal to 2x the smaller of out data or GDAL’s max cache size (controlled by GDAL_CACHEMAX, default is 5% of the computer’s physical memory) is required.

If GDAL max cache size is smaller than the output data, the array of shapes will be iterated multiple times. Performance is thus a linear function of buffer size. For maximum speed, ensure that GDAL_CACHEMAX is larger than the size of out or out_shape.
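
The difference between the two merge algorithms only shows where shapes overlap. The sketch below imitates it with NumPy on precomputed boolean coverage masks; it is not how rasterio or GDAL burn geometries, and the helper burn and the masks left and right are hypothetical.

```python
import numpy as np


def burn(shapes, out_shape, fill=0, add=False):
    """Burn (coverage_mask, value) pairs into a new array, in order."""
    out = np.full(out_shape, fill, dtype=np.int32)
    for covered, value in shapes:
        if add:
            out[covered] += value  # MergeAlg.add: accumulate
        else:
            out[covered] = value   # MergeAlg.replace: overwrite
    return out


left = np.zeros((2, 4), dtype=bool)
left[:, 0:3] = True   # covers columns 0..2
right = np.zeros((2, 4), dtype=bool)
right[:, 2:4] = True  # covers columns 2..3, overlapping in column 2

shapes = [(left, 1), (right, 10)]
replaced = burn(shapes, (2, 4))
added = burn(shapes, (2, 4), add=True)
```

In column 2 the replace variant ends up with the value of the shape burned last, while the add variant holds the sum of both values.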

pyorps.raster.rasterizer.rio_open(fp, mode='r', driver=None, width=None, height=None, count=None, crs=None, transform=None, dtype=None, nodata=None, sharing=False, opener=None, **kwargs)

Open a dataset for reading or writing.

The dataset may be located in a local file, in a resource located by a URL, or contained within a stream of bytes. This function accepts different types of fp parameters. However, it is almost always best to pass a string that has a dataset name as its value. These are passed directly to GDAL protocol and format handlers. A path to a zipfile is more efficiently used by GDAL than a Python ZipFile object, for example.

In read (‘r’) or read/write (‘r+’) mode, no keyword arguments are required: these attributes are supplied by the opened dataset.

In write (‘w’ or ‘w+’) mode, the driver, width, height, count, and dtype keywords are strictly required.

Parameters:
  • fp (str, os.PathLike, file-like, or rasterio.io.MemoryFile) – A filename or URL, a file object opened in binary (‘rb’) mode, a Path object, or one of the rasterio classes that provides the dataset-opening interface (has an open method that returns a dataset). Use a string when possible: GDAL can more efficiently access a dataset if it opens it natively.

  • mode (str, optional) – ‘r’ (read, the default), ‘r+’ (read/write), ‘w’ (write), or ‘w+’ (write/read).

  • driver (str, optional) – A short format driver name (e.g. “GTiff” or “JPEG”) or a list of such names (see GDAL docs at https://gdal.org/drivers/raster/index.html). In ‘w’ or ‘w+’ modes a single name is required. In ‘r’ or ‘r+’ modes the driver can usually be omitted. Registered drivers will be tried sequentially until a match is found. When multiple drivers are available for a format such as JPEG2000, one of them can be selected by using this keyword argument.

  • width (int, optional) – The number of columns of the raster dataset. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • height (int, optional) – The number of rows of the raster dataset. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • count (int, optional) – The count of dataset bands. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • crs (str, dict, or CRS, optional) – The coordinate reference system. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • transform (affine.Affine, optional) – Affine transformation mapping the pixel space to geographic space. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • dtype (str or numpy.dtype, optional) – The data type for bands. For example: ‘uint8’ or rasterio.uint16. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • nodata (int, float, or nan, optional) – Defines the pixel value to be interpreted as not valid data. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.

  • sharing (bool, optional) – To reduce overhead and prevent programs from running out of file descriptors, rasterio maintains a pool of shared low level dataset handles. If True this function will use a shared handle if one is available. Multithreaded programs must avoid sharing and should set sharing to False.

  • opener (callable, optional) – A custom dataset opener which can serve GDAL’s virtual filesystem machinery via Python file-like objects. The underlying file-like object is obtained by calling opener with (fp, mode) or (fp, mode + “b”) depending on the format driver’s native mode. opener must return a Python file-like object that provides read, seek, tell, and close methods. Note: only one opener at a time per fp, mode pair is allowed.

  • kwargs (optional) – These are passed to format drivers as directives for creating or interpreting datasets. For example: in ‘w’ or ‘w+’ modes a tiled=True keyword argument will direct the GeoTIFF format driver to create a tiled, rather than striped, TIFF.

Returns:

  • rasterio.io.DatasetReader – If mode is “r”.

  • rasterio.io.DatasetWriter – If mode is “r+”, “w”, or “w+”.

Raises:
  • TypeError – If arguments are of the wrong Python type.

  • rasterio.errors.RasterioIOError – If the dataset cannot be opened, such as when there is no dataset with the given name.

  • rasterio.errors.DriverCapabilityError – If the detected format driver does not support the requested opening mode.

Examples

To open a local GeoTIFF dataset for reading using standard driver discovery and no directives:

>>> import rasterio
>>> with rasterio.open('example.tif') as dataset:
...     print(dataset.profile)

To open a local JPEG2000 dataset using only the JP2OpenJPEG driver:

>>> with rasterio.open(
...         'example.jp2', driver='JP2OpenJPEG') as dataset:
...     print(dataset.profile)

To create a new 8-band, 16-bit unsigned, tiled, and LZW-compressed GeoTIFF with a global extent and 0.5 degree resolution:

>>> from rasterio.transform import from_origin
>>> with rasterio.open(
...         'example.tif', 'w', driver='GTiff', dtype='uint16',
...         width=720, height=360, count=8, crs='EPSG:4326',
...         transform=from_origin(-180.0, 90.0, 0.5, 0.5),
...         nodata=0, tiled=True, compress='lzw') as dataset:
...     dataset.write(...)

Module contents

Raster data processing functionality for geospatial analysis.

This module provides:
  1. Classes for handling and manipulating raster datasets

  2. Rasterization tools for converting vector data to rasters

  3. Cost surface generation capabilities

  4. Utility functions for creating test data and processing rasters

class pyorps.raster.GeoRasterizer(input_data, cost_assumptions, bbox=None, mask=None, default_crs=None, **kwargs)[source]

Bases: object

A class for preparing and rasterizing geospatial data with cost assumptions.

This class integrates:
  • GeoDataset for representing datasets with metadata

  • CostAssumptions for handling cost mappings

  • Rasterization functionality for converting vector data to rasters

__init__(input_data, cost_assumptions, bbox=None, mask=None, default_crs=None, **kwargs)[source]

Initialize the GeoRasterizer with a base dataset and optional parameters.

property base_data: GeoDataFrame

Property to directly access the data attribute of the base_dataset.

Returns:

The base dataset (GeoDataFrame)

clip_to_area(clip_geometry)[source]

Clip the base dataset to a specific area.

Parameters:

clip_geometry (Union[GeoDataFrame, Polygon]) – The geometry to clip by

Return type:

GeoDataset

Returns:

The clipped base dataset

create_bounds_geodataframe(target_crs=None)[source]

Creates a GeoDataFrame from the bounds of the base data in a specified CRS.

Parameters:

target_crs (Optional[str]) – The desired CRS for the new GeoDataFrame

Return type:

GeoDataFrame

Returns:

A new GeoDataFrame containing the bounds of the base data

static create_buffer(dataset, geometry_buffer_m, inplace=True)[source]

Add a buffer to geometries in a dataset.

Parameters:
  • dataset (Union[VectorDataset, GeoDataFrame]) – The dataset to buffer (GeoDataset or GeoDataFrame)

  • geometry_buffer_m (float) – Distance to buffer in dataset’s CRS units

  • inplace (bool) – If True, modify the dataset in place

Return type:

Union[VectorDataset, GeoDataFrame]

Returns:

The buffered dataset

property crs

Passes through the crs property of base_dataset.

Returns:

The CRS of the base dataset

modify_raster_from_dataset(input_data, cost_assumptions=None, bbox=None, mask=None, transform=None, geometry_buffer_m=0, ignore_value=65535, multiply=False, zone_field=None, forbidden_zone=None, forbidden_value=65535, **kwargs)[source]

Modify the raster with an additional dataset.

Return type:

ndarray

Returns:

The modified raster

modify_raster_with_geodataframe(gdf, value, ignore_value=65535, multiply=False)[source]

Modifies the raster cells inside the polygons of a GeoDataFrame.

Parameters:
  • gdf (GeoDataFrame) – The GeoDataFrame containing polygons to use for masking

  • value (float) – The value to set for the raster cells inside the polygons

  • ignore_value (Optional[float]) – Value in the raster to ignore during modification

  • multiply (bool) – If True, multiply the raster values by the given value

Return type:

ndarray

Returns:

The modified raster
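
How value, ignore_value and multiply interact can be sketched without any geospatial dependencies. The block below is an illustration of the semantics described above, not pyorps code: apply_value is a hypothetical helper, nested lists stand in for the ndarray, a boolean mask stands in for the rasterized polygons, and the rule that cells equal to ignore_value are left untouched is an assumption.

```python
def apply_value(raster, inside, value, ignore_value=65535, multiply=False):
    """Return a copy of raster with the cells flagged in `inside` updated.

    Cells equal to ignore_value are left untouched (assumed semantics).
    """
    result = [row[:] for row in raster]
    for r, row in enumerate(raster):
        for c, cell in enumerate(row):
            if not inside[r][c] or cell == ignore_value:
                continue
            # Either scale the existing cost or overwrite it.
            result[r][c] = cell * value if multiply else value
    return result


raster = [[10, 20], [65535, 40]]
inside = [[True, False], [True, True]]  # cells covered by the polygons

replaced = apply_value(raster, inside, 5)
scaled = apply_value(raster, inside, 2, multiply=True)
```

With multiply=False the covered cells take the new value; with multiply=True the existing cost is scaled, which is useful for expressing surcharges rather than absolute costs.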

rasterize(field_name='cost', resolution_in_m=1.0, fill_value=65535, save_path=None, dtype='uint16', geometry_buffer_m=0, bounding_box=None, fancy_function=None, fancy_function_kwargs=None)[source]

Rasterize the base dataset based on a specified field.

Parameters:
  • field_name (str) – The field to use for rasterization values

  • resolution_in_m (float) – The resolution of the output raster in meters

  • fill_value (int) – Value to use for areas with no data

  • save_path (Optional[str]) – Path to save the rasterized output

  • dtype (str) – Data type for the output raster

  • geometry_buffer_m (float) – Buffer to apply to the dataset geometries

  • bounding_box (Optional[Polygon]) – Bounding box to define the rasterization extent

  • fancy_function (Optional[Callable]) – A function that takes the base dataset as a first argument and other arguments defined in fancy_function_kwargs, which will be called before rasterization

  • fancy_function_kwargs (Optional[dict[str, Any]]) – The keyword arguments passed to fancy_function

Return type:

RasterDataset

Returns:

tuple of (raster_data, transform)
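
The relationship between the rasterization extent, resolution_in_m and the resulting grid can be sketched as follows. This is a minimal illustration under stated assumptions, not the pyorps implementation: grid_for_bounds is a hypothetical helper, the grid is assumed to be north-up with its origin at the upper-left corner, and partial cells at the edges are assumed to be rounded up.

```python
import math


def grid_for_bounds(bounds, resolution):
    """Return (height, width, (a, b, c, d, e, f)) for a north-up grid."""
    min_x, min_y, max_x, max_y = bounds
    # Round up so the grid fully covers the extent.
    width = math.ceil((max_x - min_x) / resolution)
    height = math.ceil((max_y - min_y) / resolution)
    # Same coefficient order as Affine(a, b, c, d, e, f);
    # e is negative because rows increase southwards.
    transform = (resolution, 0.0, min_x, 0.0, -resolution, max_y)
    return height, width, transform


height, width, transform = grid_for_bounds((0.0, 0.0, 1000.0, 505.0), 10.0)
n_bytes_uint16 = height * width * 2  # memory needed for dtype='uint16'
```

Halving the resolution quadruples the number of cells, so the byte estimate is a quick way to check whether a chosen resolution is practical for a given extent.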

save_raster(save_path)[source]

Save the rasterized data to a file.

Parameters:

save_path (str) – Path to save the raster file

Return type:

None

shrink_raster(exclude_value)[source]

Shrink the raster by removing outer bounds with a specific value.

Parameters:

exclude_value (int) – Value to exclude from the outer bounds

Return type:

ndarray

Returns:

The shrunk raster
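
The idea of trimming outer bounds can be sketched in plain Python. The helper shrink below is hypothetical and uses nested lists in place of an ndarray; it assumes that "outer bounds" means leading and trailing rows and columns that contain nothing but exclude_value.

```python
def shrink(raster, exclude_value):
    """Trim outer rows and columns that contain only exclude_value."""
    keep_rows = [i for i, row in enumerate(raster)
                 if any(cell != exclude_value for cell in row)]
    if not keep_rows:
        return []  # the raster holds nothing but exclude_value
    keep_cols = [j for j in range(len(raster[0]))
                 if any(row[j] != exclude_value for row in raster)]
    top, bottom = keep_rows[0], keep_rows[-1]
    left, right = keep_cols[0], keep_cols[-1]
    # Interior cells equal to exclude_value are kept.
    return [row[left:right + 1] for row in raster[top:bottom + 1]]


X = 65535
raster = [
    [X, X, X, X],
    [X, 3, X, X],
    [X, X, 7, X],
    [X, X, X, X],
]
shrunk = shrink(raster, X)
```

A georeferenced implementation would presumably also shift the affine transform by the trimmed row and column offsets so that the remaining cells keep their map coordinates; that step is not shown here.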

class pyorps.raster.RasterHandler(raster_source, source_coords, target_coords, search_space_buffer_m=None, input_crs=None, apply_mask=True, outside_value=None, bands=None)[source]

Bases: object

Class for efficiently working with raster data while preserving geographic transformation information. Can be initialized with either a file path or directly with raster data, CRS, and transform.

__init__(raster_source, source_coords, target_coords, search_space_buffer_m=None, input_crs=None, apply_mask=True, outside_value=None, bands=None)[source]

Initialize a RasterHandler for working with raster data and coordinate transformations.

Creates a window and buffer geometry based on source and target coordinates:
  • If source and target are single coordinates: creates a line buffer

  • If source and/or target are lists of coordinates: creates a polygon buffer

Parameters:
  • raster_source (RasterDataset) – Either the path to the raster file (str) or a tuple of (data_array, crs, transform)

  • source_coords (Union[Tuple[float, float], List[Tuple[float, float]]]) – Source point(s) as (x, y) tuple or list of tuples

  • target_coords (Union[Tuple[float, float], List[Tuple[float, float]]]) – Target point(s) as (x, y) tuple or list of tuples

  • search_space_buffer_m (Optional[float]) – Buffer distance in map units (typically meters)

  • input_crs (Optional[str]) – CRS of the input coordinates (e.g., ‘EPSG:4326’). If None, assumes same as raster

  • apply_mask (bool) – If True, apply the buffer mask after loading data

  • outside_value (Optional[Any]) – Value to set for pixels outside the buffer (defaults to max value of the data type)

  • bands (Optional[List[int]]) – List of bands to modify if apply_mask is True (1-based). If None, all bands are modified
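
How a buffered search space translates into a raster window can be sketched as below. This is a simplification under stated assumptions: buffered_window is a hypothetical helper, the raster is assumed to be north-up with transform coefficients a, c, e, f, and the bounding box of the points plus the buffer stands in for the line or polygon buffer that the class builds.

```python
import math


def buffered_window(points, buffer_m, a, c, e, f, height, width):
    """Return (row_off, col_off, n_rows, n_cols) covering points plus buffer."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x = min(xs) - buffer_m, max(xs) + buffer_m
    min_y, max_y = min(ys) - buffer_m, max(ys) + buffer_m
    # North-up raster: e < 0, so max_y maps to the top row.
    col_start = max(0, math.floor((min_x - c) / a))
    col_stop = min(width, math.ceil((max_x - c) / a))
    row_start = max(0, math.floor((max_y - f) / e))
    row_stop = min(height, math.ceil((min_y - f) / e))
    return row_start, col_start, row_stop - row_start, col_stop - col_start


# 1000 x 1000 raster, 10 m pixels, upper-left corner at (0, 10000)
points = [(1000.0, 9000.0), (3000.0, 8000.0)]
window = buffered_window(points, 500.0, 10.0, 0.0, -10.0, 10000.0, 1000, 1000)
clipped = buffered_window(points, 20000.0, 10.0, 0.0, -10.0, 10000.0, 1000, 1000)
```

Only the cells inside such a window need to be read from disk, which is what keeps the working array small relative to the full raster.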

apply_geometry_mask(geometry, outside_value=None, bands=None)[source]

Set pixel values outside the given geometry to the specified value.

Parameters:
  • geometry (Polygon) – A shapely geometry object (Polygon)

  • outside_value (Optional[int]) – Value to set for pixels outside the geometry

  • bands (Union[list[int], int, None]) – List of bands to modify (1-based). If None, all bands are modified.
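
The masking step can be illustrated with a dependency-free sketch. Everything here is an assumption made for illustration: mask_outside and point_in_polygon are hypothetical helpers, the polygon is given in pixel coordinates rather than map coordinates, a cell counts as inside when its centre lies inside the polygon, and 65535 is used as the default because it is the maximum of uint16.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mask_outside(raster, polygon, outside_value=65535):
    """Set cells whose centre lies outside polygon to outside_value."""
    return [[cell if point_in_polygon(c + 0.5, r + 0.5, polygon)
             else outside_value
             for c, cell in enumerate(row)]
            for r, row in enumerate(raster)]


raster = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
masked = mask_outside(raster, square)
```

Using the maximum value of the data type as the outside value makes masked cells the most expensive ones in a cost raster, so a path search avoids them.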

buffer_geometry: Polygon

coords_to_indices(coords)[source]

Convert geographic coordinates to pixel row/column indices within this raster section.

Parameters:

coords (Union[tuple[float, float], list[float], list[Union[tuple[float, float], list[float]]]]) – List of (x, y) coordinate tuples or a single coordinate tuple

Returns:

Array of (row, col) pixel indices

Return type:

numpy.ndarray
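
The conversion from map coordinates to pixel indices amounts to inverting the affine transform of the raster section. The sketch below assumes a north-up transform (b = d = 0) and rounds down to the containing pixel; to_indices is a hypothetical helper that returns a list of tuples rather than a numpy.ndarray.

```python
import math


def to_indices(coords, a, c, e, f):
    """Map (x, y) map coordinates to (row, col) for a north-up transform.

    Forward mapping: x = a * col + c and y = e * row + f, with e < 0.
    """
    if isinstance(coords[0], (int, float)):
        coords = [coords]  # accept a single (x, y) pair
    return [(math.floor((y - f) / e), math.floor((x - c) / a))
            for x, y in coords]


# 10 m pixels, upper-left corner at (500000, 5600000)
a, c, e, f = 10.0, 500000.0, -10.0, 5600000.0
single = to_indices((500025.0, 5599985.0), a, c, e, f)
many = to_indices([(500000.0, 5600000.0), (500095.0, 5599905.0)], a, c, e, f)
```

Note the order swap: coordinates are given as (x, y) but indices come back as (row, col), because rows follow the y axis and columns follow the x axis.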

data: ndarray

estimate_buffer_width(source_coords, target_coords, min_buffer=200, max_buffer=4000, sample_radius=50)[source]

Estimate an appropriate buffer width for path finding based on terrain characteristics.

Returns:

Estimated optimal buffer width in meters

indices_to_coords(indices)[source]

Convert pixel indices to geographic coordinates.

Parameters:

indices (List[Tuple[int, int]]) – List of (row, col) pixel indices

Returns:

Array of (x, y) coordinates

Return type:

numpy.ndarray
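
The reverse direction applies the affine transform forward. In the sketch below, to_coords is a hypothetical helper for a north-up transform; the offset parameter is an assumption added for illustration, since the text above does not state whether the method returns pixel centres or pixel corners.

```python
def to_coords(indices, a, c, e, f, offset=0.5):
    """Map (row, col) indices to (x, y); offset=0.5 yields pixel centres."""
    return [(a * (col + offset) + c, e * (row + offset) + f)
            for row, col in indices]


# 10 m pixels, upper-left corner at (500000, 5600000)
a, c, e, f = 10.0, 500000.0, -10.0, 5600000.0
centres = to_coords([(0, 0), (1, 2)], a, c, e, f)
corners = to_coords([(0, 0)], a, c, e, f, offset=0.0)
```

Pixel centres are usually the better choice when turning a path of raster cells back into a line geometry, because corner coordinates shift the whole path by half a cell.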

static max_distance_pair(coords1, coords2)[source]

Find the pair of coordinates (one from coords1, one from coords2) with the highest Euclidean distance.

Returns:

A tuple containing the two points with the maximum distance (point1, point2)
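
A brute-force version of this search fits in a few lines. The helper farthest_pair below is a hypothetical stand-in, not the pyorps implementation; it compares every point of the first list with every point of the second, which is fine for the small point lists typical of source and target sets.

```python
from itertools import product
from math import dist


def farthest_pair(coords1, coords2):
    """Return (p1, p2), p1 from coords1 and p2 from coords2, farthest apart."""
    return max(product(coords1, coords2), key=lambda pair: dist(*pair))


sources = [(0.0, 0.0), (1.0, 1.0)]
targets = [(4.0, 0.0), (1.0, 5.0)]
pair = farthest_pair(sources, targets)
```

The cost grows with the product of the two list lengths, so very large point sets would call for a different approach, such as restricting the comparison to convex hull vertices.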

raster_dataset: RasterDataset

save_section_as_raster(output_path)[source]

Save the section as a new raster file with proper geo referencing.

Parameters:

output_path (str) – Path for the output raster file

search_space_buffer_m: float

window: Window

window_transform: Affine