Type: Package
Title: R to Solr Interface
Version: 0.0.13
Author: Michael Lawrence, Gabe Becker, Jan Vogel
Maintainer: Michael Lawrence <michafla@gene.com>
Description: A comprehensive R API for querying Apache Solr databases. A Solr core is represented as a data frame or list that supports Solr-side filtering, sorting, transformation and aggregation, all through the familiar base R API. Queries are processed lazily, i.e., a query is only sent to the database when the data are required.
License: Apache License (== 2.0)
VignetteBuilder: knitr
Imports: restfulr (≥ 0.0.2), graph, S4Vectors (≥ 0.14.3), rjson, XML, RCurl
Depends: R (≥ 3.4.0), BiocGenerics (≥ 0.15.1), methods
Suggests: nycflights13, RUnit, MASS, knitr
Collate: utils.R pminmax.R Context-class.R DocCollection-class.R Expression-class.R Facets-class.R FieldInfo-class.R FieldType-class.R Promise-class.R SolrExpression-class.R SolrQuery-class.R SolrSchema-class.R SolrCore-class.R SolrResult-class.R SolrSummary-class.R Solr-class.R SolrList-class.R SolrFrame-class.R SolrPromise-class.R GroupedSolrFrame-class.R test.R zzz.R
NeedsCompilation: no
Packaged: 2022-05-17 23:32:52 UTC; michafla
Repository: CRAN
Date/Publication: 2022-05-18 07:10:02 UTC

Evaluation Contexts

Description

The Context class is for representing contexts in which expressions are evaluated. This might be an R environment, a database, or some other external system.

Translation

Contexts play an important role in translation. When extracting an object by name, the context can delegate to a SymbolFactory to create a Symbol object that is a lazy reference to the object. The reference is expressed in the target language. If there is no SymbolFactory, i.e., it has been set to NULL, then evaluation is eager.

The intent is to decouple the type of the context from a particular language, since a context could support the evaluation of multiple languages. The accessors below effectively allow one to specify the desired target language.

Author(s)

Michael Lawrence


DocCollection

Description

DocCollection is a virtual class for all representations of document collections. It is made concrete by DocList and DocDataFrame. This is mostly to achieve an abstraction around tabular and list representations of documents.

Accessors

These are the accessors that should apply equivalently to any derivative of DocCollection, which provides reasonable default implementations for most of them.

Author(s)

Michael Lawrence

See Also

DocList and DocDataFrame for concrete implementations


DocDataFrame

Description

The DocDataFrame object wraps a data.frame in a document-oriented interface that is shared with DocList. This is mostly to achieve an abstraction around tabular and list representations of documents. DocDataFrame should behave just like a data.frame, except it adds the accessors described below.

Accessors

These are some accessors that DocDataFrame adds on top of the basic data frame accessors. Using these accessors allows code to be agnostic to whether the data are stored as a list or data.frame.

Author(s)

Michael Lawrence

See Also

DocList for representing a document collection as a list instead of a table


DocList

Description

The DocList object wraps a list in a document-oriented interface that is shared with DocDataFrame. This is mostly to achieve an abstraction around tabular and list representations of documents. DocList should behave just like a list, except it adds the accessors described below.

Accessors

These are some accessors that DocList adds on top of the basic list accessors. Using these accessors allows code to be agnostic to whether the data are stored as a list or data.frame.

Author(s)

Michael Lawrence

See Also

DocDataFrame for representing a document collection as a table instead of a list


Expressions and Translation

Description

Underlying rsolr is a simple, general framework for representing, manipulating and translating between expressions in arbitrary languages. The two foundational classes are Expression and Symbol, which are partially implemented by SimpleExpression and SimpleSymbol, respectively.

Translation

The Expression framework defines a translation strategy based on evaluating source language expressions, using promises to represent the objects, such that the result is a promise with its deferred computation expressed in the target language.

The primary entry point is the translate generic, which has a default method that abstractly implements this strategy. The first step is to obtain a SymbolFactory instance for the target expression type via a method on the SymbolFactory generic. The SymbolFactory (a simple R function) is set on the Context, which should define (perhaps through inheritance) all symbols referenced in the source expression. The translation happens when the source expression is evaluated in the context. The context calls the factory to construct Symbol objects which are passed, along with the context, to the Promise generic, which wraps them in the appropriate type of promise. Typically, R is the source language, and the eval method evaluates the R expression on the promises. Each method for the specific type of promise will construct a new promise with an expression that encodes the computation, building on the existing expression. When evaluation is finished, we simply extract the expression from the returned promise.
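As a rough illustration of the strategy described above, the following base R sketch evaluates an ordinary R expression on toy promise objects, so that the result carries the computation as text instead of a value. The names toy_promise, context and translate_toy are hypothetical and exist only in this sketch; they are not rsolr classes or functions, and the generated text is not a real Solr language. Only binary operators are handled.

```r
# Hypothetical toy promise: holds an expression (as text) in the "target language".
toy_promise <- function(expr) structure(list(expr = expr), class = "toy_promise")

# Each binary operator returns a new promise that builds on the operands' expressions.
Ops.toy_promise <- function(e1, e2) {
  render <- function(x) if (inherits(x, "toy_promise")) x$expr else format(x)
  toy_promise(paste0("(", render(e1), " ", .Generic, " ", render(e2), ")"))
}

# The "context": looking up a symbol yields a lazy reference, not the data.
context <- list2env(list(price = toy_promise("price"), qty = toy_promise("qty")))

# Translation is evaluation of the source expression in the context;
# the translated expression is then extracted from the returned promise.
translate_toy <- function(expr, context) eval(expr, context)$expr

translated <- translate_toy(quote(price * qty > 100), context)
print(translated)
```

Here the source expression price * qty > 100 is never computed on data; evaluating it merely assembles the string "((price * qty) > 100)".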

Note on Laziness

In general, translation requires access to the referenced data. There may be certain operations that cannot be deferred, so evaluation is allowed to be eager, in the hope that the result can be embedded directly into the larger expression. Or, at the very least, the translation machinery needs to know whether the data actually exist, and whether the data are typed or have other constraints. Since the data and schema are not always available when translation is requested, such as when building a database query that will be sent by another module to an as-yet-unspecified endpoint, translation itself must be deferred. The TranslationRequest class provides a foundation for capturing translations and evaluating them later.

Author(s)

Michael Lawrence


Facets

Description

The Facets object represents the result of a Solr facet operation and is typically obtained by calling facets on a SolrCore. Most users should just call aggregate or xtabs instead of directly manipulating Facets objects.

Details

Facets extends list and each node adds a grouping factor to the set defined by its ancestors. In other words, parent-child relationships represent interactions between factors. For example, x$a$b gets the node corresponding to the interaction of a and b.

In a single request to Solr, statistics may be calculated for multiple interactions, and they are stored as a data.frame at the corresponding node in the tree. To retrieve them, call the stats accessor, e.g., stats(x$a$b), or as.table for getting the counts as a table (Solr always computes the counts).
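To picture the tree layout, the following base R sketch builds a plain nested list in which each level adds one grouping factor and each node holds a data.frame of counts. It is only a local stand-in computed from mtcars: a real Facets object comes from Solr, and its statistics are retrieved with the stats accessor rather than the hypothetical stats list element used here.

```r
# Hypothetical stand-in for a Facets tree; each node stores its counts in `stats`.
x <- list(
  cyl = list(
    # counts for the single factor cyl
    stats = as.data.frame(table(cyl = mtcars$cyl), responseName = "count"),
    gear = list(
      # counts for the interaction of cyl and gear
      stats = as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear),
                            responseName = "count")
    )
  )
)

# The node for the interaction of cyl and gear
print(x$cyl$gear$stats)
```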

Accessors

Coercion

Author(s)

Michael Lawrence

See Also

aggregate for a simpler interface that computes statistics for only a single interaction


FieldInfo

Description

The FieldInfo object is a vector of field entries from the Solr schema. Typically, one retrieves an instance with fields and shows it on the console to get an overview of the schema. The vector-like nature means that functions like [ and length behave as expected.

Accessors

These functions get the “columns” from the field information “table”:

Utilities

Author(s)

Michael Lawrence

See Also

SolrSchema that holds an instance of this object


FieldType

Description

The FieldType object represents the type of a document field. A list of these objects is formally represented as a FieldTypeList object, an instance of which is provided by SolrSchema. Internally, FieldType objects are central to the conversion between R and Solr types. At the user level, they are mostly useful for displaying the schema.

Author(s)

Michael Lawrence

See Also

SolrSchema, which communicates information on field types using these classes


GroupedSolrFrame

Description

The GroupedSolrFrame is a highly experimental extension of SolrFrame that models each column as a list, formed by splitting the original vector by a common set of grouping factors.

Details

A GroupedSolrFrame should more or less behave analogously to a data frame where every column is split by a common grouping. Unlike SolrFrame, columns are always extracted lazily. Typical usage is to construct a GroupedSolrFrame by calling group on a SolrFrame, and then to extract columns (as promises) and aggregate them (by e.g. calling mean).
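The "column as a list split by a grouping" model has a direct base R analogue, sketched below on a local data frame. This is an analogy only, using mtcars and base R functions; it does not call rsolr, and nothing here is evaluated by Solr.

```r
# Split one column by a grouping factor: the result is a list with one vector per group.
mpg_by_cyl <- split(mtcars$mpg, mtcars$cyl)

# Aggregating the grouped column yields one value per group.
group_means <- vapply(mpg_by_cyl, mean, numeric(1))
print(group_means)
```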

Functions that group the data, such as group and aggregate, simply add to the existing grouping. To clear the grouping, call ungroup or just coerce to a SolrFrame or SolrList.

Accessors

GroupedSolrFrame inherits much of its functionality from SolrFrame; here we only outline concerns specific to grouped data.

Extended API

Most of the typical data frame accessors and data manipulation functions will work analogously on GroupedSolrFrame (see Details). Below, we list some of the non-standard methods that might be seen as an extension of the data frame API.

Author(s)

Michael Lawrence


Grouping

Description

The Grouping object represents a collection of documents split by some interaction of factors. It is extremely low-level, and its only use is to be coerced to something else, either a list or data.frame, via as.

Author(s)

Michael Lawrence

See Also

ListSolrResult, which provides this object via its groupings method.


ListSolrResult

Description

The SolrResult object represents the result of a Solr query and usually contains a collection of documents and/or facets. The default implementation, ListSolrResult, directly stores the canonical JSON response from Solr. It is usually obtained by evaluating a SolrQuery on a SolrCore, which most users will never do.

Accessors

Since ListSolrResult inherits from list, one can access the raw JSON fields directly through the ordinary list accessors. One should only directly manipulate the Solr response when extending rsolr/Solr at a deep level. Higher-level accessors are described below.

Author(s)

Michael Lawrence

See Also

docs and facets on SolrCore are more convenient and usually sufficient


Promises

Description

The Promise class formally and abstractly represents the potential result of a deferred computation.

Details

Lazy programming is useful in a number of contexts, including interaction with external/remote systems like databases, where we want the computation to occur within the external system, despite appearances to the contrary. Typically, the user constructs one or more promises referring to pre-existing objects. Operations on those objects produce new promises that encode the additional computations. Eventually, usually after some sort of restriction and/or aggregation, the promise is “fulfilled” to yield a materialized, eager object, such as an R vector.

Promise and its partial implementation SimplePromise provide a foundation for implementations that mostly helps with creating and fulfilling promises, while the implementation is responsible for deferring particular computations, which is language-dependent.

Construction

Fulfillment

The basic coercion functions in R, like as.vector and as.data.frame, have methods for Promise that simply call fulfill on the promise, and then perform the coercion. Coercion is preferred to calling fulfill directly.
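The lifecycle described in this section (construct, derive new promises, fulfill through coercion) can be sketched in base R with closures. The toy_lazy class and the lazy and fulfill_toy helpers are hypothetical names for this sketch and are unrelated to the actual Promise implementation; the counter only demonstrates that no work happens before coercion.

```r
# Hypothetical toy promise: stores a computation instead of its result.
lazy <- function(fun) structure(list(fun = fun), class = "toy_lazy")
fulfill_toy <- function(p) p$fun()

# Math functions (sqrt, abs, ...) on a promise return a new promise
# that encodes the additional computation without running it.
Math.toy_lazy <- function(x, ...) {
  gen <- .Generic
  lazy(function() get(gen)(fulfill_toy(x), ...))
}

# Coercion fulfills the promise and then coerces the materialized result.
as.vector.toy_lazy <- function(x, mode = "any") as.vector(fulfill_toy(x), mode)

calls <- 0
p <- lazy(function() { calls <<- calls + 1; c(1, 4, 9) })
q <- sqrt(p)                  # still a promise; the data have not been touched
calls_before <- calls
result <- as.vector(q)        # fulfillment happens here
print(result)
```

In this sketch, as in the text above, user code reaches for as.vector rather than calling the fulfillment helper directly.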

Author(s)

Michael Lawrence


SolrCore

Description

The SolrCore object represents a core hosted by a Solr instance. A core is essentially a queryable collection of documents that share the same schema. It is usually not necessary to interact with a SolrCore directly.

Details

The typical usage (by advanced users) would be to construct a custom SolrQuery and execute it via the docs, facets or (the very low-level) eval methods.

Accessor methods

In the code snippets below, x is a SolrCore object.

Constructor

Reading

Summarizing

Updating

Evaluation

Coercion

Author(s)

Michael Lawrence

See Also

SolrFrame, the typical way to interact with a Solr core.

Examples


     solr <- TestSolr()
     sc <- SolrCore(solr$uri)
     name(sc)
     ndoc(sc)

     delete(sc)
     
     docs <- list(
        list(id="2", inStock=TRUE, price=2, timestamp_dt=Sys.time()),
        list(id="3", inStock=FALSE, price=3, timestamp_dt=Sys.time()),
        list(id="4", price=4, timestamp_dt=Sys.time()),
        list(id="5", inStock=FALSE, price=5, timestamp_dt=Sys.time())
     )
     update(sc, docs)

     q <- SolrQuery(id %in% as.character(2:4))
     read(sc, q)

     solr$kill()


SolrExpression

Description

There is a formal framework for constructing and manipulating the Solr languages that is not yet exposed. Please inform the authors if exposing the framework would be helpful. Perhaps it would be helpful in support of implementing new functionality on top of SolrPromise.

Author(s)

Michael Lawrence


SolrFrame

Description

The SolrFrame object makes Solr data accessible through a data.frame-like interface. This is the typical way an R user accesses data from a Solr core. Many of its methods are shared with SolrList, which has very similar behavior.

Details

A SolrFrame should more or less behave analogously to a data frame. It provides the same basic accessors (nrow, ncol, length, rownames, colnames, [, [<-, [[, [[<-, $, $<-, head, tail, etc.) and can be coerced to an actual data frame via as.data.frame. Supported types of data manipulations include subset, transform, sort, xtabs, aggregate, unique, summary, etc.

Mapping a collection of documents to a tabular data structure is not quite natural, as the document collection is ragged: a given document can have any arbitrary set of fields, out of a set that is essentially infinite. Unlike some other document stores, however, Solr constrains the type of every field through a schema. The schema achieves flexibility through “dynamic” fields. The name of a dynamic field is a wildcard pattern, and any document field that matches the pattern is expected to obey the declared type and other constraints.
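The wildcard matching behind dynamic fields can be imitated locally with glob patterns. In the sketch below, the pattern table and the field_type helper are hypothetical and do not come from a real schema; the actual matching is done by Solr, not by this code.

```r
# Hypothetical dynamic-field patterns mapped to declared types.
dynamic <- c("*_dt" = "date", "*_i" = "int")

# Return the declared type of the first pattern that matches the field name.
field_type <- function(field) {
  hit <- vapply(names(dynamic),
                function(p) grepl(utils::glob2rx(p), field),
                logical(1))
  if (any(hit)) unname(dynamic[hit][1]) else NA_character_
}

print(field_type("timestamp_dt"))
print(field_type("price"))
```

A field such as timestamp_dt matches the *_dt pattern and inherits its type, whereas price matches no pattern in this toy table and would need a static field declaration.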

When determining its set of columns, SolrFrame takes every actual field present in the collection, and (by default) adds all non-dynamic (static) fields, in the order specified by the schema. Note that it is very likely that many columns will consist entirely or almost entirely of NAs.
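Why ragged documents lead to NA-filled columns can be seen in a small base R sketch. It takes the union of the fields present in two toy documents and pads each document to that set; it ignores the schema entirely, so it only approximates the first half of the rule above and is not how SolrFrame is implemented.

```r
# Two toy documents with different sets of fields.
docs <- list(
  list(id = "2", inStock = TRUE, price = 2),
  list(id = "4", price = 4)
)

# The columns are the union of all fields that actually occur.
fields <- unique(unlist(lapply(docs, names)))

# Pad each document with NA for the fields it lacks, then stack the rows.
rows <- lapply(docs, function(d) {
  d[setdiff(fields, names(d))] <- NA
  as.data.frame(d[fields], stringsAsFactors = FALSE)
})
tab <- do.call(rbind, rows)
print(tab)
```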

If a collection is extremely ragged, where few fields are shared between documents, it may make more sense to treat the data as a list, through SolrList, which shares almost all of the functionality of SolrFrame but in a different shape.

The rownames are taken from the field declared in the schema to represent the unique document key. Schemas are not strictly required to declare such a field, so if there is no unique key, the rownames are NULL.

Field restrictions passed to e.g. [ or subset(fields=) may be specified by name, or wildcard pattern (glob). Similarly, a row index passed to [ must be either a character vector of identifiers (of length <= 1024, NAs are not supported, and this requires a unique key in the schema) or a SolrPromise/SolrExpression, but note that if it evaluates to NAs, the corresponding rows are excluded from the result, as with subset. Using a SolrPromise or SolrExpression is recommended, as filtering happens at the database.

A special feature of SolrFrame, vs. an ordinary data frame, is that it can be grouped into a GroupedSolrFrame, where every column is modeled as a list, split by some combination of grouping factors. This is useful for aggregation and supports the implementation of the aggregate method, which is the recommended high-level interface.

Another interesting feature is laziness. One can defer a SolrFrame, so that all column retrieval, e.g., via $ or eval, returns a SolrPromise object. Many operations on promises are deferred, until they are finally fulfilled by being shown or through explicit coercion to an R vector.

A note for developers: SolrList and SolrFrame share common functionality through the base Solr class. Much of the functionality mentioned here is actually implemented as methods on the Solr class.

Accessors

These are some accessors that SolrFrame adds on top of the basic data frame accessors. Most of these are for advanced use only.

Extended API

Most of the typical data frame accessors and data manipulation functions will work analogously on SolrFrame (see Details). Below, we list some of the non-standard methods that might be seen as an extension of the data frame API.

Constructor

Evaluation

Coercion

Author(s)

Michael Lawrence

See Also

SolrList for representing a Solr collection as a list instead of a table

Examples


     schema <- deriveSolrSchema(mtcars)
     solr <- TestSolr(schema)
     sr <- SolrFrame(solr$uri)
     sr[] <- mtcars
     dim(sr)
     head(sr)
     subset(sr, mpg > 20 & cyl == 4)
     solr$kill()
     ## see the vignette for more


SolrList

Description

The SolrList object makes Solr data accessible through a list-like interface. This interface is appropriate when the data are highly ragged.

Details

A SolrList should more or less behave analogously to a list. It provides the same basic accessors (length, names, [, [<-, [[, [[<-, $, $<-, head, tail, etc.) and can be coerced to a list via as.list. Supported types of data manipulations include subset, transform, sort, xtabs, aggregate, unique, summary, etc.

An obvious difference between a SolrList and an ordinary list is that we know the SolrList contains only documents, which are themselves represented as named lists of fields, usually vectors of length one. This constraint enables us to provide the convenience of accessing fields by slicing across every document. We can pass a field selection to the second argument of [. As with a data frame, selecting a single column with e.g. x[,"foo"] will return the field as a vector, filling in NAs wherever a document lacks a value for the field.
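The effect of slicing one field across every document can be imitated on a plain R list. The slice_field helper below is hypothetical and assumes a numeric field of length one per document; with a SolrList the same result is obtained through x[,"foo"].

```r
# Toy documents keyed by id; document "b" has no price field.
docs <- list(
  a = list(id = "a", price = 2),
  b = list(id = "b"),
  c = list(id = "c", price = 5)
)

# Extract one field from every document, using NA where the field is absent.
slice_field <- function(docs, name) {
  vapply(docs,
         function(d) if (is.null(d[[name]])) NA_real_ else d[[name]],
         numeric(1))
}

prices <- slice_field(docs, "price")
print(prices)
```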

The names are taken from the field declared in the schema to represent the unique document key. Schemas are not strictly required to declare such a field, so if there is no unique key, the names are NULL.

Field restrictions passed to e.g. [ or subset(fields=) may be specified by name, or wildcard pattern (glob). Similarly, a row index passed to [ must be either a character vector of identifiers (of length <= 1024, NAs are not supported, and this requires a unique key in the schema) or a SolrPromise/SolrExpression, but note that if it evaluates to NAs, the corresponding rows are excluded from the result, as with subset. Using a SolrPromise or SolrExpression is recommended, as filtering happens at the database.

A SolrList can be made lazy by calling defer on a SolrList, so that all column retrieval, e.g., via [, returns a SolrPromise object. Many operations on promises are deferred, until they are finally fulfilled by being shown or through explicit coercion to an R vector.

A note for developers: SolrFrame and SolrList share common functionality through the base Solr class. Much of the functionality mentioned here is actually implemented as methods on the Solr class.

Accessors

These are some accessors that SolrList adds on top of the basic list accessors. Most of these are for advanced use only.

Extended API

Most of the typical data frame accessors and data manipulation functions will work analogously on SolrList (see Details). Below, we list some of the non-standard methods that might be seen as an extension of the data frame API.

Constructor

Evaluation

Coercion

Author(s)

Michael Lawrence

See Also

SolrFrame for representing a Solr collection as a table instead of a list

Examples


     solr <- TestSolr()
     sr <- SolrList(solr$uri)
     length(sr)
     head(sr)
     sr[["GB18030TEST"]]
     # Solr tends to crash for some reason running this inside R CMD check
     ## Not run:  
     as.list(subset(sr, price > 100))[,"price"]
     
## End(Not run)
     solr$kill()


SolrPromise

Description

SolrPromise is a vector-like representation of a deferred computation within Solr. It may promise to simply return a field, to perform arithmetic on a combination of fields, to aggregate a field, etc. Methods on SolrPromise allow the R user to manipulate Solr data with the ordinary R API. The typical way to fulfill a promise is to explicitly coerce the promise to a materialized data type, such as an R vector.

Details

In general, SolrPromise acts just like an R vector. It supports all of the basic vector manipulations, including the Logic, Compare, Arith, Math, and Summary group generics, as well as length, lengths, %in%, complete.cases, is.na, [, grepl, grep, round, signif, ifelse, pmax, pmin, cut, mean, quantile, median, weighted.mean, IQR, mad, anyNA. All of these functions are lazy, in that they return another promise.

The promise is really only known to rsolr, as all actual Solr queries are eager. SolrPromise does its best to defer computations, but the computations will be forced if one performs an operation that is not supported by Solr.

These functions are also supported, but they are eager: cbind, rbind, summary, window, head, tail, unique, intersect, setdiff, union, table and ftable. These functions from the Math group generic are eager: cummax, cummin, cumprod, cumsum, log2, and *gamma.

The [<- function will be lazy as long as both x and i are promises. i is assumed to represent a logical subscript. Otherwise, [<- is eager.

SolrPromise also extends the R API with some new operations: nunique (number of unique elements), rescale (rescale to within a min/max), ndoc, windows, heads, tails.

Limitations

This section outlines some limitations of SolrPromise methods, compared to the base vector implementation. The primary limitation is that binary operations generally only work between two promises that derive from the same data source, including all pending manipulations (filters, ordering, etc). Operations between a promise and an ordinary vector usually only work if the vector is of length one (a scalar).

Some specific notes:

Author(s)

Michael Lawrence

See Also

SolrFrame, which yields promises when it is deferred.


SolrQuery

Description

The SolrQuery object represents a query to be sent to a SolrCore. This is a low-level interface to query construction but will not be useful to most users. The typical reason to directly manipulate a query would be to batch more operations than is possible with the high-level SolrFrame, e.g., combining multiple aggregations.

Details

The SolrQuery API borrows many verbs from the base R API, including subset, transform, sort, xtabs, head, tail, rev, etc.

The typical workflow is to construct a query, perform various manipulations, and finally retrieve a result by passing the query to a SolrCore, typically via the docs or facets functions.

Accessors

Querying

Constructor

Faceting

The Solr facet component counts documents and calculates statistics on a group-wise basis.

Grouping

The Solr grouping component causes results to be returned nested into groups. The main use case would be to restrict to the first or last N documents in each group. This functionality is not related to aggregation; see facet.

Coercion

These two functions are very low-level; users should almost never need to call these.

Author(s)

Michael Lawrence

See Also

SolrFrame, the recommended high-level interface for interacting with Solr

SolrCore, which gives an example of constructing and evaluating a query


SolrSchema

Description

The SolrSchema object represents the schema of a Solr core. Not all of the information in the schema is represented; only the relevant elements are included. The user should not need to interact with this class very often.

One can infer a SolrSchema from a data.frame with deriveSolrSchema and then write it out to a file for use with Solr.

Accessors

Generation and Export

It may be convenient for R users to autogenerate a Solr schema from a prototypical data frame. Note that to harness the full power of Solr, it pays to get familiar with the details. After deriving a schema with deriveSolrSchema, save it to the standard XML format with saveXML. See the vignette for an example.

Author(s)

Michael Lawrence


Testing Solr

Description

Launches an instance of the embedded Solr and creates a core for testing and demonstration purposes.

Usage

TestSolr(schema = NULL, start = TRUE, restart = FALSE)

Arguments

schema

The SolrSchema object describing the schema for the new Solr core

start

Whether to actually start the server (it can be started later by interacting with the returned object). If there is already a server running, the return value points to that instance.

restart

Force the Solr server to restart.

Value

An instance of ExampleSolr, a reference class. Typically, one just accesses the uri field, and passes it to a constructor of SolrFrame or SolrCore.

Author(s)

Michael Lawrence
